Opened at 2014-06-27T18:59:33Z
Last modified at 2020-10-30T12:35:44Z
#2251 closed defect
Database full or disk full error, stops backups — at Version 4
| Reported by: | CyberAxe | Owned by: | daira |
|---|---|---|---|
| Priority: | normal | Milestone: | undecided |
| Component: | code-storage | Version: | 1.10.0 |
| Keywords: | database full space logging | Cc: | zooko |
| Launchpad Bug: | | | |
Description (last modified by daira)
http://sebsauvage.net/paste/?86d37574a47fd008#Xcj23YbPlxSMZ63yIVfune6X+kxkHQSCQ9Hm7kJmqkI=
```
[...]
C:\allmydata-tahoe-1.10.0\bin>Backup-Now
C:\allmydata-tahoe-1.10.0\bin>ECHO OFF
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 211, in process
    newdircap = mkdir(create_contents, self.options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 47, in mkdir
    raise HTTPError("Error during mkdir", resp)
HTTPError: Error during mkdir: 500 Internal Server Error
"Traceback (most recent call last):
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 753, in receiveClose
    self.request.fail(f)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\foolscap-0.6.4-py2.7.egg\\foolscap\\call.py\", line 95, in fail
    self.deferred.errback(why)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 422, in errback
    self._startRunCallbacks(fail)
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 489, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"C:\\allmydata-tahoe-1.10.0\\support\\Lib\\site-packages\\twisted-12.3.0-py2.7-win-amd64.egg\\twisted\\internet\\defer.py\", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 604, in _got_response
    return self._loop()
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 516, in _loop
    return self._failed(msg)
  File \"c:\\allmydata-tahoe-1.10.0\\src\\allmydata\\immutable\\upload.py\", line 617, in _failed
    raise UploadUnhappinessError(msg)
allmydata.interfaces.UploadUnhappinessError: server selection failed for
<Tahoe2ServerSelector for upload 66ww6>: shares could be placed or found on
only 0 server(s). We were asked to place shares on at least 1 server(s) such
that any 1 of them have enough shares to recover the file.
(placed 0 shares out of 1 total (1 homeless), want to place shares on at
least 1 servers such that any 1 of them have enough shares to recover the
file, sent 1 queries to 1 servers, 0 queries placed some shares, 1 placed
none (of which 0 placed none due to the server being full and 1 placed none
due to an error))
(last failure (from <ServerTracker for server xfia4afr and SI 66ww6>) was:
[Failure instance: Traceback (failure with no frames):
<class 'foolscap.tokens.RemoteException'>: <RemoteException around
'[CopiedFailure instance: Traceback from remote host --
Traceback (most recent call last):
  File \"/home/customer/LAFS_source/src/allmydata/storage/server.py\", line 258, in <lambda>
    canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in make_bucket_writer
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 306, in addCallback
    callbackKeywords=kw)
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 295, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/cloud/cloud_backend.py\", line 175, in <lambda>
    d.addCallback(lambda ign: BucketWriter(account, immsh, canary))
  File \"/home/customer/LAFS_source/src/allmydata/storage/bucket.py\", line 28, in __init__
    share.get_allocated_data_length(), SHARETYPE_IMMUTABLE)
  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 54, in add_share
    self._leasedb.add_new_share(storage_index, shnum, used_space, sharetype)
  File \"/home/customer/LAFS_source/src/allmydata/storage/leasedb.py\", line 176, in add_new_share
    (si_s, shnum, prefix, backend_key, used_space, sharetype, STATE_COMING))
sqlite3.OperationalError: database or disk is full
]'> ]) "
```
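For readers tracing the failure: the root cause is the last line of the remote traceback. SQLite could not extend `leasedb.sqlite` (or its WAL file) because the storage server's filesystem was full, and that error bubbled up through the cloud backend as an UploadUnhappinessError and then as a 500 to the client. A minimal sketch of the failure mode, assuming an illustrative table name (`shares`) and schema rather than the real leasedb code:

```python
import sqlite3

def add_share_row(db_path, si_s, shnum, used_space):
    """Insert a share row, surfacing a disk-full error clearly.

    Illustrative only: the real schema lives in
    src/allmydata/storage/leasedb.py; this just shows where
    sqlite3.OperationalError("database or disk is full") surfaces.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS shares "
                     "(storage_index TEXT, shnum INTEGER, used_space INTEGER)")
        conn.execute("INSERT INTO shares VALUES (?, ?, ?)",
                     (si_s, shnum, used_space))
        conn.commit()
    except sqlite3.OperationalError as e:
        if "database or disk is full" in str(e):
            # Any write can fail this way once the filesystem holding the
            # DB (or its -wal file) has no free space left.
            raise RuntimeError("storage server disk full while writing %s"
                               % (db_path,))
        raise
    finally:
        conn.close()
```

Note that the error is about the disk hosting the database, not the database hitting an internal size limit, which is why the client-side message ("server selection failed") looks unrelated at first glance.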
This happened while trying to back up one very small test file. I have 38 GB of free space locally, on Windows.
Change History (4)
comment:1 Changed at 2014-06-30T19:48:09Z by zooko
comment:2 Changed at 2014-06-30T22:07:43Z by Zancas
After the issue was reported to me, I began investigating the state of CyberAxe's server, starting at approximately 2014-06-27T19:45Z.
I ssh'd into the server and ran:
```
Last login: Mon Apr  7 21:32:11 2014 from 184-96-237-182.hlrn.qwest.net
ubuntu@ip-10-185-214-61:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvda1       8256952 7838464         0 100% /
udev              299036       8    299028   1% /dev
tmpfs              60948     160     60788   1% /run
none                5120       0      5120   0% /run/lock
none              304720       0    304720   0% /run/shm
```
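So df confirms the root filesystem had zero blocks available. A generic way to check this from Python on the server (standard library only, not Tahoe-LAFS code; the path argument is whatever filesystem the storage server writes to):

```python
import os

def free_bytes(path="/"):
    # statvfs reports filesystem stats; f_bavail is the count of blocks
    # available to unprivileged users, which is what df's "Available" shows.
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

if __name__ == "__main__":
    print("free bytes on /: %d" % free_bytes("/"))
```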
Here's how big the storageserver/storage databases were:
```
customer@ip-10-185-214-61:~/storageserver/storage$ ls -l ./*
-rw------- 1 customer customer        0 May  1 09:05 ./accounting_crawler.state
-rw------- 1 customer customer    16384 Apr 30 14:27 ./bucket_counter.state
-rw-r--r-- 1 customer customer 21267456 Apr 14 05:16 ./leasedb.sqlite
-rw-r--r-- 1 customer customer    32768 May 26 08:46 ./leasedb.sqlite-shm
-rw-r--r-- 1 customer customer  1246104 May 25 08:55 ./leasedb.sqlite-wal
```
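The `-shm` and `-wal` siblings show the leasedb runs in SQLite's write-ahead-log mode, so every write also needs room in the WAL file, not just the main database. A sketch of manually checkpointing the WAL back into the main file while the node is stopped (a standard SQLite PRAGMA; the path is copied from the listing above):

```python
import sqlite3

# Path taken from the ls listing above on this particular server.
conn = sqlite3.connect("/home/customer/storageserver/storage/leasedb.sqlite")
# Flush the pending WAL frames into the main database file. Note the
# checkpoint itself needs some free space, so it cannot rescue a 100%-full
# disk on its own.
conn.execute("PRAGMA wal_checkpoint(FULL)")
conn.close()
```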
Here's what I pasted into IRC as the result of an experiment using 'du':
```
<zancas> customer@ip-10-185-214-61:~$ du -sh {storageserver,introducer}/logs  [14:01]
<zancas> 6.1G    storageserver/logs
<zancas> 20K     introducer/logs
<zancas>
<zooko> aha
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ pwd
<zancas> /home/customer/storageserver/logs
<zancas> customer@ip-10-185-214-61:~/storageserver/logs$ ls -l twistd.log.999
<zancas> -rw------- 1 customer customer 1000057 Apr 26 20:43 twistd.log.999
```
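twistd rotates its log at roughly 1 MB (hence `twistd.log.999` weighing in at 1000057 bytes), but by default it never deletes rotated files, which is how `storageserver/logs` grew to 6.1 GB and filled the disk. A sketch of capping the number of rotated files using Twisted's `LogFile` (the directory path and the limit of 10 files are assumptions for illustration, not the node's actual configuration):

```python
from twisted.python import log, logfile

# Rotate at ~1 MB as twistd does, but keep at most 10 old files so the
# log directory cannot grow without bound. Paths are illustrative.
f = logfile.LogFile("twistd.log", "/home/customer/storageserver/logs",
                    rotateLength=1000000, maxRotatedFiles=10)
log.startLogging(f)
```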
I tar'd and bzip2'd the log files, and scp'd them to my local dev machine.
Here's a 'df' I ran on Saturday, before informing CyberAxe that we had a fix to test:
```
customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  19% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm
```
Here's the result of a df run at about 2014-06-30T22:09:30Z:
```
customer@ip-10-185-214-61:~/storageserver/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  1.5G  6.1G  20% /
udev            293M  8.0K  293M   1% /dev
tmpfs            60M  160K   60M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            298M     0  298M   0% /run/shm
customer@ip-10-185-214-61:~/storageserver/logs$
```
If I understand correctly, the second df run above occurred *after* CyberAxe attempted another backup, which failed as reported in ticket:2254.
comment:3 Changed at 2014-07-11T17:26:50Z by CyberAxe
I tested this and can now back up and restore files, after Za cleaned out the server's log files.
comment:4 Changed at 2014-07-24T00:25:47Z by daira
- Cc daira removed
- Component changed from unknown to code-frontend-cli
- Description modified (diff)
- Keywords file removed
This is probably related to #2254: it is happening on the same server, and both error messages have to do with the server's filesystem being full.