[tahoe-lafs-trac-stream] [tahoe-lafs] #1280: deal with fragile, but disposable, bucket state files
tahoe-lafs
trac at tahoe-lafs.org
Tue Jul 9 14:25:47 UTC 2013
#1280: deal with fragile, but disposable, bucket state files
--------------------------------+--------------------------------
Reporter: francois | Owner: zooko
Type: defect | Status: reopened
Priority: normal | Milestone: 1.11.0
Component: code-nodeadmin | Version: 1.8.1
Resolution: | Keywords: pickle reliability
Launchpad Bug: |
--------------------------------+--------------------------------
New description:
If a bucket state file can't be loaded and parsed for any reason (usually
because it is 0-length, but any other sort of error should also be handled
similarly), then just blow it away and start fresh.
------- original post by François below:
After a hard system shutdown due to a power failure, a Tahoe node might not
be able to start again automatically because the files
'''storage/bucket_counter.state''' or '''storage/lease_checker.state'''
are empty.
The easy workaround is to manually delete the empty files before
restarting the nodes.
{{{
find /srv/tahoe/*/storage/bucket_counter.state -size 0 -exec rm {} \;
find /srv/tahoe/*/storage/lease_checker.state -size 0 -exec rm {} \;
}}}
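The same workaround could be sketched in Python, for example to run it from a wrapper script before starting the nodes. This is only an illustration: the function name `remove_empty_state_files` and its `base_glob` parameter are hypothetical and not part of Tahoe-LAFS; the default path mirrors the `/srv/tahoe/*` layout used in the commands above.

```python
import glob
import os

# The two crawler state files named in this ticket.
STATE_FILES = ("bucket_counter.state", "lease_checker.state")


def remove_empty_state_files(base_glob="/srv/tahoe/*"):
    """Delete zero-length state files under each node directory.

    base_glob is a glob pattern matching node base directories
    (hypothetical parameter, mirroring the find commands above).
    Returns the sorted list of paths that were removed.
    """
    removed = []
    for name in STATE_FILES:
        pattern = os.path.join(base_glob, "storage", name)
        for path in glob.glob(pattern):
            # Only touch regular files that are exactly 0 bytes long.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Like the `find` commands, this only removes empty files; a state file that is non-empty but corrupted would be left in place.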
Here is what a startup attempt looks like in such a case.
{{{
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/twisted/application/app.py", line 614, in run
    runApp(config)
  File "/usr/lib/python2.5/site-packages/twisted/scripts/twistd.py", line 23, in runApp
    _SomeApplicationRunner(config).run()
  File "/usr/lib/python2.5/site-packages/twisted/application/app.py", line 330, in run
    self.application = self.createOrGetApplication()
  File "/usr/lib/python2.5/site-packages/twisted/application/app.py", line 416, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/usr/lib/python2.5/site-packages/twisted/application/app.py", line 427, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/usr/lib/python2.5/site-packages/twisted/application/service.py", line 368, in loadApplication
    application = sob.loadValueFromFile(filename, 'application', passphrase)
  File "/usr/lib/python2.5/site-packages/twisted/persisted/sob.py", line 214, in loadValueFromFile
    exec fileObj in d, d
  File "tahoe-client.tac", line 10, in <module>
    c = client.Client()
  File "/opt/tahoe-lafs/src/allmydata/client.py", line 140, in __init__
    self.init_storage()
  File "/opt/tahoe-lafs/src/allmydata/client.py", line 269, in init_storage
    expiration_sharetypes=expiration_sharetypes)
  File "/opt/tahoe-lafs/src/allmydata/storage/server.py", line 97, in __init__
    self.add_bucket_counter()
  File "/opt/tahoe-lafs/src/allmydata/storage/server.py", line 114, in add_bucket_counter
    self.bucket_counter = BucketCountingCrawler(self, statefile)
  File "/opt/tahoe-lafs/src/allmydata/storage/crawler.py", line 449, in __init__
    ShareCrawler.__init__(self, server, statefile)
  File "/opt/tahoe-lafs/src/allmydata/storage/crawler.py", line 86, in __init__
    self.load_state()
  File "/opt/tahoe-lafs/src/allmydata/storage/crawler.py", line 195, in load_state
    state = pickle.load(f)
exceptions.EOFError:
}}}
--
Comment (by daira):
I suspect that changeset
[changeset:3cb99364e6a83d0064d2838a0c470278903e19ac/trunk 3cb99364]
effectively fixed this ticket. (It doesn't delete a corrupted state file,
but it does use default values, and the corrupted file will be overwritten
on the next crawler pass.) Please close this ticket as fixed in milestone
1.9.2 if you agree.
See also #1290.
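The behaviour described in this comment (fall back to default values when the state file cannot be loaded, and let the next save overwrite it) might look roughly like the sketch below. This is not the code from changeset 3cb99364: the name `load_state_or_default` and the `default_factory` parameter are hypothetical, and the real crawler state has its own structure.

```python
import pickle


def load_state_or_default(statefile, default_factory=dict):
    """Return the unpickled state, or a fresh default if loading fails.

    Hypothetical helper for illustration only. The broad `except`
    follows the ticket's request that any load or parse error be
    handled the same way as a zero-length file.
    """
    try:
        with open(statefile, "rb") as f:
            return pickle.load(f)
    except Exception:
        # A zero-length file raises EOFError, garbage bytes typically
        # raise pickle.UnpicklingError, and a missing file raises an
        # OSError subclass. In all cases, start from default values;
        # the unreadable file is left on disk until the state is next
        # saved over it.
        return default_factory()
```

With a helper like this, the `pickle.load(f)` call shown at the bottom of the traceback would no longer abort node startup on an empty file.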
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1280#comment:12>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage