[tahoe-lafs-trac-stream] [tahoe-lafs] #812: server-side crawlers: tolerate corrupted shares, verify shares
tahoe-lafs
trac at tahoe-lafs.org
Thu Nov 14 19:06:57 UTC 2013
#812: server-side crawlers: tolerate corrupted shares, verify shares
------------------------------+-------------------------
Reporter: zooko | Owner: warner
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: code-storage | Version: 1.4.1
Resolution: | Keywords: reliability
Launchpad Bug: |
------------------------------+-------------------------
New description:
From twistd.log on prodtahoe17 data6:
{{{
2009/09/25 13:00 -0700 [-] Log opened.
2009/09/25 13:00 -0700 [-] twistd 2.5.0 (/usr/bin/python 2.5.2) starting up
2009/09/25 13:00 -0700 [-] reactor class: <class 'twisted.internet.selectreactor.SelectReactor'>
2009/09/25 13:00 -0700 [-] Loading tahoe-client.tac...
2009-09-25 20:01:14.954Z [-] Loaded.
2009-09-25 20:01:14.956Z [-] foolscap.pb.Listener starting on 39324
2009-09-25 20:01:14.956Z [-] twisted.conch.manhole_ssh.ConchFactory starting on 8226
2009-09-25 20:01:14.956Z [-] Starting factory <twisted.conch.manhole_ssh.ConchFactory instance at 0x8bfe2cc>
2009-09-25 20:01:14.957Z [-] nevow.appserver.NevowSite starting on 9006
2009-09-25 20:01:14.957Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x8db516c>
2009-09-25 20:01:14.957Z [-] Manhole listening via SSH on port 8226
2009-09-25 20:01:14.958Z [-] twisted.internet.protocol.DatagramProtocol starting on 35896
2009-09-25 20:01:14.958Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8db576c>
2009-09-25 20:01:14.960Z [-] (Port 35896 Closed)
2009-09-25 20:01:14.961Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8db576c>
2009-09-27 12:57:40.124Z [-] lease-checker error processing /data6/storage/storage/shares/g6/g6rvkc5jdkgoqhljuxgkquzhvq/5
2009-09-27 12:57:40.130Z [-] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/twisted/internet/base.py", line 561, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 262, in start_slice
    self.start_current_prefix(start_slice)
  File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 321, in start_current_prefix
    buckets, start_slice)
  File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 361, in process_prefixdir
    self.process_bucket(cycle, prefix, prefixdir, bucket)
--- <exception caught here> ---
  File "/usr/lib/python2.5/site-packages/allmydata/storage/expirer.py", line 128, in process_bucket
    wks = self.process_share(sharefile)
  File "/usr/lib/python2.5/site-packages/allmydata/storage/expirer.py", line 171, in process_share
    for li in sf.get_leases():
  File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 242, in get_leases
    for i, lease in self._enumerate_leases(f):
  File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 247, in _enumerate_leases
    for i in range(self._get_num_lease_slots(f)):
  File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 227, in _get_num_lease_slots
    num_extra_leases = self._read_num_extra_leases(f)
  File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 129, in _read_num_extra_leases
    (num_extra_leases,) = struct.unpack(">L", f.read(4))
  File "/usr/lib/python2.5/struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 4
}}}
{{{
$ tahoe --version
tahoe-server: 1.4.1, foolscap: 0.4.2, pycryptopp: 0.5.16-r669, zfec: 1.4.0-4, Twisted: 2.5.0, Nevow: 0.9.26, zope.interface: 3.3.1, python: 2.5.2, platform: Linux-Ubuntu_8.04-i686-32bit, pyutil: 1.3.20, simplejson: 1.7.3, argparse: 0.8.0, pyOpenSSL: 0.6, z-base-32: 1.0.1, setuptools: 0.6c8
}}}
No incident logs.
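
The last frame of the traceback fails because `f.read(4)` returned fewer than 4 bytes, as would happen if the share file ended early. Below is a minimal sketch of a stricter read, not the actual code in `allmydata/storage/mutable.py`. The standalone helper and the `CorruptShareError` exception are hypothetical names introduced only for illustration.

```python
import io
import struct


class CorruptShareError(Exception):
    """Hypothetical exception: the share file is too short or malformed."""


def read_num_extra_leases(f):
    """Read a 4-byte big-endian unsigned count from the file-like object f."""
    data = f.read(4)
    if len(data) != 4:
        # A short read means the file ended early. Passing this string to
        # struct.unpack would raise struct.error, so report it explicitly.
        raise CorruptShareError("expected 4 bytes, got %d" % len(data))
    (num_extra_leases,) = struct.unpack(">L", data)
    return num_extra_leases


if __name__ == "__main__":
    # A well-formed field decodes normally.
    print(read_num_extra_leases(io.BytesIO(b"\x00\x00\x00\x02")))
    # A truncated field is reported as corruption instead of struct.error.
    try:
        read_num_extra_leases(io.BytesIO(b"\x00\x00"))
    except CorruptShareError as e:
        print("corrupt share: %s" % e)
```

Raising a dedicated exception would let a caller tell a damaged share apart from a programming error, which a bare `struct.error` does not.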
--
Comment (by zooko):
#1834 would remove the lease-checking and bucket-counting crawlers,
making this ticket irrelevant. However, we ''might'' then want to invent
a share-verifying crawler, just for the purpose of looking for corrupted
shares, which would make this ticket relevant again.
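
As a rough illustration of what "tolerate corrupted shares" could mean for such a crawler, here is a minimal sketch of a loop that records failing shares and keeps going. It is not based on the existing crawler code. The function `crawl_shares`, its arguments, and the choice of exception types are all assumptions made for the example.

```python
import struct


def crawl_shares(share_paths, process_share):
    """Call process_share(path) for every path, collecting failures.

    Returns a list of (path, error description) pairs for shares that
    could not be processed, instead of aborting the whole crawl.
    """
    corrupt = []
    for path in share_paths:
        try:
            process_share(path)
        except (struct.error, EOFError, ValueError, IOError) as e:
            # Assumption: these exception types indicate a damaged share.
            # Anything else is treated as a bug and allowed to propagate.
            corrupt.append((path, repr(e)))
    return corrupt
```

A real crawler would presumably also persist the list of damaged shares so that an operator or a repair process can act on it.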
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/812#comment:6>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage