[tahoe-lafs-trac-stream] [tahoe-lafs] #812: server-side crawlers: tolerate corrupted shares, verify shares

tahoe-lafs trac at tahoe-lafs.org
Thu Nov 14 19:06:57 UTC 2013


#812: server-side crawlers: tolerate corrupted shares, verify shares
------------------------------+-------------------------
     Reporter:  zooko         |      Owner:  warner
         Type:  defect        |     Status:  new
     Priority:  major         |  Milestone:  undecided
    Component:  code-storage  |    Version:  1.4.1
   Resolution:                |   Keywords:  reliability
Launchpad Bug:                |
------------------------------+-------------------------

New description:

 From twistd.log on prodtahoe17 data6:

 {{{
 2009/09/25 13:00 -0700 [-] Log opened.
 2009/09/25 13:00 -0700 [-] twistd 2.5.0 (/usr/bin/python 2.5.2) starting up
 2009/09/25 13:00 -0700 [-] reactor class: <class 'twisted.internet.selectreactor.SelectReactor'>
 2009/09/25 13:00 -0700 [-] Loading tahoe-client.tac...
 2009-09-25 20:01:14.954Z [-] Loaded.
 2009-09-25 20:01:14.956Z [-] foolscap.pb.Listener starting on 39324
 2009-09-25 20:01:14.956Z [-] twisted.conch.manhole_ssh.ConchFactory starting on 8226
 2009-09-25 20:01:14.956Z [-] Starting factory <twisted.conch.manhole_ssh.ConchFactory instance at 0x8bfe2cc>
 2009-09-25 20:01:14.957Z [-] nevow.appserver.NevowSite starting on 9006
 2009-09-25 20:01:14.957Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x8db516c>
 2009-09-25 20:01:14.957Z [-] Manhole listening via SSH on port 8226
 2009-09-25 20:01:14.958Z [-] twisted.internet.protocol.DatagramProtocol starting on 35896
 2009-09-25 20:01:14.958Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8db576c>
 2009-09-25 20:01:14.960Z [-] (Port 35896 Closed)
 2009-09-25 20:01:14.961Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8db576c>
 2009-09-27 12:57:40.124Z [-] lease-checker error processing /data6/storage/storage/shares/g6/g6rvkc5jdkgoqhljuxgkquzhvq/5
 2009-09-27 12:57:40.130Z [-] Unhandled Error
         Traceback (most recent call last):
           File "/usr/lib/python2.5/site-packages/twisted/internet/base.py", line 561, in runUntilCurrent
             call.func(*call.args, **call.kw)
           File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 262, in start_slice
             self.start_current_prefix(start_slice)
           File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 321, in start_current_prefix
             buckets, start_slice)
           File "/usr/lib/python2.5/site-packages/allmydata/storage/crawler.py", line 361, in process_prefixdir
             self.process_bucket(cycle, prefix, prefixdir, bucket)
         --- <exception caught here> ---
           File "/usr/lib/python2.5/site-packages/allmydata/storage/expirer.py", line 128, in process_bucket
             wks = self.process_share(sharefile)
           File "/usr/lib/python2.5/site-packages/allmydata/storage/expirer.py", line 171, in process_share
             for li in sf.get_leases():
           File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 242, in get_leases
             for i, lease in self._enumerate_leases(f):
           File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 247, in _enumerate_leases
             for i in range(self._get_num_lease_slots(f)):
           File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 227, in _get_num_lease_slots
             num_extra_leases = self._read_num_extra_leases(f)
           File "/usr/lib/python2.5/site-packages/allmydata/storage/mutable.py", line 129, in _read_num_extra_leases
             (num_extra_leases,) = struct.unpack(">L", f.read(4))
           File "/usr/lib/python2.5/struct.py", line 87, in unpack
             return o.unpack(s)
         struct.error: unpack requires a string argument of length 4
 }}}

 {{{
 $ tahoe --version
 tahoe-server: 1.4.1, foolscap: 0.4.2, pycryptopp: 0.5.16-r669, zfec: 1.4.0-4, Twisted: 2.5.0, Nevow: 0.9.26, zope.interface: 3.3.1, python: 2.5.2, platform: Linux-Ubuntu_8.04-i686-32bit, pyutil: 1.3.20, simplejson: 1.7.3, argparse: 0.8.0, pyOpenSSL: 0.6, z-base-32: 1.0.1, setuptools: 0.6c8
 }}}

 No incident logs.
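The traceback ends in `_read_num_extra_leases`, where `struct.unpack(">L", f.read(4))` received fewer than four bytes: the share file ends before the field the lease reader expects, which points to a truncated or otherwise corrupted share. A minimal sketch of the "tolerate corrupted shares" half of this ticket follows, written in Python 3 rather than the Python 2.5 shown above. `CorruptShareError`, `read_uint32_be`, `count_leases_tolerantly` and the fixed `offset` are hypothetical illustrations, not Tahoe-LAFS code, and the real mutable-share layout is not reproduced. The idea is to check the length of the read before unpacking, so a short file raises a descriptive error that the per-share loop records before moving on to the next share.

```python
import io
import struct


class CorruptShareError(Exception):
    """Hypothetical error type: a share file is too short or malformed."""


def read_uint32_be(f, offset):
    """Read a big-endian 4-byte unsigned int at `offset`, refusing short reads.

    A bare struct.unpack(">L", f.read(4)) raises a generic struct.error when
    the file ends early; this turns that case into a descriptive error.
    """
    f.seek(offset)
    data = f.read(4)
    if len(data) != 4:
        raise CorruptShareError(
            "expected 4 bytes at offset %d, got %d" % (offset, len(data)))
    (value,) = struct.unpack(">L", data)
    return value


def count_leases_tolerantly(shares, offset):
    """Read a count from every share, recording corrupt ones instead of aborting.

    `shares` maps a share name to a binary file-like object. The assumption
    (hypothetical layout) is that the extra-lease count sits at `offset`.
    """
    counts, corrupt = {}, []
    for name, f in shares.items():
        try:
            counts[name] = read_uint32_be(f, offset)
        except CorruptShareError:
            corrupt.append(name)  # remember it and keep crawling
    return counts, corrupt


# Usage: one well-formed share and one that ends 2 bytes into the field.
good = io.BytesIO(b"\x00" * 8 + struct.pack(">L", 3))
truncated = io.BytesIO(b"\x00" * 10)
counts, corrupt = count_leases_tolerantly({"g6/5": truncated, "ab/0": good}, offset=8)
print(counts, corrupt)
```

Recording the bad share and continuing means one damaged file costs one entry in a report rather than the rest of the crawler's slice.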

--

Comment (by zooko):

 #1834 would remove the lease-checking and bucket-counting crawlers,
 making this ticket irrelevant. However, we ''might'' then want to add a
 share-verifying crawler whose only job is to look for corrupted shares,
 which would make this ticket relevant again.
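A share-verifying crawler of the kind suggested here could be sketched as follows, assuming the on-disk layout visible in the log path above (`shares/<2-character prefix>/<storage index>/<share number>`). `iter_share_files`, `verify_all_shares` and `min_size_check` are hypothetical names. The verifier is pluggable, and the size check used in the demo is only a stand-in; a real verifier would need to check share contents (for example hashes), which this sketch does not attempt. It also walks the whole tree in one pass, whereas the existing crawlers spread work across time slices.

```python
import os
import tempfile


def iter_share_files(shares_dir):
    """Yield (storage_index, shnum, path) for every share under shares_dir.

    Assumes the layout shares/<prefix>/<storage index>/<share number>.
    """
    for prefix in sorted(os.listdir(shares_dir)):
        prefixdir = os.path.join(shares_dir, prefix)
        if not os.path.isdir(prefixdir):
            continue
        for si in sorted(os.listdir(prefixdir)):
            bucketdir = os.path.join(prefixdir, si)
            if not os.path.isdir(bucketdir):
                continue
            for fn in sorted(os.listdir(bucketdir)):
                if fn.isdigit():  # non-numeric names are not share files
                    yield si, int(fn), os.path.join(bucketdir, fn)


def verify_all_shares(shares_dir, verify):
    """Run verify(path) on each share and return the ones it rejected.

    Any exception from the hypothetical `verify` callable marks that share
    corrupt; the broad except is deliberate so one bad file cannot stop the
    crawl.
    """
    corrupt = []
    for si, shnum, path in iter_share_files(shares_dir):
        try:
            verify(path)
        except Exception as e:
            corrupt.append((si, shnum, str(e)))
    return corrupt


def min_size_check(min_bytes):
    """Hypothetical stand-in verifier: reject files shorter than min_bytes."""
    def verify(path):
        size = os.path.getsize(path)
        if size < min_bytes:
            raise ValueError("share is %d bytes, need >= %d" % (size, min_bytes))
    return verify


# Usage: one truncated share and one plausible one in a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    bucket = os.path.join(root, "g6", "g6rvkc5jdkgoqhljuxgkquzhvq")
    os.makedirs(bucket)
    with open(os.path.join(bucket, "5"), "wb") as f:
        f.write(b"\x00" * 2)
    with open(os.path.join(bucket, "6"), "wb") as f:
        f.write(b"\x00" * 100)
    found = verify_all_shares(root, min_size_check(8))

print([(si, shnum) for si, shnum, _ in found])
```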

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/812#comment:6>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list