[tahoe-dev] [tahoe-lafs] #738: failure in block hash tree

tahoe-lafs trac at allmydata.org
Wed Jun 17 20:20:51 PDT 2009


#738: failure in block hash tree
-------------------------------+--------------------------------------------
     Reporter:  midnightmagic  |        Type:  defect       
       Status:  new            |    Priority:  critical     
    Milestone:  1.5.0          |   Component:  code-encoding
      Version:  1.4.1          |    Keywords:  integrity    
Launchpad_bug:                 |  
-------------------------------+--------------------------------------------

New description:

 Running tahoe on the machine where "python2.5 setup.py test" fails (as
 reported in trac ticket #737) generates the attached incident report.

 Brief summary from the flog viewer:

 {{{
 <ValidatedReadBucketProxy #1>
 (2-<ReadBucketProxy 150097484 to peer [mgnbpxki] SI
 jow42sylefxjxsns3alv5ptghe>-m2st7xk76cvd):
  hash failure in block=0, shnum=2
  on <ReadBucketProxy 150097484 to peer [mgnbpxki] SI
 jow42sylefxjxsns3alv5ptghe>

 [...]

 <BlockDownloader #2>(<ValidatedReadBucketProxy #3>-2): failure to get
 block

 [... etc ...]

 failure in block hash tree
 }}}

 ... and so on.

 The actual CLI command and error message:

 {{{
 tahoe get URI:CHK:lapry55oui4psmeiyxhvitfmpi:75mb37to6iauypych6bkqfkxxxfk6nhekhomzipkqzwt46v64hdq:3:5:99758080 meh
 Error, got 410 Gone
 NotEnoughSharesError: no shares could be found. Zero shares usually
 indicates a corrupt URI, or that no servers were connected, but it might
 also indicate severe corruption. You should perform a filecheck on this
 object to learn more.
 }}}

 Finally, the dump-share output for the 33MB share file:

 {{{
 share filename: /v/tahoe/.tahoe/storage/shares/jo/jow42sylefxjxsns3alv5ptghe/2
              version: 1
            file_size: 99758080
         num_segments: 762
         segment_size: 131073
        needed_shares: 3
         total_shares: 5

           codec_name: crs
         codec_params: 131073-3-5
    tail_codec_params: 11529-3-5

       crypttext_hash: zodzh33f7mnowsxine5mzejiahlxsilgggpxmop5bfrh4zzzdpha
  crypttext_root_hash: nuqsysg5zqkz5nsvpi32n5h6h5ilbepsbvmssji2xv773kqw53tq
      share_root_hash: m2st7xk76cvdutgf5lzmkdzbf72h75cxpkytwjegi5jgntir3u5q
             UEB_hash: 75mb37to6iauypych6bkqfkxxxfk6nhekhomzipkqzwt46v64hdq
           verify-cap: URI:CHK-Verifier:jow42sylefxjxsns3alv5ptghe:75mb37to6iauypych6bkqfkxxxfk6nhekhomzipkqzwt46v64hdq:3:5:99758080

  Size of data within the share:
                 data: 33252694
        uri-extension: 325
           validation: 196648

  Lease #0: owner=0, expire in 2607324s (30 days)
 }}}
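 The sizes in this dump can be cross-checked against each other without any
 Tahoe code. The Python sketch below assumes (inferred from the parameters,
 not taken from Tahoe's source) that every segment except the last is full,
 that the last ("tail") segment is padded up to a multiple of needed_shares,
 and that each segment is split into needed_shares equal blocks, one per
 share. Under those assumptions it reproduces the 11529 in tail_codec_params
 and the 33252694 bytes of share data reported above. round_up is a
 hypothetical helper.

```python
# Values copied from the dump-share output above.
file_size = 99758080
segment_size = 131073
num_segments = 762
needed_shares = 3  # k


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


# Assumption: all segments but the last are full-size, and the tail segment
# is padded to a multiple of k before erasure coding.
tail_size = file_size - (num_segments - 1) * segment_size
padded_tail = round_up(tail_size, needed_shares)

# Assumption: each segment is split into k equal blocks, one per share.
block_size = segment_size // needed_shares
tail_block_size = padded_tail // needed_shares
share_data_size = (num_segments - 1) * block_size + tail_block_size

print(tail_size, padded_tail, share_data_size)  # 11527 11529 33252694
```

 Since the arithmetic agrees with the stored header fields, the share's
 layout metadata at least looks sane; a hash failure would have to come from
 the block contents or the hashes themselves.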

 Machine details:

 {{{
 NetBSD quickie 4.99.7 NetBSD 4.99.7 (quickie) #0: Tue Jan 2 14:47:23 PST 2007 root at quickie:/v/src/sys/arch/i386/compile/quickie i386

 AMD Athlon(tm) XP 2500+ (single-core, 32-bit) 2.5GB RAM

 Python 2.5.2
 }}}

 This machine is partway through NetBSD's transition from the M:N to the
 1:1 threading model. The M:N threads *should* be functional, and for all
 system and most application purposes (KDE, etc.) they are. However, some
 software occasionally makes assumptions about threading, or is built
 without threading support because configure detected anomalous behaviour.

 NOTE: The share file IS AVAILABLE UPON REQUEST. I will save it for
 posterity.

--

Comment(by warner):

 (wrapped some of the description text to improve formatting)

 I've looked a bit at the share file you sent me, and it seems OK (I haven't
 seen any corruption so far). My next step is to examine the Incident report,
 figure out exactly which hash check failed, and compare the hashes involved
 against ones I'll generate locally from that share.
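 For readers following along, that kind of comparison can be sketched
 generically: rebuild a binary hash tree over the share's blocks and find the
 first block whose leaf hash disagrees with the stored one. This is a minimal
 sketch under stated assumptions: it uses plain SHA-256 and pads the leaf
 level with hashes of the empty string, whereas Tahoe's own hash trees use
 their own tagged hash functions and padding rules, so its values will not
 match those stored in a real share. build_tree and first_bad_leaf are
 hypothetical helpers, not Tahoe APIs.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Plain SHA-256 stands in for Tahoe's tagged hashes (assumption).
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels: leaf hashes first, [root] last."""
    level = [h(block) for block in leaves]
    # Pad the leaf level to a power of two (padding rule is an assumption).
    size = 1
    while size < len(level):
        size *= 2
    level += [h(b"")] * (size - len(level))
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def first_bad_leaf(blocks, expected_leaf_hashes):
    """Index of the first block whose hash differs from the stored one, or None."""
    for i, (block, want) in enumerate(zip(blocks, expected_leaf_hashes)):
        if h(block) != want:
            return i
    return None
```

 A flipped block changes its own leaf hash and every hash on its path to the
 root, so locating the first mismatching leaf narrows a "failure in block
 hash tree" down to a specific block (here the report points at block=0).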

 Another approach will be to get copies of two more shares, put them in a
 private grid along with this one, and attempt to download the file (this is
 a 3-of-5 encoding, so any three shares should be enough). If the download
 succeeds, the shares must be OK, and we'll focus on how the download process
 might be behaving differently on your host.
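 The reason three shares should suffice is the k-of-n property of the
 encoding (needed_shares=3, total_shares=5). The toy Python sketch below
 illustrates that property with a systematic Reed-Solomon-style code over the
 prime field GF(257), using Lagrange interpolation. It is a stand-in for
 illustration only: Tahoe's "crs" codec is built on zfec, which works over
 GF(2^8) rather than this prime field. encode, decode and lagrange_eval are
 hypothetical names.

```python
from itertools import combinations

P = 257  # toy prime field; zfec itself works over GF(2^8)


def lagrange_eval(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p  # pow(.., -1, p): Python 3.8+
    return total


def encode(data, n, p=P):
    """Systematic k-of-n encode: data symbols are f(0..k-1); share x holds f(x)."""
    base = list(enumerate(data))
    return [(x, lagrange_eval(base, x, p)) for x in range(n)]


def decode(shares, k, p=P):
    """Recover the k data symbols from any k shares with distinct indices."""
    if len(shares) < k:
        raise ValueError("need at least k shares")
    pts = shares[:k]
    return [lagrange_eval(pts, x, p) for x in range(k)]


data = [72, 105, 33]          # k = 3 symbols, each < P
shares = encode(data, n=5)    # 5 shares, like total_shares=5
for subset in combinations(shares, 3):
    assert decode(list(subset), k=3) == data
```

 Every one of the ten 3-share subsets recovers the original symbols, which is
 why a download that succeeds from three good shares would clear the shares
 themselves and point at the downloader.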

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/738#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

