[tahoe-lafs-trac-stream] [tahoe-lafs] #393: mutable: implement MDMF

tahoe-lafs trac at tahoe-lafs.org
Wed Aug 10 11:27:57 PDT 2011


#393: mutable: implement MDMF
--------------------------------------------------------------------------
     Reporter:  warner
        Owner:  zooko
         Type:  enhancement
       Status:  new
    Milestone:  1.9.0
     Priority:  major
      Version:  1.0.0
    Component:  code-mutable
     Keywords:  newcaps performance random-access privacy gsoc mdmf
                mutable backward-compatibility forward-compatibility
                review-needed
   Resolution:
Launchpad Bug:
--------------------------------------------------------------------------

Comment (by warner):

 Ah. The actual problem is in the code which pre-allocates space in the
 share for the various fields. In particular, layout.py line 554 is
 counting hashes, not bytes:

 {{{
 SHARE_HASH_CHAIN_SIZE = int(math.log(HASH_SIZE * 256)) + 1
 }}}

 This is meant to reserve space for a worst-case N=256 share hash chain,
 which would require 8 nodes (the root is stored separately, so we don't
 actually need the +1).

 In my 3-of-10 upload, the hash chain is length 4 (the bottom layer has
 16 leaves, one per share padded up to a power of two, and the layers
 above have 8, then 4, then 2 nodes). Each node is stored with a
 two-byte node number and a 32-byte SHA256d output, so the hash chain is
 a total of 4*(2+32) = 136 bytes long.
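
 Here's a minimal sketch (not the tahoe code, just the arithmetic) that
 reproduces those numbers for any N:

 {{{
 HASH_SIZE = 32          # bytes in one SHA256d output
 NODE_NUMBER_SIZE = 2    # each chain entry is prefixed with a node number

 def share_hash_chain_bytes(N):
     # pad the leaf count up to the next power of two, then count the
     # tree levels below the root; the chain stores one node per level
     leaves, levels = 1, 0
     while leaves < N:
         leaves *= 2
         levels += 1
     return levels * (HASH_SIZE + NODE_NUMBER_SIZE)

 print(share_hash_chain_bytes(10))    # 4*34 = 136, the 3-of-10 case above
 print(share_hash_chain_bytes(256))   # 8*34 = 272, the N=256 worst case
 }}}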

 Huh, looking more carefully at that equation, there are other problems:

  * {{{math.log}}} is base e, not base 2
  * the {{{HASH_SIZE*}}} should go outside the {{{int()}}}, not inside
  * because of the extra two-byte node-number, it should be
 {{{(HASH_SIZE+2)*}}}
  * the {{{+1}}} is unnecessary because the roothash is stored elsewhere
    * although it'd be a conservative replacement for the missing
      {{{ceil()}}} that'd be necessary if the 256 weren't hard-coded.
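
 To put a number on it (assuming HASH_SIZE = 32), the expression as
 written works out to:

 {{{
 import math

 HASH_SIZE = 32

 # the expression currently at layout.py line 554
 buggy = int(math.log(HASH_SIZE * 256)) + 1
 print(buggy)   # 10: natural log, and a count of hashes rather than bytes

 # whereas the N=256 worst case actually needs 8 * (HASH_SIZE + 2) = 272 bytes
 }}}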

 (This sort of problem reinforces my belief that it's better to add up
 the sizes of the actual fields than to try to predict them or compute
 maximum values. Kevan pointed out that he may have been advised to do
 it this way so that RSA key generation could run in parallel with the
 encryption/FEC/upload of the share data, but I think I'd have advised a
 more conservative approach.)

 So I think this ought to be:

 {{{
 SHARE_HASH_CHAIN_SIZE = (HASH_SIZE+2)*mathutil.log_ceil(256, 2)
 }}}

 At least that gives me (32+2)*8 = 272, which feels right.
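
 For reference, here's a self-contained check of that value. The
 {{{log_ceil}}} below is just a stand-in, written to match what I
 understand {{{mathutil.log_ceil}}} to do (ceiling of the base-b log,
 computed without floating point):

 {{{
 def log_ceil(n, b):
     # stand-in for allmydata.util.mathutil.log_ceil: the smallest
     # exponent e such that b**e >= n
     p, e = 1, 0
     while p < n:
         p *= b
         e += 1
     return e

 HASH_SIZE = 32
 SHARE_HASH_CHAIN_SIZE = (HASH_SIZE + 2) * log_ceil(256, 2)
 print(SHARE_HASH_CHAIN_SIZE)   # (32+2)*8 = 272
 }}}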

 Kevan: could you update this (and add a new patch with both this and the
 merge conflicts resolved)?

 Now, the *real* question is why the unit tests didn't catch this.
 There's a little bit of slack between the estimated maximums and the
 actual RSA field sizes, but only about 9 bytes, not enough to store an
 entire hash. So at first glance it seems like this could only have worked
 when N=1 (i.e. the share_hash_chain was empty).
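
 Rough arithmetic behind that suspicion:

 {{{
 reserved = 10        # what the buggy expression sets aside (see above)
 slack = 9            # approximate gap between estimated and actual RSA field sizes
 node = 32 + 2        # one chain entry: 32-byte hash plus 2-byte node number

 print(reserved + slack)           # ~19 bytes of room
 print(node)                       # 34 bytes needed for even a one-node chain
 print(reserved + slack >= node)   # False: only an empty chain (N=1) fits
 }}}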

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/393#comment:116>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

