[Tahoe-dev] [p2p-hackers] announcing Allmydata Tahoe

zooko at zooko.com zooko at zooko.com
Tue May 8 14:07:47 PDT 2007


 david wrote:
>
> Wow, the project looks really great!  I'm curious why you chose to use
> Merkle trees, however, instead of a simple hash list.  What's the advantage
> of the additional complexity?

Thanks!

The reason for the Merkle Tree is so that you can verify the correctness of the
first part of the file sooner.  We call this "alacrity" -- the elapsed time
between deciding you want to download a file and being able to read the first
few bits of cleartext.  With a simple hash list and a very large file, you have
to download a very long hash list (or else a short hash list and a very large
block to check if it matches the first hash in the list) before you can safely
use the first byte of the data.

Brian Warner wrote a script [1] to calculate the overhead (bytes used for
verification metadata divided by bytes used for data) and alacrity (bytes
downloaded before being able to use the first byte of cleartext) for the
hash-list and Merkle Tree options.  See appended output from that script.

It turns out that this might be useful not only for people who are impatient to
use the first part of a file that they are downloading, but also for
probabilistic verifiers who want to sample only one or a few blocks instead of
downloading the whole thing.

By the way, here is a document with pictures describing how this part of Tahoe
works: [2].

Regards,

Zooko

[1] http://allmydata.org/trac/tahoe/browser/misc/sizes.py
[2] http://allmydata.org/trac/tahoe/wiki/TahoeIssues/FileEncoding


In the following, k is the arity of the hash tree, and d is the depth of it.
So the hash list approach is where d == 1.

------- begin appended output from misc/sizes.py

$ python -OOu ./misc/sizes.py --mode beta
mode=beta # k=num_subblocks, d=1 # each subblock has a 20-byte hash
                    storage    storage
Size     blocksize  overhead   overhead     k  d  alacrity
                    (bytes)      (%)
-------  -------    --------   --------  ---- --  --------
     4     0.16        1.95k  50000.00      1  1    0.16 
    16     0.64        1.95k  12500.00      1  1    0.64 
    64     2.56        1.95k   3125.00      1  1    2.56 
   256    10.24        1.95k    781.25      1  1   10.24 
     1k   40.96        1.95k    195.31      1  1   40.96 
     4k  163.84        1.95k     48.83      1  1  163.84 
    16k  655.36        1.95k     12.21      1  1  655.36 
    64k    2.56k       1.95k      3.05      1  1    2.56k
   256k   10.24k       1.95k      0.76      1  1   10.24k
     1M   40.96k       1.95k      0.19      1  1   40.96k
     4M  163.84k       3.91k      0.10      2  1   81.94k
    16M  655.36k      15.62k      0.10      8  1   82.06k
    64M    2.56M      62.50k      0.10     32  1   82.53k
   256M   10.24M     250   k      0.10    128  1   84.40k
     1G   40.96M    1000   k      0.10    512  1   91.90k
     4G  163.84M       3.91M      0.10   2048  1  121.90k
    16G  655.36M      15.62M      0.10   8192  1  241.90k
    64G    2.56G      62.50M      0.10   32768  1  721.90k
   256G   10.24G     250   M      0.10   131072  1    2.58M
     1T   40.96G    1000   M      0.10   524288  1   10.08M
$ python -OOu ./misc/sizes.py --mode gamma
mode=gamma # self.subblock_arity = k = arity
                    storage    storage
Size     blocksize  overhead   overhead     k  d  alacrity
                    (bytes)      (%)
-------  -------    --------   --------  ---- --  --------
     4     0.16        0          0.00      2  0    0.16 
    16     0.64        0          0.00      2  0    0.64 
    64     2.56        0          0.00      2  0    2.56 
   256    10.24        0          0.00      2  0   10.24 
     1k   40.96        0          0.00      2  0   40.96 
     4k  163.84        0          0.00      2  0  163.84 
    16k  655.36        0          0.00      2  0  655.36 
    64k    2.56k       0          0.00      2  0    2.56k
   256k   10.24k       0          0.00      2  0   10.24k
     1M   40.96k       0          0.00      2  0   40.96k
     4M  163.84k       3.91k      0.10      2  1   81.94k
    16M  655.36k      27.34k      0.17      2  3   81.98k
    64M    2.56M     121.09k      0.18      2  5   82.02k
   256M   10.24M     496.09k      0.19      2  7   82.06k
     1G   40.96M       1.95M      0.19      2  9   82.10k
     4G  163.84M       7.81M      0.19      2 11   82.13k
    16G  655.36M      31.25M      0.19      2 13   82.17k
    64G    2.56G     125   M      0.19      2 15   82.21k
   256G   10.24G     500   M      0.19      2 17   82.25k
     1T   40.96G       1.95G      0.19      2 19   82.29k


More information about the Tahoe-dev mailing list