[Tahoe-dev] [p2p-hackers] announcing Allmydata Tahoe
zooko at zooko.com
zooko at zooko.com
Tue May 8 14:07:47 PDT 2007
david wrote:
>
> Wow, the project looks really great! I'm curious why you chose to use
> Merkle trees, however, instead of a simple hash list. What's the advantage
> of the additional complexity?
Thanks!
The reason for the Merkle Tree is so that you can verify the correctness of the
first part of the file sooner. We call this "alacrity" -- the elapsed time
between deciding you want to download a file and being able to read the first
few bits of cleartext. With a simple hash list and a very large file, you have
to download a very long hash list (or else a short hash list and a very large
block to check if it matches the first hash in the list) before you can safely
use the first byte of the data.
Brian Warner wrote a script [1] to calculate the overhead (bytes used for
verification metadata divided by bytes used for data) and alacrity (bytes
downloaded before being able to use the first byte of cleartext) for the
hash-list and Merkle Tree options. See appended output from that script.
It turns out that this might be useful not only for people who are impatient to
use the first part of a file that they are downloading, but also for
probabilistic verifiers who want to sample only one or a few blocks instead of
downloading the whole thing.
By the way, here is a document with pictures describing how this part of Tahoe
works: [2].
Regards,
Zooko
[1] http://allmydata.org/trac/tahoe/browser/misc/sizes.py
[2] http://allmydata.org/trac/tahoe/wiki/TahoeIssues/FileEncoding
In the following, k is the arity of the hash tree, and d is the depth of it.
So the hash list approach is where d == 1.
------- begin appended output from misc/sizes.py
$ python -OOu ./misc/sizes.py --mode beta
mode=beta # k=num_subblocks, d=1 # each subblock has a 20-byte hash
storage storage
Size blocksize overhead overhead k d alacrity
(bytes) (%)
------- ------- -------- -------- ---- -- --------
4 0.16 1.95k 50000.00 1 1 0.16
16 0.64 1.95k 12500.00 1 1 0.64
64 2.56 1.95k 3125.00 1 1 2.56
256 10.24 1.95k 781.25 1 1 10.24
1k 40.96 1.95k 195.31 1 1 40.96
4k 163.84 1.95k 48.83 1 1 163.84
16k 655.36 1.95k 12.21 1 1 655.36
64k 2.56k 1.95k 3.05 1 1 2.56k
256k 10.24k 1.95k 0.76 1 1 10.24k
1M 40.96k 1.95k 0.19 1 1 40.96k
4M 163.84k 3.91k 0.10 2 1 81.94k
16M 655.36k 15.62k 0.10 8 1 82.06k
64M 2.56M 62.50k 0.10 32 1 82.53k
256M 10.24M 250 k 0.10 128 1 84.40k
1G 40.96M 1000 k 0.10 512 1 91.90k
4G 163.84M 3.91M 0.10 2048 1 121.90k
16G 655.36M 15.62M 0.10 8192 1 241.90k
64G 2.56G 62.50M 0.10 32768 1 721.90k
256G 10.24G 250 M 0.10 131072 1 2.58M
1T 40.96G 1000 M 0.10 524288 1 10.08M
$ python -OOu ./misc/sizes.py --mode gamma
mode=gamma # self.subblock_arity = k = arity
storage storage
Size blocksize overhead overhead k d alacrity
(bytes) (%)
------- ------- -------- -------- ---- -- --------
4 0.16 0 0.00 2 0 0.16
16 0.64 0 0.00 2 0 0.64
64 2.56 0 0.00 2 0 2.56
256 10.24 0 0.00 2 0 10.24
1k 40.96 0 0.00 2 0 40.96
4k 163.84 0 0.00 2 0 163.84
16k 655.36 0 0.00 2 0 655.36
64k 2.56k 0 0.00 2 0 2.56k
256k 10.24k 0 0.00 2 0 10.24k
1M 40.96k 0 0.00 2 0 40.96k
4M 163.84k 3.91k 0.10 2 1 81.94k
16M 655.36k 27.34k 0.17 2 3 81.98k
64M 2.56M 121.09k 0.18 2 5 82.02k
256M 10.24M 496.09k 0.19 2 7 82.06k
1G 40.96M 1.95M 0.19 2 9 82.10k
4G 163.84M 7.81M 0.19 2 11 82.13k
16G 655.36M 31.25M 0.19 2 13 82.17k
64G 2.56G 125 M 0.19 2 15 82.21k
256G 10.24G 500 M 0.19 2 17 82.25k
1T 40.96G 1.95G 0.19 2 19 82.29k
More information about the Tahoe-dev
mailing list