[tahoe-lafs-trac-stream] [tahoe-lafs] #1543: rearrange share format to make downloads faster
tahoe-lafs
trac at tahoe-lafs.org
Mon Sep 26 01:43:33 PDT 2011
#1543: rearrange share format to make downloads faster
-------------------------------+-------------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-encoding | Version: 1.9.0a1
Resolution: | Keywords: performance
Launchpad Bug: |
-------------------------------+-------------------------
Comment (by zooko):
Replying to [comment:2 zooko]:
> * The hash tree data ''can'' still be laid out in interesting ways
to facilitate fast reads. There is never bulk data interspersed with the
hash tree nodes that you are trying to read.
For example, you could say:
A typical reader might as well pull down a couple thousand bytes if it
is going to pull down 1 byte. We will lay out 32 consecutive leaf hashes
in a row, followed by the "uncle chain" which is necessary to validate
those leaf hashes. So for example if there is a 10 GiB file with 128 KiB
segments and K=3, then there are 81,920 (= 10 GiB / 128 KiB) blocks per
share (K affects block size, not block count), which means the binary
Merkle Tree will be 18 levels deep. 32 leaf hashes are sufficient to
allow the downloader to compute the bottom-most 5 levels, which leaves 12
uncle hash tree nodes (the root node is not needed for the uncle chain,
which is why it is 12 instead of 13). We could write out the 32 leaf
hashes, followed by the 12 uncles, followed by space sufficient for 20
more uncles (in case the file grows). Each of these 64-hashvalue-wide
records would take a total of 2048 bytes, and a typical reader would need
to slurp only (part of) one or two of these records in order to validate
up to 32 or 64 blocks of data, which would be around 1.3 or 2.6 MiB.
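As a sanity check on those numbers, here is a rough Python sketch of the
arithmetic. The function name and structure are illustrative only, not
anything in the tahoe-lafs code; it just works out blocks-per-share, tree
depth, and uncle count from the file parameters above.

```python
import math

HASH_SIZE = 32          # bytes per hash value (SHA-256d)
LEAVES_PER_RECORD = 32  # consecutive leaf hashes laid out together

def record_layout(file_size, segment_size, k):
    # One block per segment goes into each share, so blocks-per-share
    # equals the number of segments (k affects block size, not count).
    num_segments = math.ceil(file_size / segment_size)
    blocks_per_share = num_segments
    # The binary Merkle tree is padded out to a power-of-two leaf count.
    leaf_depth = math.ceil(math.log2(blocks_per_share))
    tree_levels = leaf_depth + 1  # counting the root level
    # 32 consecutive leaves let the reader recompute log2(32) = 5 levels
    # on its own; it then needs one uncle per remaining level, and the
    # root itself is already known separately, so it is excluded.
    computed = int(math.log2(LEAVES_PER_RECORD))
    uncles_needed = leaf_depth - computed
    return blocks_per_share, tree_levels, uncles_needed

GiB, KiB = 2 ** 30, 2 ** 10
blocks, levels, uncles = record_layout(10 * GiB, 128 * KiB, k=3)
record_bytes = 64 * HASH_SIZE  # the "64-hashvalue-wide" record
```

This reproduces the 18-level tree and 12-uncle chain described above, and
the 2048-byte record size.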
(This is analogous to my earlier suggestion of storing the complete uncle
chain after each block of data, but now it is storing the complete uncle
chain after each run of leaf hashes.)
The drawback of optimizing for the reader like that is that the writer
then has to update ''all'' of those records whenever it changes the
contents of any leaf (block). This is the opposite extreme from the
layout in which the writer has to do minimal work and the reader has to
read each of the separately located uncles. I think there may be a
sweet-spot tradeoff in which the first five levels of the binary tree are
stored in the first record, so the writer always has to update that first
record as well as the record(s) that are more local to its write, and the
reader always has to read that first record as well as the record(s) that
are more local to its read.
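A minimal sketch of the access pattern under that sweet-spot layout. The
function and the record numbering are hypothetical, chosen just to show
that a single-block read or write touches a constant two records rather
than either one record (writer-optimized) or every record
(reader-optimized):

```python
def records_touched(leaf_index, leaves_per_record=32):
    """Hypothetical sweet-spot layout: record 0 holds the top five
    levels of the tree, and record 1 + i covers the i-th run of 32
    leaf hashes plus the uncles local to that run.  Both a reader
    validating a block and a writer updating one touch record 0 plus
    the record covering that block's leaf."""
    local_record = 1 + leaf_index // leaves_per_record
    return {0, local_record}
```

For example, reading or writing block 100 touches record 0 and record 4
(the run covering leaves 96-127).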
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1543#comment:3>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage