[tahoe-lafs-trac-stream] [tahoe-lafs] #1543: rearrange share format to make downloads faster
tahoe-lafs
trac at tahoe-lafs.org
Mon Sep 26 01:43:33 PDT 2011
#1543: rearrange share format to make downloads faster
-------------------------------+-------------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-encoding | Version: 1.9.0a1
Resolution: | Keywords: performance
Launchpad Bug: |
-------------------------------+-------------------------
Comment (by zooko):
Replying to [comment:2 zooko]:
> * The hash tree data ''can'' still be laid out in interesting ways
to facilitate fast reads. There is never bulk data interspersed with the
hash tree nodes that you are trying to read.
For example, you could say:
A typical reader might as well pull down a couple thousand bytes if it
is going to pull down 1 byte. We will lay out 32 consecutive leaf hashes
in a row, followed by the "uncle chain" which is necessary to validate
those leaf hashes. So for example if there is a 10 GiB file with 128 KiB
segments and K=3, then there are 81,920 (= 10 GiB / 128 KiB) blocks per
share (K affects block size, not block count), which means the binary
Merkle Tree will be 18 levels deep. 32 leaf hashes are sufficient to
allow the downloader to compute the bottom-most 5 levels, which leaves 12
uncle hash tree nodes (the root node is not needed for the uncle chain,
which is why it is 12 instead of 13). We could write out the 32 leaf
hashes, followed by the 12 uncles, followed by space sufficient for 20
more uncles (in case the file grows). Each of these 64-hashvalue-wide
records would take a total of 2048 bytes, and a typical reader would need
to slurp only (part of) one or two of these records in order to validate
up to 32 or 64 blocks of data, which would be around 1.3 or 2.6 MiB.
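As a sanity check on those numbers, here is a rough Python sketch of the
arithmetic. The function name and structure are illustrative only, not
anything in the tahoe-lafs code; it just works out blocks-per-share, tree
depth, and uncle count from the file parameters above.

```python
import math

HASH_SIZE = 32          # bytes per hash value (SHA-256d)
LEAVES_PER_RECORD = 32  # consecutive leaf hashes laid out together

def record_layout(file_size, segment_size, k):
    # One block per segment goes into each share, so blocks-per-share
    # equals the number of segments (k affects block size, not count).
    num_segments = math.ceil(file_size / segment_size)
    blocks_per_share = num_segments
    # The binary Merkle tree is padded out to a power-of-two leaf count.
    leaf_depth = math.ceil(math.log2(blocks_per_share))
    tree_levels = leaf_depth + 1  # counting the root level
    # 32 consecutive leaves let the reader recompute log2(32) = 5 levels
    # on its own; it then needs one uncle per remaining level, and the
    # root itself is already known separately, so it is excluded.
    computed = int(math.log2(LEAVES_PER_RECORD))
    uncles_needed = leaf_depth - computed
    return blocks_per_share, tree_levels, uncles_needed

GiB, KiB = 2 ** 30, 2 ** 10
blocks, levels, uncles = record_layout(10 * GiB, 128 * KiB, k=3)
record_bytes = 64 * HASH_SIZE  # the "64-hashvalue-wide" record
```

This reproduces the 18-level tree and 12-uncle chain described above, and
the 2048-byte record size.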
(This is analogous to my earlier suggestion of storing the complete uncle
chain after each block of data, but now it is storing the complete uncle
chain after each run of leaf hashes.)
The drawback of optimizing for the reader like that is that the writer
then has to update ''all'' of those records whenever it changes the
contents of any leaf (block). This is the opposite extreme from the
layout in which the writer has to do minimal work and the reader has to
read each of the separately located uncles. I think there may be a
sweet-spot tradeoff in which the first five levels of the binary tree are
stored in the first record, so the writer always has to update that first
record as well as the record(s) that are more local to its write, and the
reader always has to read that first record as well as the record(s) that
are more local to its read.
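A minimal sketch of the access pattern under that sweet-spot layout. The
function and the record numbering are hypothetical, chosen just to show
that a single-block read or write touches a constant two records rather
than either one record (writer-optimized) or every record
(reader-optimized):

```python
def records_touched(leaf_index, leaves_per_record=32):
    """Hypothetical sweet-spot layout: record 0 holds the top five
    levels of the tree, and record 1 + i covers the i-th run of 32
    leaf hashes plus the uncles local to that run.  Both a reader
    validating a block and a writer updating one touch record 0 plus
    the record covering that block's leaf."""
    local_record = 1 + leaf_index // leaves_per_record
    return {0, local_record}
```

For example, reading or writing block 100 touches record 0 and record 4
(the run covering leaves 96-127).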
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1543#comment:3>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage