[tahoe-lafs-trac-stream] [tahoe-lafs] #1543: rearrange share format to make downloads faster

tahoe-lafs trac at tahoe-lafs.org
Sun Sep 25 17:02:01 PDT 2011


#1543: rearrange share format to make downloads faster
---------------------------+---------------------------
 Reporter:  warner         |          Owner:
     Type:  enhancement    |         Status:  new
 Priority:  major          |      Milestone:  undecided
Component:  code-encoding  |        Version:  1.9.0a1
 Keywords:  performance    |  Launchpad Bug:
---------------------------+---------------------------
 (I thought I had a ticket for this already, but now I can't find it)

 In a new share format, we could arrange the layout of the various share
 pieces to benefit download speed. The biggest slowdowns are:

  * extra round trips caused by unpredictable/over-flexible layouts: the
 downloader must first fetch an offset table, then issue a second query
 for the actual data. Speculative fetches can mitigate this, but a
 deterministic layout is preferable (see the sketch after this list).
  * scatter/gather reads (e.g. hash-tree nodes spread out over a large
 area) need discontiguous read operations, which means larger read
 vectors, more disk IO operations, and in the worst case more disk seeks.
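
 A minimal sketch of the deterministic-layout idea (the helper and the
 fixed header size are assumptions, not the actual Tahoe-LAFS share
 format): if every block's offset is computable from parameters the
 downloader already holds, the first request can fetch real data instead
 of an offset table.

 {{{
 def block_offset(block_num, block_size, header_size=32):
     # Assumed layout: a fixed-size header followed by equal-size blocks,
     # so the reader computes every offset locally.
     return header_size + block_num * block_size

 # one contiguous read per block, with no offset-table round trip first
 print(block_offset(block_num=3, block_size=128 * 1024))
 }}}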

 Our current designs treat whole-file downloads and random access as
 equal peers: we optimize first for minimum alacrity in the
 single-segment case. We could instead recognize that most downloads are
 whole-file, and accept a slight alacrity hit in exchange for faster
 throughput and higher transport efficiency:

  * we could fetch multiple segments at once, trading higher alacrity and
 a larger memory footprint for faster throughput (see the pipelining
 sketch after this list)
  * we could fetch hash nodes for multiple segments ahead of time
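
 A minimal sketch of the multi-segment idea (the fetch_segment callable
 and its Future-style .result() are assumptions, not the real downloader
 API): keep several segment requests in flight so the pipe stays full,
 accepting higher alacrity and a larger memory footprint in exchange for
 throughput.

 {{{
 from collections import deque

 def download_all(fetch_segment, num_segments, depth=4):
     # Prime the pipeline with `depth` speculative segment requests.
     pending = deque(fetch_segment(i)
                     for i in range(min(depth, num_segments)))
     next_seg = len(pending)
     while pending:
         yield pending.popleft().result()  # deliver segments in order
         if next_seg < num_segments:       # refill the pipeline
             pending.append(fetch_segment(next_seg))
             next_seg += 1
 }}}

 The same loop could request the hash nodes for upcoming segments along
 with their data, so integrity checks never stall the pipeline.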

 and we could rearrange the share to help:

  * Zooko's idea was to store extra hash nodes: we could put the full
 merkle chain after each data block. My thought was to additionally order
 each chain with the bottom-most nodes first. This would increase storage
 requirements, because we'd be storing multiple redundant copies of the
 hash-tree nodes, but reads would be much faster: each segment would
 require one fetch of the data block plus zero or more trailing hash
 nodes (depending upon which nodes were already known). That means one
 contiguous IO operation per block, which is the fastest and most
 efficient pattern for the storage servers (see the sketch after this
 list).
   * integrity storage overhead with this approach becomes O(n log n)
 instead of O(n). Rough estimate: for a 10MB file with segsize=128KiB (80
 segments), I think that needs 18kB of overhead instead of 5kB (per
 share), about N*0.1% total, maybe double that because there are two
 file-sized hash trees (crypttext_hash_tree, block_hash_tree).
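
 A minimal sketch of the redundant-chain layout overhead and the
 arithmetic behind the rough estimate (32-byte hash nodes assumed; the
 function names are hypothetical):

 {{{
 import math

 HASH_SIZE = 32  # assumed SHA-256-sized hash nodes

 def chain_len(num_blocks):
     # Hash nodes needed to verify one leaf: one sibling per tree level
     # (the root itself is already known, e.g. from the UEB).
     return math.ceil(math.log2(num_blocks)) if num_blocks > 1 else 0

 def integrity_overhead(filesize, segsize):
     num_blocks = math.ceil(filesize / segsize)
     # full merkle chain repeated after every block: O(n log n)
     redundant = num_blocks * chain_len(num_blocks) * HASH_SIZE
     # each tree node stored exactly once: O(n)
     single = (2 * num_blocks - 1) * HASH_SIZE
     return redundant, single

 # 10MB file, segsize=128KiB -> 80 blocks with 7-hash chains:
 # 80 * 7 * 32 = 17,920 bytes (~18kB) versus (2*80 - 1) * 32 = 5,088
 # bytes (~5kB), per hash tree per share, matching the estimate above.
 print(integrity_overhead(10 * 1024 * 1024, 128 * 1024))
 }}}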

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1543>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

