[tahoe-lafs-trac-stream] [tahoe-lafs] #510: use plain HTTP for storage server protocol

Thu Oct 24 18:34:24 UTC 2013

#510: use plain HTTP for storage server protocol
------------------------------+---------------------------------
     Reporter:  warner        |      Owner:  zooko
         Type:  enhancement   |     Status:  new
     Priority:  major         |  Milestone:  2.0.0
    Component:  code-storage  |    Version:  1.2.0
   Resolution:                |   Keywords:  standards gsoc http
Launchpad Bug:                |
------------------------------+---------------------------------

Comment (by warner):

 Hm. There *are* minimum-sized portions of the share that it makes
 sense to retrieve, since we compute the integrity-checking hashes
 over whole "blocks". You have to fetch full blocks, because
 otherwise you can't verify a block against its hash.
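
 One way to picture the block-alignment constraint: given a byte
 range a reader wants, widen it to whole blocks, then check each
 block against a leaf hash. This is a hypothetical helper, not
 Tahoe's downloader code. `block_size` is an assumed parameter, and
 plain SHA-256 stands in for Tahoe's own tagged hashes.

```python
import hashlib


def covering_blocks(offset, length, block_size):
    """Indices of every block overlapping the byte range [offset, offset+length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def block_span(index, block_size, total_size):
    """Byte span (start, end_exclusive) of block `index`; the last block may be short."""
    start = index * block_size
    return start, min(start + block_size, total_size)


def verify_block(data, expected_hash):
    """Check one *whole* block against its leaf hash.

    Assumption: plain SHA-256 is used here as a stand-in for Tahoe's tagged hashes.
    A partial block can never match, which is why reads must be block-aligned.
    """
    return hashlib.sha256(data).digest() == expected_hash
```

 For example, with 64-byte blocks, a request for bytes 100..149 has
 to fetch blocks 1 and 2 (bytes 64..191) in full.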

 Each share contains some metadata (small), a pair of Merkle hash
 trees (one small, the other typically about 0.1% of the total
 filesize), and the blocks themselves. Our current downloader
 heroically tries to retrieve the absolute minimum number of bytes
 (and comically/tragically performs pretty badly as a result, due to
 the overhead of lots of little requests).
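
 To make the "lots of little requests" cost concrete, here is a toy
 model (a simplification, not the real downloader) that counts round
 trips needed to verify every block of one share in order. The
 fine-grained strategy requests each missing Merkle uncle hash
 separately, just before the block that needs it; the alternative
 fetches the whole block hash tree in one request. The heap-indexed
 tree layout and the power-of-two block count are assumptions of the
 sketch.

```python
def uncle_nodes(leaf, num_leaves):
    """Path nodes and sibling ("uncle") nodes needed to verify `leaf` up to the root.

    Assumed layout: heap indexing, root is node 0, children of i are 2i+1 and 2i+2,
    and num_leaves is a power of two so leaves occupy nodes num_leaves-1 .. 2*num_leaves-2.
    """
    if num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two in this sketch")
    node = num_leaves - 1 + leaf
    path, uncles = [], []
    while node > 0:
        sibling = node + 1 if node % 2 == 1 else node - 1  # odd nodes are left children
        path.append(node)
        uncles.append(sibling)
        node = (node - 1) // 2
    return path, uncles


def count_requests(num_blocks, whole_tree):
    """Count round trips to fetch and verify every block, in order."""
    if whole_tree:
        return 1 + num_blocks  # one request for the whole hash tree, one per block
    known = {0}  # the root hash is assumed to arrive with the (small) metadata
    requests = 0
    for leaf in range(num_blocks):
        path, uncles = uncle_nodes(leaf, num_blocks)
        # one tiny request per uncle hash we have not already fetched or computed
        requests += sum(1 for u in uncles if u not in known)
        requests += 1  # the block itself
        known.update(path)    # computed locally while verifying this block
        known.update(uncles)  # fetched (or already known) for this block
    return requests
```

 With 8 blocks, the fine-grained strategy makes 15 round trips (7
 single-hash requests plus 8 block requests), while fetching the
 whole tree first makes 9.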

 So we might consider changing the downloader design (and then the
 server API, and then the storage format) to fetch well-defined
 regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]").
 If we exposed those named regions as distinct files, then we
 wouldn't use HTTP Range headers at all; we'd just fetch different
 filenames. The downside would be the filesystem overhead of storing
 many small files instead of one big file, and the request overhead
 of making multiple independent URL fetches instead of a single one
 (with a composite, multi-range Range header). And the server would
 have to be more aware of the share contents, which makes upgrades
 and version skew a more significant problem.
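
 The two options in the paragraph above differ mainly in how a
 region is addressed. In this sketch, the URL scheme
 (`/storage/<storage_index>/<shnum>/<region>`) and the byte layout
 are invented for illustration; neither is part of Tahoe's actual
 storage protocol. Option one gives each region its own URL; option
 two keeps one share file and asks for several byte ranges with a
 single multi-range `Range` header.

```python
from urllib.parse import quote

# Hypothetical share layout: region name -> (offset, length) inside one share file.
LAYOUT = {
    "metadata": (0, 64),
    "hashtrees": (64, 1024),
    "block[0]": (1088, 4096),
    "block[1]": (5184, 4096),
}


def region_url(storage_index, shnum, region):
    """Option 1 (hypothetical scheme): each named region is its own file and URL."""
    # "[" and "]" are not allowed unescaped in a URL path, so percent-encode the name.
    return f"/storage/{storage_index}/{shnum}/{quote(region, safe='')}"


def multi_range_header(regions, layout=LAYOUT):
    """Option 2: one GET on the whole share file, listing several byte ranges."""
    parts = []
    for name in regions:
        offset, length = layout[name]
        parts.append(f"{offset}-{offset + length - 1}")  # HTTP byte ranges are inclusive
    return "bytes=" + ",".join(parts)
```

 Asking for `["metadata", "block[1]"]` this way yields
 `bytes=0-63,5184-9279`. A server that honors a multi-range request
 replies with a `multipart/byteranges` body that the client must
 split apart again, and a server is also free to ignore `Range` and
 send the whole file with a 200 status.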

 We could also just keep the shares arranged as they are, but change
 the downloader to fetch larger chunks (i.e. grab the whole hash
 tree once, instead of grabbing individual hashes just before each
 block), and then use a separate HTTP request for each chunk. Each
 request would then need only a single contiguous span in its Range
 header.
 If we could still pipeline the requests (or at least share the
 connection), it should be nearly as efficient as the
 discontiguous-range approach.
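
 The "fetch larger contiguous chunks" alternative could look roughly
 like this: merge the byte spans the downloader needs into contiguous
 runs, then issue one plain `Range` request per run over a single
 reused connection. It uses only the standard library's
 `http.client`; the host, port, and share path are placeholders, and
 treating anything but a 206 response as an error is one possible
 policy. `http.client` expects each response to be read in full
 before the next one, so this shares the connection but does not
 pipeline.

```python
import http.client


def coalesce(spans):
    """Merge (start, end_exclusive) spans that touch or overlap into contiguous runs."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_header(start, end):
    """A single contiguous HTTP byte range; the end position in the header is inclusive."""
    return f"bytes={start}-{end - 1}"


def fetch_spans(host, port, path, spans):
    """Fetch each contiguous run with its own GET, reusing one keep-alive connection."""
    conn = http.client.HTTPConnection(host, port)
    try:
        chunks = []
        for start, end in coalesce(spans):
            conn.request("GET", path, headers={"Range": range_header(start, end)})
            resp = conn.getresponse()
            body = resp.read()  # drain the response so the connection can be reused
            if resp.status != 206:
                raise OSError(f"expected 206 Partial Content, got {resp.status}")
            chunks.append(((start, end), body))
        return chunks
    finally:
        conn.close()
```

 For instance, spans for the metadata (0..64) and the whole hash tree
 (64..1088) merge into one run, sent as `Range: bytes=0-1087`.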

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/510#comment:29>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage