[tahoe-lafs-trac-stream] [tahoe-lafs] #510: use plain HTTP for storage server protocol
tahoe-lafs
trac at tahoe-lafs.org
Thu Oct 24 18:34:24 UTC 2013
#510: use plain HTTP for storage server protocol
------------------------------+---------------------------------
Reporter: warner | Owner: zooko
Type: enhancement | Status: new
Priority: major | Milestone: 2.0.0
Component: code-storage | Version: 1.2.0
Resolution: | Keywords: standards gsoc http
Launchpad Bug: |
------------------------------+---------------------------------
Comment (by warner):
Hm. There *is* a set of minimum-sized portions of the share that
it makes sense to retrieve, since we perform the integrity-checking
hashes over "blocks". You have to fetch full blocks, because
otherwise you can't check the hash properly.
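
To illustrate why partial blocks are useless for integrity checking, here is a minimal sketch. It assumes a plain SHA-256 digest per block and hypothetical helper names (`block_hash`, `verify_block`); the real share format hashes blocks into a Merkle tree, so this only shows the general principle.

```python
import hashlib


def block_hash(block: bytes) -> bytes:
    # Assumption for illustration: one plain SHA-256 digest per block.
    return hashlib.sha256(block).digest()


def verify_block(block: bytes, expected: bytes) -> bool:
    # The digest covers every byte of the block, so only a complete
    # block can reproduce the expected value.
    return block_hash(block) == expected


block = b"x" * 1024
expected = block_hash(block)

full_ok = verify_block(block, expected)           # complete block
partial_ok = verify_block(block[:512], expected)  # only the first half
```

A reader holding only the first half of the block has no way to confirm it, which is why the block is the smallest useful unit to fetch.
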
Each share contains some metadata (small), a pair of merkle hash
trees (one small, the other typically about 0.1% of the total
filesize), and the blocks themselves. Our current downloader
heroically tries to retrieve the absolute minimum number of bytes
(and comically/tragically performs pretty badly as a result, due to
the overhead of lots of little requests).
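
As a rough picture of the trade-off between fetching the minimum number of bytes and making fewer requests, the sketch below merges nearby byte spans into larger reads. This is a generic coalescing technique, not the downloader's actual logic; the function name `coalesce` and the `max_gap` parameter are made up for illustration.

```python
def coalesce(spans, max_gap=0):
    """Merge (offset, length) spans whose gaps are at most max_gap bytes.

    A larger max_gap fetches some unneeded bytes in exchange for
    issuing fewer requests.
    """
    merged = []  # list of [start, end) pairs, kept sorted by start
    for offset, length in sorted(spans):
        end = offset + length
        if merged and offset <= merged[-1][1] + max_gap:
            # Close enough to the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


reads = [(0, 32), (32, 32), (100, 32)]
strict = coalesce(reads)             # only touching spans are merged
relaxed = coalesce(reads, max_gap=64)  # tolerate up to 64 wasted bytes
```
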
So we might consider changing the downloader design (and then the
server API, and then the storage format) to fetch well-defined
regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]").
If we exposed those named regions as distinct files, then we
wouldn't use HTTP Range request headers at all, we'd just fetch
different filenames. The downside would be the filesystem overhead
for storing separate small files instead of one big file, and the
request overhead when you make multiple independent URL fetches
instead of a single one (with a composite Range header).
And the server would have to be more aware of the share contents,
which makes upgrades and version-skew a more significant problem.
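
The named-region idea could look roughly like the sketch below. The URL layout, the base URL, the share identifier, and the helper names (`region_url`, `regions_for_blocks`) are all hypothetical; nothing here is an existing server API.

```python
from urllib.parse import quote


def region_url(base_url: str, share_id: str, region: str) -> str:
    # Hypothetical layout: <base>/<share id>/<region name>.
    # Region names such as "block[3]" are percent-encoded for the path.
    return "{}/{}/{}".format(
        base_url.rstrip("/"), quote(share_id, safe=""), quote(region, safe="")
    )


def regions_for_blocks(block_numbers):
    # Metadata and hash trees are fetched once, then one region per block.
    return ["metadata", "hashtrees"] + [
        "block[{}]".format(n) for n in block_numbers
    ]


urls = [
    region_url("http://storage.example/shares/", "abc123", region)
    for region in regions_for_blocks([0, 1])
]
```

Each entry in `urls` would be an independent GET with no Range header, which is where the per-request overhead mentioned above comes from.
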
We could also just keep the shares arranged as they are, but change
the downloader to fetch larger chunks (i.e. grab the whole hash
tree once, instead of grabbing individual hashes just before each
block), and then use separate HTTP requests for the chunks. That
would reduce our use of the Range header to a single contiguous span.
If we could still pipeline the requests (or at least share the
connection), it should be nearly as efficient as the
discontiguous-range approach.
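
A minimal sketch of that last approach, using only Python's standard `http.client`: one contiguous Range per request, all on a single connection. The host, path, and span values a caller would pass are placeholders, and `fetch_spans` and `range_header` are hypothetical names. The requests here are sequential on a kept-alive connection, not pipelined.

```python
import http.client


def range_header(offset: int, length: int) -> str:
    # HTTP byte ranges are inclusive at both ends.
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return "bytes={}-{}".format(offset, offset + length - 1)


def fetch_spans(host, path, spans, port=80):
    """Fetch each (offset, length) span with its own single-range GET,
    reusing one connection for all of them."""
    conn = http.client.HTTPConnection(host, port)
    try:
        chunks = []
        for offset, length in spans:
            conn.request(
                "GET", path, headers={"Range": range_header(offset, length)}
            )
            resp = conn.getresponse()
            body = resp.read()  # drain the response before the next request
            if resp.status != 206:  # 206 Partial Content
                raise RuntimeError("server did not honor the Range header")
            chunks.append(body)
        return chunks
    finally:
        conn.close()
```

A caller would pass something like `[(0, 4096), (4096, 65536)]` for "the whole hash tree, then one block", with the real offsets taken from the share's metadata.
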
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/510#comment:29>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage