[tahoe-lafs-trac-stream] [tahoe-lafs] #510: use plain HTTP for storage server protocol
tahoe-lafs
trac at tahoe-lafs.org
Sat Oct 26 12:02:05 UTC 2013
#510: use plain HTTP for storage server protocol
------------------------------+---------------------------------
Reporter: warner | Owner: zooko
Type: enhancement | Status: new
Priority: major | Milestone: 2.0.0
Component: code-storage | Version: 1.2.0
Resolution: | Keywords: standards gsoc http
Launchpad Bug: |
------------------------------+---------------------------------
Comment (by simeon):
(PS: Hope this reply isn't too long, irrelevant, or incoherent. Getting
tired!)
Replying to [comment:29 warner]:
> Hm. There *are* a set of minimum-sized portions of the share that
> it makes sense to retrieve, since we perform the integrity-checking
> hashes over "blocks". You have to fetch full blocks, because
> otherwise you can't check the hash properly.
Indeed. I guess LAFS has played with the parameters to some degree; so far
I have not. These ideas are in my head for a 'next draft', and I would like
to be able to try various algorithms and crunch some numbers.
The important thing wrt this ticket #510 is to get a URI schema that is
simple but flexible enough, which means working out two things: 1) the
best way to express the necessary parameters, and 2) which parameters can
be allowed to float, so they can be tuned later.
At some stage that means you deciding whether you want to (or can) make
URIs simple enough for users to, e.g., plonk inline into an email, or
whether LAFS is best optimised with illegible URIs, leaving people to
design 'front-ends' such as the thing I am working on (which I haven't yet
decided is or contains LAFS-compatible code ;-) not making any promises
here).
> Each share contains some metadata (small), a pair of merkle hash
> trees (one small, the other typically about 0.1% of the total
> filesize), and the blocks themselves. Our current downloader
> heroically tries to retrieve the absolute minimum number of bytes
> (and comically/tragically performs pretty badly as a result, due to
> the overhead of lots of little requests).
Yup. I have been through many possible configurations in my head, and I'm
not sure what you guys have tried. Thanks for your quick summary; it's
difficult with large ongoing projects to jump in and find what is
currently relevant.
My main realisation recently has been that if a sensible, working
db/filesystem abstraction is made, then metadata can be stored in it too,
which would lead to a very generic backend (each backend being part of the
distributed cloud), with potentially everything else done at the client
end, apart from exchanging objects between backend nodes and deprecation.
So anyway, one way of doing this would be to distinguish between a
'discrete' file type, useful for a root metadata object pointing to the
blocks that make up a given 'file', and an 'aggregatable' type that could
be a block containing parts of several different files. I imagine freenet's
strategy was something like this. The idea being, as you say, that
checksumming a block is expensive, and since files are rarely an exact
multiple of the block size, some part of a file will inevitably be a chunk
of data smaller than a block; those chunks can be aggregated, a la
tail-packing in filesystems.
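To make that distinction concrete, here is a rough Python sketch of the two
object types as I imagine them; the names and fields are entirely my own
strawman, not anything LAFS currently defines:

from dataclasses import dataclass
from typing import List

@dataclass
class BlockRef:
    block_hash: str      # hash of one full, fixed-size block of this file
    length: int

@dataclass
class TailRef:
    aggregate_hash: str  # hash of a shared 'aggregatable' block holding many tails
    offset: int          # where this file's tail fragment starts inside it
    length: int

@dataclass
class RootObject:        # the 'discrete' type: one per file
    file_hash: str       # hash of the reassembled file
    blocks: List[BlockRef]
    tail: TailRef        # the sub-block-sized remainder, tail-packed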
Aggregating content from different files improves distribution of the data
in the net, but I think the overhead is what killed (kills?) freenet, so I
don't know if you have performance measures. It also seems that such
distribution might help keep the encryption opaque, or does it? If the
block boundaries are guessable then I guess there is no advantage, and
making them unguessable sounds expensive.
So if making block boundaries vary, e.g. between nodes, and making them
non-power-of-two is expensive, I guess this idea sucks? Perhaps having the
prefix be a varying-sized chunk of metadata actually works better for that.
I haven't looked at the encryption side of things very much at all, but I
would like to figure out how to solve the current URI/API problem of
hyperlinked user-created text in a way that allows storage either in the
clear or encrypted, without penalty either way, since I figure that should
be possible. Or, if not, to eliminate the possibility from the search path
of potential algorithms to adopt :) In that case I suppose it is best to
define the URI schemas that best serve the purpose of user-generated
hyperlink text stored in either form.
> So we might consider changing the downloader design (and then the
> server API, and then the storage format) to fetch well-defined
> regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]").
Exactly what I have been imagining. :) If it can be well standardised and
widely used, I would adopt this strategy too, but my priority remains
making strongly archivable content easy for a user to use and easy for
them to verify. If a final file is expected to have hashsum XYZ, then I
would want that to be the part the user sees; then they can store the file
in their homedir as a plain file, hashsum it, get XYZ, and know it is what
it says it is.
I think this can be done easily if the filename of the metadata portion is
exposed as hashsum:XYZ or equivalent.
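As a minimal sketch of what that user-facing check could look like
(assuming, purely for illustration, SHA-256 and an unpadded URL-safe
base64 encoding; neither is an agreed choice):

import base64
import hashlib

def hashsum_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    # strip '=' padding to keep the visible string short
    return base64.urlsafe_b64encode(h.digest()).rstrip(b"=").decode("ascii")

def verify(path, expected_xyz):
    # what the user does by hand: hash the plain file in their homedir and
    # compare it with the XYZ part of the hashsum:XYZ name they were given
    return hashsum_of(path) == expected_xyz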
The other relevant thing is that users should be able to associate a
suggested filename inline in the URI. RFC 6920, I think, does not cater
for this. This part of a URI should a) be optional, b) not really matter,
and c) allow more than one suggested filename to be stored for the object,
although I think only one should be allowed in the URI itself.
Actually, a server could ignore this part; perhaps the client does not
even pass it to the server, but a server should know to ignore it in case
user hacks assume that it is used.
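For illustration, a hypothetical URI shape with an optional, purely
advisory filename part might look like
hash:sha256;<digest>?filename=report.pdf (the exact syntax is my
invention); the point is only that the filename is easy to carry and just
as easy for a server to ignore:

from urllib.parse import urlsplit, parse_qs

def parse_object_uri(uri):
    parts = urlsplit(uri)                        # scheme, path, query
    algo, _, digest = parts.path.partition(";")
    params = parse_qs(parts.query)
    suggested_name = params.get("filename", [None])[0]   # advisory only
    return algo, digest, suggested_name

# parse_object_uri("hash:sha256;abc123?filename=report.pdf")
#   -> ('sha256', 'abc123', 'report.pdf')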
Going back to the encoding of the hashsum: RFC 6920 uses inefficient
encodings that make for long URIs. Users typing content simply do not like
long URIs. It doesn't matter that the RFC offers ways to help the user get
it right; the fact is, nobody is going to use a scheme that is too clunky
unless they are made to, which is not something I am considering. The RFC
authors, perhaps, being large corporations in at least one case, can count
on being able to make someone use their system.
So to keep URIs short, at least two features are needed which the RFC
doesn't really provide: first, the most efficient URI-friendly hashsum
encoding format, and second, a flexible, user-customisable mechanism for
truncating a hashsum to a desired length.
The second part can be solved easily if a wildcard character is reserved,
or, probably better, if a 'match' instruction is used, perhaps even as the
default. Let users decide, based on their context and the size of the db,
how many characters of hashsum are enough for their confidence level. When
a match hits multiple blocks, give some information to help the user
decide which one they were after.
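A sketch of that 'truncate and match' behaviour, with prefix match as the
default and ambiguity reported back to the user rather than silently
resolved (the store layout here is hypothetical):

def lookup(prefix, store):
    """store maps full hashsum -> object; prefix is a user-truncated hashsum."""
    matches = [h for h in store if h.startswith(prefix)]
    if len(matches) == 1:
        return store[matches[0]]
    if not matches:
        raise KeyError("no object matches %r" % prefix)
    # several hits: give the user enough context to pick the one they meant
    raise KeyError("ambiguous prefix %r, candidates: %s"
                   % (prefix, ", ".join(h[:16] + "..." for h in matches)))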
First part ... base64 is actually (very close to being?) completely URI-
compatible, but some characters are problems simply because they are
stupid choices for a filename, e.g. slash '/'. Flickr have defined base58,
which is analogous but uses only alphanumeric characters. That would be
one good choice. Another would be to use base64 and simply change the
disallowed characters to something else. I dunno what to call that,
case64? hack64? There is actually something already defined for this iirc,
but I can't remember its name offhand.
The advantage of using base64 is that many hashsum generator binaries can
already output base64-encoded hashsums. The disadvantage would be that if
what we adopt is 'almost' base64, then when there is a small discrepancy
(say '-' in place of '/') the user might abort, thinking it indicates
corruption, without doing further research to find the cause of the larger
similarity.
So I think even if we adopt something 'very similar to' base64, a strong
effort would be needed to get the variation recognised as a standard form
of hashsum. That would probably be very slow; it has taken years for even
base64 itself to become common in hashsum generators. :-\ Then again,
since the bulk of the code is already there, perhaps a change could happen
quickly if a strong case can be made for why.
Wikipedia has a neat table showing variants of base64; some are well
standardised in RFCs, even for the explicit purpose of making URIs:
https://en.wikipedia.org/wiki/Base64#Implementations_and_history
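To show how small the discrepancy actually is: the URL-and-filename-safe
variant in that table (base64url, RFC 4648) differs from standard base64
only in using '-' and '_' where standard base64 uses '+' and '/':

import base64
import hashlib

digest = hashlib.sha256(b"example content").digest()

# standard base64: may contain '+', '/' and '=' padding
standard = base64.b64encode(digest).decode("ascii")
# URL-safe variant, unpadded: drops straight into a URI or filename
url_safe = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

print(standard)
print(url_safe)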
What I am not 100% confident about, because right now my brain is too
broken to think any more, is whether there are any characters left over
for things like wildcards, separators, splitting, etc. But perhaps none of
those things are needed if the URI schema is arranged well.
> If we exposed those named regions as distinct files, then we
> wouldn't use HTTP Request-Range headers at all, we'd just fetch
> different filenames. The downside would be the filesystem overhead
> for storing separate small files instead of one big file, and the
> request overhead when you make multiple independent URL fetches
> instead of a single one (with a composite Request-Range header).
Of course, each file need not be just a single block. The server could
provide an abstraction that exposes 'filenames' for sub-blocks of actual
on-disk files, which could themselves be either actual files or aggregated
blocks from different files. Arrangements like these, which stray from
common semantics, are worthwhile I think only if one is willing to code up
a few different scenarios and build some kind of tests that will indicate
what performance differences there are, if any. And all of it is
worthwhile only if a good case can be made that it might actually help
something. Haven't thought enough about it to know. :-\
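As a strawman of what that server-side abstraction could look like (the
region table and sizes are invented, and this is not the LAFS share
layout): the server keeps one share file on disk but answers requests for
named regions of it, so clients fetch "metadata", "hashtrees" or
"block/<n>" as ordinary resources with no Range headers.

# hypothetical region table: logical name -> (offset, length) in the share file
REGIONS = {
    "metadata":  (0, 4096),
    "hashtrees": (4096, 65536),
}
BLOCK_SIZE = 131072
BLOCKS_START = 4096 + 65536

def read_region(share_path, name):
    # name is e.g. "metadata", "hashtrees" or "block/3"
    if name.startswith("block/"):
        n = int(name.split("/", 1)[1])
        offset, length = BLOCKS_START + n * BLOCK_SIZE, BLOCK_SIZE
    else:
        offset, length = REGIONS[name]
    with open(share_path, "rb") as f:
        f.seek(offset)
        return f.read(length)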
> And the server would have to be more aware of the share contents,
> which makes upgrades and version-skew a more significant problem.
I don't think it needs to know anything, really. At most, two types of
files, as I described earlier, and these could be kept in two separate
subdirs (or given different filename extensions, but I prefer the former);
i.e. the actual distinction between them could also be a matter for the
client. Yes?
> We could also just keep the shares arranged as they are, but change
> the downloader to fetch larger chunks (i.e. grab the whole hash
> tree once, instead of grabbing individual hashes just before each
> block), and then use separate HTTP requests for the chunks. That
> would reduce our use of Request-Range to a single contiguous span.
> If we could still pipeline the requests (or at least share the
> connection), it should be nearly as efficient as the
> discontiguous-range approach.
Yup. I like the idea of clients proactively requesting and caching content
that they are likely to use, however that is done, and of course dropping
unused content after some user-configurable time. But I don't think it's
crucial to do so. If there's a graceful way that happens to push 'related'
data to a client, and particularly if that client is also a node, then I
think it is acceptable. I guess, though, that for legal/ideological
reasons such behaviour might need to be optional, not something the entire
system is built on top of. So that's something I haven't solved even
ideologically, and if I haven't, then many users will be ambivalent, so
it's best to ensure such behaviours are configurable.
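For what it's worth, the 'larger chunks over one connection' variant you
describe would look roughly like this on the client side (the URL layout
is assumed, and a reusable HTTP session stands in for pipelining so the
requests at least share a connection):

import requests

def fetch_share(base_url, block_numbers):
    with requests.Session() as s:          # one connection, several requests
        hashtrees = s.get(base_url + "/hashtrees").content
        blocks = {n: s.get("%s/block/%d" % (base_url, n)).content
                  for n in block_numbers}
    return hashtrees, blocks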
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/510#comment:30>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage