[tahoe-lafs-trac-stream] [tahoe-lafs] #510: use plain HTTP for storage server protocol
tahoe-lafs
trac at tahoe-lafs.org
Sat Oct 26 12:02:05 UTC 2013
#510: use plain HTTP for storage server protocol
------------------------------+---------------------------------
Reporter: warner | Owner: zooko
Type: enhancement | Status: new
Priority: major | Milestone: 2.0.0
Component: code-storage | Version: 1.2.0
Resolution: | Keywords: standards gsoc http
Launchpad Bug: |
------------------------------+---------------------------------
Comment (by simeon):
(PS: Hope this reply isn't too long, irrelevant, or incoherent. Getting
tired!)
Replying to [comment:29 warner]:
> Hm. There *are* a set of minimum-sized portions of the share that
> it makes sense to retrieve, since we perform the integrity-checking
> hashes over "blocks". You have to fetch full blocks, because
> otherwise you can't check the hash properly.
Indeed. I guess LAFS has played with the parameters to some degree; so far
I have not. These ideas are in my head for a 'next draft', and I would like
to be able to try various algorithms and crunch some numbers.
The important thing wrt this ticket #510 is to get a URI schema that is
simple but flexible enough, which means working out two things: 1) the
best way to express the necessary parameters, and 2) which parameters can
be allowed to float, so they can be tuned later.
At some stage that means you deciding whether you want to (or can) make
URIs simple enough for users to, e.g., plonk inline into an email, or
whether LAFS is best optimised with illegible URIs, leaving people to
design 'front-ends' such as the thing I am working on (which I haven't yet
decided is or contains LAFS-compatible code ;-) not making any promises
here).
> Each share contains some metadata (small), a pair of merkle hash
> trees (one small, the other typically about 0.1% of the total
> filesize), and the blocks themselves. Our current downloader
> heroically tries to retrieve the absolute minimum number of bytes
> (and comically/tragically performs pretty badly as a result, due to
> the overhead of lots of little requests).
Yup. I have been through many possible configurations in my head, and I'm
not sure what you guys have tried. Thanks for your quick summary; it's
difficult with large ongoing projects to jump in and find what is
currently relevant.
My main realisation recently has been that if a sensible, working
db/filesystem abstraction is made, then metadata can be stored in it too,
which would lead to a very generic backend (each backend being part of the
distributed cloud), with potentially everything else done at the client
end, apart from exchanging objects between backend nodes and deprecation.
So anyway, one way of doing this would be to distinguish between a
'discrete' file type, useful for a root metadata object pointing to the
blocks that make up a given 'file', and an 'aggregatable' type that could
be a block containing parts of several different files. I imagine freenet's
strategy was something like this. The idea being, as you say, that
checksumming a block is expensive, and since files are rarely an exact
multiple of the block size, some part of a file will inevitably be a chunk
of data smaller than a block; those chunks can be aggregated, a la
tail-packing in filesystems.
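To make that distinction concrete, here is a rough Python sketch of the two
object types as I imagine them; the names and fields are entirely my own
strawman, not anything LAFS currently defines:

from dataclasses import dataclass
from typing import List

@dataclass
class BlockRef:
    block_hash: str      # hash of one full, fixed-size block of this file
    length: int

@dataclass
class TailRef:
    aggregate_hash: str  # hash of a shared 'aggregatable' block holding many tails
    offset: int          # where this file's tail fragment starts inside it
    length: int

@dataclass
class RootObject:        # the 'discrete' type: one per file
    file_hash: str       # hash of the reassembled file
    blocks: List[BlockRef]
    tail: TailRef        # the sub-block-sized remainder, tail-packed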
Aggregating content from different files improves distribution of the data
in the net, but I think the overhead is what killed (kills?) freenet, so I
don't know if you have performance measures. It also seems that such
distribution might help keep the encryption opaque, or does it? If the
block boundaries are guessable then I guess there is no advantage, and
making them unguessable sounds expensive.
So if making block boundaries vary, e.g. between nodes, and making them
non-power-of-two is expensive, I guess this idea sucks? Perhaps having the
prefix be a varying-sized chunk of metadata actually works better for that.
I haven't looked at the encryption side of things very much at all, but I
would like to figure out how to solve the current URI/API problem of
hyperlinked user-created text in a way that allows storage either in the
clear or encrypted, without penalty either way, since I figure that should
be possible. Or, if not, to eliminate the possibility from the search path
of potential algorithms to adopt :) In that case I suppose it is best to
define the URI schemas that best serve the purpose of user-generated
hyperlink text stored in either form.
> So we might consider changing the downloader design (and then the
> server API, and then the storage format) to fetch well-defined
> regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]").
Exactly what I have been imagining. :) If it can be well standardised and
widely used, I would adopt this strategy too, but my priority remains
making strongly archivable content easy for a user to use and easy for
them to verify. If a final file is expected to have hashsum XYZ, then I
would want that to be the part the user sees; then they can store the file
in their homedir as a plain file, hashsum it, get XYZ, and know it is what
it says it is.
I think this can be done easily if the filename of the metadata portion is
exposed as hashsum:XYZ or equivalent.
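As a minimal sketch of what that user-facing check could look like
(assuming, purely for illustration, SHA-256 and an unpadded URL-safe
base64 encoding; neither is an agreed choice):

import base64
import hashlib

def hashsum_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    # strip '=' padding to keep the visible string short
    return base64.urlsafe_b64encode(h.digest()).rstrip(b"=").decode("ascii")

def verify(path, expected_xyz):
    # what the user does by hand: hash the plain file in their homedir and
    # compare it with the XYZ part of the hashsum:XYZ name they were given
    return hashsum_of(path) == expected_xyz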
The other relevant thing is that users should be able to associate a
suggested filename inline in the URI. RFC 6920, I think, does not cater
for this. This part of a URI should a) be optional, b) not really matter,
and c) allow more than one suggested filename to be stored for the object,
although I think only one should be allowed in the URI itself.
Actually, a server could ignore this part; perhaps the client does not
even pass it to the server, but a server should know to ignore it in case
user hacks assume that it is used.
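For illustration, a hypothetical URI shape with an optional, purely
advisory filename part might look like
hash:sha256;<digest>?filename=report.pdf (the exact syntax is my
invention); the point is only that the filename is easy to carry and just
as easy for a server to ignore:

from urllib.parse import urlsplit, parse_qs

def parse_object_uri(uri):
    parts = urlsplit(uri)                        # scheme, path, query
    algo, _, digest = parts.path.partition(";")
    params = parse_qs(parts.query)
    suggested_name = params.get("filename", [None])[0]   # advisory only
    return algo, digest, suggested_name

# parse_object_uri("hash:sha256;abc123?filename=report.pdf")
#   -> ('sha256', 'abc123', 'report.pdf')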
Going back to the encoding of the hashsum: RFC 6920 uses inefficient
encodings that make for long URIs. Users typing content simply do not like
long URIs. It doesn't matter that the RFC offers ways to help the user get
it right; the fact is, nobody is going to use a scheme that is too clunky
unless they are made to, which is not something I am considering. The RFC
authors, perhaps, being large corporations in at least one case, can count
on being able to make someone use their system.
So to keep URIs short, at least two features are needed which the RFC
doesn't really provide: first, the most efficient URI-friendly hashsum
encoding format, and second, a flexible, user-customisable mechanism for
truncating a hashsum to a desired length.
The second part can be solved easily if a wildcard character is reserved,
or, probably better, if a 'match' instruction is used, perhaps even as the
default. Let users decide, based on their context and the size of the db,
how many characters of hashsum are enough for their confidence level. When
a match hits multiple blocks, give some information to help the user
decide which one they were after.
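A sketch of that 'truncate and match' behaviour, with prefix match as the
default and ambiguity reported back to the user rather than silently
resolved (the store layout here is hypothetical):

def lookup(prefix, store):
    """store maps full hashsum -> object; prefix is a user-truncated hashsum."""
    matches = [h for h in store if h.startswith(prefix)]
    if len(matches) == 1:
        return store[matches[0]]
    if not matches:
        raise KeyError("no object matches %r" % prefix)
    # several hits: give the user enough context to pick the one they meant
    raise KeyError("ambiguous prefix %r, candidates: %s"
                   % (prefix, ", ".join(h[:16] + "..." for h in matches)))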
First part ... base64 is actually (very close to being?) completely URI-
compatible, but some characters are problems simply because they are
stupid choices for a filename, e.g. slash '/'. Flickr have defined base58,
which is analogous but uses only alphanumeric characters. That would be
one good choice. Another would be to use base64 and simply change the
disallowed characters to something else. I dunno what to call that,
case64? hack64? There is actually something already defined for this iirc,
but I can't remember its name offhand.
The advantage of using base64 is that many hashsum generator binaries can
already output base64-encoded hashsums. The disadvantage would be that if
what we adopt is 'almost' base64, then when there is a small discrepancy
(say '-' in place of '/') the user might abort, thinking it indicates
corruption, without doing further research to find the cause of the larger
similarity.
So I think even if we adopt something 'very similar to' base64, a strong
effort would be needed to get the variation recognised as a standard form
of hashsum. That would probably be very slow; it has taken years for even
base64 itself to become common in hashsum generators. :-\ Then again,
since the bulk of the code is already there, perhaps a change could happen
quickly if a strong case can be made for why.
Wikipedia has a neat table showing variants of base64; some are well
standardised in RFCs, even for the explicit purpose of making URIs:
https://en.wikipedia.org/wiki/Base64#Implementations_and_history
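To show how small the discrepancy actually is: the URL-and-filename-safe
variant in that table (base64url, RFC 4648) differs from standard base64
only in using '-' and '_' where standard base64 uses '+' and '/':

import base64
import hashlib

digest = hashlib.sha256(b"example content").digest()

# standard base64: may contain '+', '/' and '=' padding
standard = base64.b64encode(digest).decode("ascii")
# URL-safe variant, unpadded: drops straight into a URI or filename
url_safe = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

print(standard)
print(url_safe)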
What I am not 100% confident about, because right now my brain is too
broken to think any more, is whether there are any characters left over
for things like wildcards, separators, splitting, etc. But perhaps none of
those things are needed if the URI schema is arranged well.
> If we exposed those named regions as distinct files, then we
> wouldn't use HTTP Request-Range headers at all, we'd just fetch
> different filenames. The downside would be the filesystem overhead
> for storing separate small files instead of one big file, and the
> request overhead when you make multiple independent URL fetches
> instead of a single one (with a composite Request-Range header).
Of course, each file need not be just a single block. The server could
provide an abstraction that exposes 'filenames' for sub-blocks of actual
on-disk files, which could themselves be either actual files or aggregated
blocks from different files. Arrangements like these, which stray from
common semantics, are worthwhile I think only if one is willing to code up
a few different scenarios and build some kind of tests that will indicate
what performance differences there are, if any. And all of it is
worthwhile only if a good case can be made that it might actually help
something. Haven't thought enough about it to know. :-\
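As a strawman of what that server-side abstraction could look like (the
region table and sizes are invented, and this is not the LAFS share
layout): the server keeps one share file on disk but answers requests for
named regions of it, so clients fetch "metadata", "hashtrees" or
"block/<n>" as ordinary resources with no Range headers.

# hypothetical region table: logical name -> (offset, length) in the share file
REGIONS = {
    "metadata":  (0, 4096),
    "hashtrees": (4096, 65536),
}
BLOCK_SIZE = 131072
BLOCKS_START = 4096 + 65536

def read_region(share_path, name):
    # name is e.g. "metadata", "hashtrees" or "block/3"
    if name.startswith("block/"):
        n = int(name.split("/", 1)[1])
        offset, length = BLOCKS_START + n * BLOCK_SIZE, BLOCK_SIZE
    else:
        offset, length = REGIONS[name]
    with open(share_path, "rb") as f:
        f.seek(offset)
        return f.read(length)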
> And the server would have to be more aware of the share contents,
> which makes upgrades and version-skew a more significant problem.
I don't think it needs to know anything, really. At most, two types of
files, as I described earlier, and these could be kept in two separate
subdirs (or given different filename extensions, but I prefer the former);
i.e. the actual distinction between them could also be a matter for the
client. Yes?
> We could also just keep the shares arranged as they are, but change
> the downloader to fetch larger chunks (i.e. grab the whole hash
> tree once, instead of grabbing individual hashes just before each
> block), and then use separate HTTP requests for the chunks. That
> would reduce our use of Request-Range to a single contiguous span.
> If we could still pipeline the requests (or at least share the
> connection), it should be nearly as efficient as the
> discontiguous-range approach.
Yup. I like the idea of clients proactively requesting and caching content
that they are likely to use, however that is done, and of course dropping
unused content after some user-configurable time. But I don't think it's
crucial to do so. If there's a graceful way that happens to push 'related'
data to a client, and particularly if that client is also a node, then I
think it is acceptable. I guess, though, that for legal/ideological
reasons such behaviour might need to be optional, not something the entire
system is built on top of. So that's something I haven't solved even
ideologically, and if I haven't, then many users will be ambivalent, so
it's best to ensure such behaviours are configurable.
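For what it's worth, the 'larger chunks over one connection' variant you
describe would look roughly like this on the client side (the URL layout
is assumed, and a reusable HTTP session stands in for pipelining so the
requests at least share a connection):

import requests

def fetch_share(base_url, block_numbers):
    with requests.Session() as s:          # one connection, several requests
        hashtrees = s.get(base_url + "/hashtrees").content
        blocks = {n: s.get("%s/block/%d" % (base_url, n)).content
                  for n in block_numbers}
    return hashtrees, blocks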
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/510#comment:30>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage