#510 closed enhancement (fixed)

use plain HTTP for storage server protocol

Reported by: warner Owned by: exarkun
Priority: major Milestone: HTTP Storage Protocol
Component: code-storage Version: 1.2.0
Keywords: standards gsoc http leastauthority Cc: zooko, jeremy@…, peter@…
Launchpad Bug:

Description (last modified by daira)

Zooko told me about an idea: use plain HTTP for the storage server protocol, instead of foolscap. Here are some thoughts:

  • it could make Tahoe easier to standardize: the spec wouldn't have to include foolscap too
  • the description of the share format (all the hashes/signatures/etc) becomes the most important thing: most other aspects of the system can be inferred from this format (with peer selection being a significant omission)
  • download is easy, use GET and a URL of /shares/STORAGEINDEX/SHNUM, perhaps with an HTTP Content-Range header if you only want a portion of the share
  • upload for immutable files is easy: PUT /shares/SI/SHNUM, which works only once
  • upload for mutable files:
    • implement DSA-based mutable files, in which the storage index is the hash of the public key (or maybe even equal to the public key)
    • the storage server is obligated to validate every bit of the share against the roothash, validate the roothash signature against the pubkey, and validate the pubkey against the storage index (this validation chain is sketched just after this list)
    • the storage server will accept any share that validates up to the SI and has a seqnum higher than any existing share
    • if there is no existing share, the server will accept any valid share
    • when using Content-Range: (in some one-message equivalent of writev), the server validates the resulting share, which is some combination of the existing share and the deltas being written. (this is for MDMF where we're trying to modify just one segment, plus the modified hash chains, root hash, and signature)
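
A minimal sketch of that validation chain, assuming the DSA-based mutable-file design described above; the helper names (pubkey_hash, verify_signature, hash_tree_root) and share attributes are hypothetical, and the point is only the ordering of the checks, not the real share layout:

    def server_accepts_mutable_share(storage_index, new_share, existing_share,
                                     pubkey_hash, verify_signature, hash_tree_root):
        # 1. The pubkey must match the storage index (SI = hash of the pubkey).
        if pubkey_hash(new_share.pubkey) != storage_index:
            return False
        # 2. The roothash signature must verify against that pubkey.
        if not verify_signature(new_share.pubkey, new_share.roothash, new_share.signature):
            return False
        # 3. Every bit of share data must be covered by the roothash.
        if hash_tree_root(new_share.blocks, new_share.hash_chains) != new_share.roothash:
            return False
        # 4. Accept only if there is no existing share or the seqnum is strictly higher.
        return existing_share is None or new_share.seqnum > existing_share.seqnum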

Switching to a validate-the-share scheme to control write access is good and bad:

  • + repairers can create valid, readable, overwritable shares without access to the writecap.
  • - storage servers must do a lot of hashing and public key computation on every upload
  • - storage servers must know the format of the uploaded share, so clients cannot start using new formats without first upgrading all the storage servers

The result would be a share-transfer protocol that would look exactly like HTTP; however, it could not be safely implemented by a simple HTTP server, because the PUT requests must be constrained by validating the share. (A simple HTTP server doesn't really implement PUT anyway.) There is a benefit to using "plain HTTP", but some of that benefit is lost when it is really HTTP being used as an RPC mechanism (think of the way S3 uses HTTP).
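
For illustration, here is a rough client-side sketch of the basic read and immutable-write operations using Python's standard http.client. The /shares/STORAGEINDEX/SHNUM path comes from the description above; the host, port, and share coordinates are made up, and the sketch uses the standard Range request header for the partial read (the description says Content-Range, whose use is revisited in comments 25-27 below):

    import http.client

    SERVER, PORT = "storage.example.net", 8080      # hypothetical storage server
    SI, SHNUM = "b32encodedstorageindex", 0         # hypothetical share coordinates

    conn = http.client.HTTPConnection(SERVER, PORT)

    # Download part of a share: GET with a byte-range request.
    conn.request("GET", f"/shares/{SI}/{SHNUM}", headers={"Range": "bytes=0-1023"})
    first_kib = conn.getresponse().read()

    # Upload an immutable share: PUT the whole share; the server accepts it only once.
    share_data = b"..."                             # the erasure-coded share bytes
    conn.request("PUT", f"/shares/{SI}/{SHNUM}", body=share_data)
    conn.getresponse().read()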

It might be useful to have storage servers declare two separate interfaces: a plain HTTP interface for read, and a separate port or something for write. The read side could indeed be provided by a dumb HTTP server like apache; the write side would need something slightly more complicated. An apache module to provide the necessary share-write checking would be fairly straightforward, though.

Hm, that makes me curious about the potential to write the entire Tahoe node as an apache module: it could convert requests for /ROOT/uri/FILECAP etc into share requests and FEC decoding...

Change History (45)

comment:1 Changed at 2008-09-10T20:20:57Z by zooko

Brian: very nice write-up. This is the kind of thing that ought to be posted to tahoe-dev. I kind of think that all new tickets opened on the trac should be mailed to tahoe-dev. That's what the distutils-sig list does, and it seems to work fine.

But anyway, would you please post the above to tahoe-dev? Thanks.

comment:2 Changed at 2008-09-24T13:52:07Z by zooko

I mentioned this ticket as one of the most important-to-me improvements that we could make in the Tahoe code: http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html

comment:3 Changed at 2010-02-23T03:09:25Z by zooko

  • Milestone changed from undecided to 2.0.0

comment:4 follow-up: Changed at 2010-03-01T10:30:52Z by jrydberg

"PUT /shares/SI/SHNUM, which works only once" - Shouldn't POST be used rather than PUT? PUT is idempotent.

comment:5 in reply to: ↑ 4 Changed at 2010-03-02T03:08:01Z by davidsarah

  • Keywords standards added

Replying to jrydberg:

"PUT /shares/SI/SHNUM, which works only once" - Shouldn't POST be used rather than PUT? PUT is idempotent.

PUTting a share would be idempotent, because "(aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request" (http://tools.ietf.org/html/rfc2616#section-9.1). I.e. repeating the request can have no harmful effect. (Note that, assuming the collision-resistance of the hash, there is only one possible valid content for the share at a given SI and SHNUM.)

HTTP doesn't require that an idempotent request always succeeds. The only ways in which client behaviour is specified to depend on idempotence are:

  • a client may automatically retry an idempotent request if the connection is closed before a response is received;
  • a client should only pipeline idempotent requests.

These are both desirable for uploading of shares.

comment:6 Changed at 2010-03-04T21:57:36Z by jsgf

  • Cc jeremy@… added

comment:7 Changed at 2010-03-12T23:30:26Z by davidsarah

  • Keywords gsoc added

comment:8 Changed at 2010-08-15T04:58:06Z by zooko

See also #1007 (HTTP proxy support for node to node communication).

comment:9 Changed at 2010-08-15T04:58:41Z by zooko

  • Summary changed from use plain HTTP for storage server protocol? to use plain HTTP for storage server protocol

comment:10 Changed at 2010-11-05T13:18:50Z by davidsarah

  • Keywords http added

comment:11 Changed at 2011-06-29T08:26:45Z by warner

Some notes from the 2011 Tahoe Summit:

We can't keep using shared-secret prove-by-present-it write-enablers over a non-confidential HTTP transport. One approach would be to use a verifying key as the write-enabler, and sign the serialized mutation request message, but that would impose a heavy CPU cost on each write (a whole pubkey verification).

A cheaper approach would use a shared-secret write-enabler to MAC the mutation request. To get this shared secret to the server over a non-confidential channel, we need a public-key encryption scheme. The scheme David-Sarah and I cooked up uses one pubkey-decryption operation per server connection, and avoids all but one verification operation per re-key operation. Normal share mutation uses only (cheap) symmetric operations.

Basically, each client/server pair establishes a symmetric session key as soon as the connection is established. This involves putting a public encryption key in the #466 signed-introducer announcement, maybe as simple as a DH g^x parameter (probably an elliptic-curve group element). At startup, the client picks a secret, creates g^y, sends it in a special message to the server, and the resulting shared g^xy is the session key. The client could use a derivative of their persistent master secret for this, or it could be random each time, doesn't matter.

The session key is used in an authenticated-encryption mode like CCM or basic AES+HMAC. When a #1426 re-key operation is performed, the signed please-update-the-write-enabler message is encrypted with the session key, protecting the WE from eavesdroppers. The server checks the re-key request's signature and stores the new WE next to the share.

To actually authorize mutate-share operations, the request is serialized, then MACed using the WE as the secret key. Requests without a valid MAC are rejected. This uses only cheap hash operations for the mutation requests. The expensive pubkey ops are only used once per file per serverid-change (migration) during re-keying, and one per connection to establish the session key.
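
A rough Python sketch of this scheme; the X25519 exchange stands in for the "DH g^x / g^y" step (assuming the pyca/cryptography package), and the key derivation and request serialization are placeholders, not a specified wire format:

    import hashlib, hmac
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    # Session-key establishment: the server publishes g^x (its public key) in its
    # announcement, the client picks y and sends g^y, and both sides derive a key
    # from the shared g^xy.
    server_key = X25519PrivateKey.generate()      # server's x
    client_key = X25519PrivateKey.generate()      # client's per-connection y
    shared = client_key.exchange(server_key.public_key())
    session_key = hashlib.sha256(b"session-key" + shared).digest()   # placeholder KDF

    # Normal mutation authorization: MAC the serialized mutation request with the
    # write-enabler (WE) as the secret key -- only cheap symmetric operations per write.
    def mac_mutation_request(write_enabler: bytes, serialized_request: bytes) -> bytes:
        return hmac.new(write_enabler, serialized_request, hashlib.sha256).digest()

    def server_accepts(write_enabler: bytes, serialized_request: bytes, mac: bytes) -> bool:
        return hmac.compare_digest(mac_mutation_request(write_enabler, serialized_request), mac)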

comment:12 Changed at 2011-08-24T15:53:21Z by zooko

  • Owner set to taral

comment:13 Changed at 2011-08-29T00:16:13Z by taral

Okay, so a couple things:

  1. I need a list of the protocol messages. :)
  2. You guys sound like you're re-inventing TLS. Can someone explain why we shouldn't run the protocol over TLS instead of inventing our own crypto?

comment:14 Changed at 2011-09-01T21:24:09Z by zooko

  • Cc zooko added

comment:15 Changed at 2011-09-08T18:17:39Z by zooko

Taral:

The protocol messages are the methods of the classes which subclass RemoteInterface and are listed in interfaces.py. For example, to upload an immutable file, you get a remote reference to an RIBucketWriter and call its methods write() and close().

About crypto:

Note that we're talking only about the encryption used to protect authorization of users to do certain things. There is another use of encryption, which is to protect the confidentiality of the file data, and that we already do in our own custom way (since TLS doesn't really apply to files the way Tahoe-LAFS users use them).

The current version of Tahoe-LAFS protocol does actually run over SSL/TLS and rely on that to protect certain authorization secrets. The most important authorization secret is called the "write enabler", which you can read more about in specifications/mutable.rst, interfaces.py, client-side mutable/publish.py and mutable/filenode.py, and server-side storage/mutable.py.

When developing a new HTTP(S)-based protocol, we have to decide whether to implement our own encryption to manage authorization or to continue using the "enablers" design on top of SSL/TLS (thus making it an HTTPS-only protocol and not an HTTP protocol). I think it may actually ease deployment and usage to do the former, because SSL/TLS is a bit of a pain to deploy. I think it may actually also simplify the protocol! This is somewhat surprising, but what we need is an authorization protocol, and what SSL/TLS provides is a two-party confidential, integrity-preserving channel with server-authentication. It kind of looks like implementing our own crypto authorization protocol (such as described in comment:11) may result in a simpler protocol than implementing an authorization protocol layered on top of a secure channel protocol.

Our custom protocol would also be a bit more efficient, where efficiency is measured primarily by number of required round-trips.

(Note that Brian Warner's foolscap is already a general-purpose authorization protocol built on top of SSL, but it doesn't quite fit into our needs because of a few efficiency considerations including the size of the foolscap authorization tokens (furls). Also, foolscap includes a Python-oriented remote object protocol and the whole point of this ticket is to get away from that. :-))

I don't have time to dredge up all the pros and cons that we've talked about, but if anyone does remember them or find them, please post them to this ticket or link to them from this ticket.

Last edited at 2011-09-08T18:35:38Z by zooko (previous) (diff)

comment:16 Changed at 2011-09-08T19:02:58Z by zooko

There are a few high-level docs for people getting started understanding the basic ideas of Tahoe-LAFS data formats.

  1. http://tahoe-lafs.org/~zooko/lafs.pdf
  2. docs/architecture.rst

These are a good start and one should probably read them first, but they really don't get specific enough that you could, for example, go off and write a compatible implementation yourself. Here are some "works in progress" where we hope that such a detailed specification will one day live:

docs/specifications/outline.rst

ticket #865, ticket #38

You could help, possibly by asking specific questions which can only be answered by fleshing out those specification documents.

comment:17 Changed at 2011-09-08T19:21:07Z by taral

Thanks zooko!

comment:18 Changed at 2011-09-09T19:06:35Z by zooko

You're welcome! I look forward to seeing what you do with it.

comment:19 Changed at 2011-09-09T19:11:43Z by zooko

Oh, I forgot to mention another "high level overview" for getting started with. This one was written by someone I don't really know anything about -- Mahmoud Ahmed Ismail. They haven't interacted with the Tahoe-LAFS developers much, but they started their own project inspired by Tahoe-LAFS and wrote a high-level doc which is a really good introduction to the Tahoe-LAFS design:

http://code.google.com/p/nilestore/wiki/TahoeLAFSBasics

comment:20 Changed at 2011-10-17T20:48:15Z by warner

I put a quick prototype of using HTTP to download immutable share data in my "http-transport" github branch (https://github.com/warner/tahoe-lafs/tree/http-transport , may or may not still exist by the time you read this). It advertises a "storage-URL" through the #466 extended introducer announcement, and uses the new web.client code in recent Twisted (10.0 I think?) and a Range: header to fetch the correct read vector. It does not yet use persistent connections, which I think are necessary to get the performance improvement we're hoping for. It also still uses Foolscap for share discovery (getting from a storage index to a list of share numbers on that server), and doesn't touch mutable shares at all, and of course doesn't even think about uploads or modifying shares.

I also added #1565 to discuss the URLs that should be used to access this kind of service.
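
The branch mentioned above uses Twisted's web client with a Range header; roughly, such a fetch looks like the sketch below (illustrative only, not the code from the branch; the URL and offsets are placeholders):

    from twisted.internet import reactor
    from twisted.web.client import Agent, readBody
    from twisted.web.http_headers import Headers

    def fetch_read_vector(url: bytes, offset: int, length: int):
        # Ask for just the bytes of the desired read vector via a Range request header.
        agent = Agent(reactor)
        headers = Headers({b"Range": [b"bytes=%d-%d" % (offset, offset + length - 1)]})
        d = agent.request(b"GET", url, headers, None)
        d.addCallback(readBody)    # fires with the (partial) share bytes
        return d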

comment:21 Changed at 2011-11-08T03:21:53Z by taral

Sorry about the delay, folks... things have been busy around here. If anyone else is interested in contributing to this, please feel free.

comment:22 Changed at 2013-05-30T00:13:14Z by daira

  • Description modified (diff)

The cloud backend, which uses HTTP or HTTPS to connect to the cloud storage service, provides some interesting data on how an HTTP-only storage protocol might perform. With request pipelining and connection pooling, it seems to do a pretty good job of maxing out the upstream bandwidth to the cloud on my home Internet connection, although it would be interesting to test it with a fatter pipe. (For downloads, performance appears to be limited by inefficiencies in the downloader rather than in the cloud backend.)

Currently, the cloud backend splits shares into "chunks" to limit the amount of data that needs to be held in memory or in a store object (see docs/specifications/backends/raic.rst). This is somewhat redundant with segmentation: ciphertext "segments" are erasure-encoded into "blocks" (a segment is k = shares.needed times larger than a block), and stored in a share together with a header and metadata, which is then chunked. Blocks and chunks are not aligned (for two reasons: the share header, and the typical block size of 128 KiB / 3, which is not a factor of the 512 KiB default chunk size). So,

  • a sequential scan over blocks will reference the same chunk for several (typically about 12 for k = 3) consecutive requests.
  • a single block may span chunks.
  • writes not aligned with a chunk must be implemented using read-modify-write.

The cloud backend uses caching to mitigate any resulting inefficiency. However, this is only of limited help because the storage client lacks information about where the chunk boundaries are and the behaviour of the chunk cache, and the storage server lacks information about the access patterns of the uploader or downloader.

A possible performance improvement and simplification that I'm quite enthusiastic about for an HTTP-based protocol is to make blocks the same thing as chunks. That is, the segment size would be k times the chunk size, and the uploader or downloader would directly store or request chunks, rather than blocks, from the backend storage, doing any caching itself.
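
The "about 12 for k = 3" figure above follows directly from the sizes mentioned; a quick check, assuming the 128 KiB default segment size and the 512 KiB default chunk size:

    KiB = 1024
    segment_size = 128 * KiB        # default segment size
    k = 3                           # shares.needed
    block_size = segment_size / k   # ~42.67 KiB per block
    chunk_size = 512 * KiB          # default chunk size in the cloud backend
    print(chunk_size / block_size)  # 12.0: roughly 12 consecutive block reads hit the same chunk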

Last edited at 2013-05-30T00:28:07Z by daira (previous) (diff)

comment:23 follow-up: Changed at 2013-10-20T00:06:41Z by zooko

  • Owner changed from taral to nobody

comment:24 Changed at 2013-10-20T00:07:11Z by zooko

  • Cc peter@… added
  • Owner changed from nobody to zooko

comment:25 in reply to: ↑ 23 ; follow-up: Changed at 2013-10-20T13:51:24Z by daira

Replying to zooko:

It depends on a use of the HTTP Content-Range header which is (unfortunately) explicitly forbidden by the HTTP/1.1 spec, so that probably needs to change.

Various behaviours relating to Content-Range are forbidden by the HTTP 1.1 spec, but which one were you referring to?

comment:26 in reply to: ↑ 25 ; follow-up: Changed at 2013-10-20T17:31:00Z by zooko

Various behaviours relating to Content-Range are forbidden by the HTTP 1.1 spec, but which one were you referring to?

Described here: https://github.com/warner/tahoe-lafs/commit/3f6e2f196dc654b8f846b2bc0d174382bf6d59c5#commitcomment-4376374

comment:27 in reply to: ↑ 26 ; follow-up: Changed at 2013-10-24T07:35:01Z by simeon

Replying to zooko:

Various behaviours relating to Content-Range are forbidden by the HTTP 1.1 spec, but which one were you referring to?

Described here: https://github.com/warner/tahoe-lafs/commit/3f6e2f196dc654b8f846b2bc0d174382bf6d59c5#commitcomment-4376374

AFAIK it is permitted to combine pipelining (https://en.wikipedia.org/wiki/HTTP_pipelining) and keep-alive (https://en.wikipedia.org/wiki/HTTP_persistent_connection) along with content-range, the overhead would not be very great, and perhaps some HTTP header fields could be omitted in such a communication.

I wonder if it's better perhaps to store large files as a series of fixed-maximum-size chunks, and have the client do the logic to reassemble them ... ? (Haven't read much about your implementation, and actually in my own similar work, i originally concluded to opt for simplest usage at client end ... so really that comment is not backed by anything, just musing...)

comment:28 in reply to: ↑ 27 Changed at 2013-10-24T07:40:35Z by simeon

Replying to simeon:

Replying to zooko:

Various behaviours relating to Content-Range are forbidden by the HTTP 1.1 spec, but which one were you referring to?

Described here: https://github.com/warner/tahoe-lafs/commit/3f6e2f196dc654b8f846b2bc0d174382bf6d59c5#commitcomment-4376374

AFAIK it is permitted to combine pipelining (https://en.wikipedia.org/wiki/HTTP_pipelining) and keep-alive (https://en.wikipedia.org/wiki/HTTP_persistent_connection) along with content-range, the overhead would not be very great, and perhaps some HTTP header fields could be omitted in such a communication.

I wonder if it's better perhaps to store large files as a series of fixed-maximum-size chunks, and have the client do the logic to reassemble them ... ? (Haven't read much about your implementation, and actually in my own similar work, i originally concluded to opt for simplest usage at client end ... so really that comment is not backed by anything, just musing...)

For an effective method of splitting large files in a way that preserves de-duplicatability, see debian's addition to gzip (http://svana.org/kleptog/rgzip.html) which creates gzip files that can be reliably and effectively de-duplicated even when some of the data has changed, which helps rsync stay useful. AFAIK they are still using the same 'stupid' algorithm, and it has been incorporated into the standard system tools so that debs are indeed rsyncable through various versions of deb.

comment:29 follow-up: Changed at 2013-10-24T18:34:24Z by warner

Hm. There *is* a set of minimum-sized portions of the share that it makes sense to retrieve, since we perform the integrity-checking hashes over "blocks". You have to fetch full blocks, because otherwise you can't check the hash properly.

Each share contains some metadata (small), a pair of merkle hash trees (one small, the other typically about 0.1% of the total filesize), and the blocks themselves. Our current downloader heroically tries to retrieve the absolute minimum number of bytes (and comically/tragically performs pretty badly as a result, due to the overhead of lots of little requests).

So we might consider changing the downloader design (and then the server API, and then the storage format) to fetch well-defined regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]"). If we exposed those named regions as distinct files, then we wouldn't use HTTP Request-Range headers at all, we'd just fetch different filenames. The downside would be the filesystem overhead for storing separate small files instead of one big file, and the request overhead when you make multiple independent URL fetches instead of a single one (with a composite Request-Range header). And the server would have to be more aware of the share contents, which makes upgrades and version-skew a more significant problem.

We could also just keep the shares arranged as they are, but change the downloader to fetch larger chunks (i.e. grab the whole hash tree once, instead of grabbing individual hashes just before each block), and then use separate HTTP requests for the chunks. That would reduce our use of Request-Range to a single contiguous span. If we could still pipeline the requests (or at least share the connection), it should be nearly as efficient as the discontiguous-range approach.
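
A sketch of the two layouts being compared; the URL shapes and function names here are hypothetical, just to make the trade-off concrete:

    # Option 1: expose named regions as distinct resources (no Range header needed).
    def region_url(base, si, shnum, region):
        # region is one of "metadata", "hashtrees", or "block/<N>"
        return f"{base}/shares/{si}/{shnum}/{region}"

    # Option 2: keep the share as one file and fetch a single contiguous span per request.
    def range_header(offset, length):
        return {"Range": f"bytes={offset}-{offset + length - 1}"}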

comment:30 in reply to: ↑ 29 ; follow-up: Changed at 2013-10-26T12:02:04Z by simeon

(PS Hope this reply not too long, irrelevant, or incoherent. Getting tired!)

Replying to warner:

Hm. There *are* a set of minimum-sized portions of the share that it makes sense to retrieve, since we perform the integrity-checking hashes over "blocks". You have to fetch full blocks, because otherwise you can't check the hash properly.

Indeed. I guess LAFS has played with the parameters to some degree, so far I have not, these ideas are in my head for 'next draft' and I guess I would like to be able to try various algorithms and crunch some numbers.

The important thing wrt this ticket #510 is to get a URI schema that is simple but flexible enough, and so to find two things, those being 1) the best ways to express the necessary parameters, and 2) what parameters can be allowed to float, so as to tune them later.

And at some stage that means you deciding if you even want to or can make URIs that are simple enough for users to eg plonk into an email inline, or is it that LAFS is best optimised with illegible URIs and people design 'front-ends' such as the thing I am working on (which I haven't yet decided is or contains LAFS-compatible code ;-) not making any promises here.)

Each share contains some metadata (small), a pair of merkle hash trees (one small, the other typically about 0.1% of the total filesize), and the blocks themselves. Our current downloader heroically tries to retrieve the absolute minimum number of bytes (and comically/tragically performs pretty badly as a result, due to the overhead of lots of little requests).

Yup. I have been through many possible configurations in my head, and not sure what you guys have tried. Thanks for your quick summary, it's difficult with large ongoing projects to jump in and find what is currently relevant.

My main realisation recently has been that if a sensible db/filesystem abstraction is made which works, then metadata can be stored in it, and this would lead to a very generic backend (each being part of the distributed cloud), potentially with everything else done at a client end, apart from exchanging objects between backend nodes and deprecation.

So anyway, one way of doing this would be to distinguish between a 'discrete' file type useful for a root metadata object pointing to what blocks make up a given 'file', and an 'aggregatable' type that could be a block including a bunch of parts of differing files. I imagine freenet's strategy was something like this. The idea being, as you say, that checksumming a block is expensive, and since files are rarely a fixed size, inevitably some of a file will be a chunk of data smaller than a block, so those can be aggregated, a la tail-packing in filesystems.

Aggregating content from different files improves distribution of the data in the net, but the overhead I think is what killed (kills?) freenet. So I don't know if you have performance measures. But it seems that such distribution may also help keeping the encryption opaque, or does it? If the block boundaries are guessable then I guess there is no advantage. And making them unguessable sounds expensive.

So if making block boundaries vary eg between nodes, and making them non-power-of-two is expensive, I guess this idea sucks? Perhaps the prefix being a varying sized chunk of metadata actually works better for that.

I haven't looked at the encryption side of things very much at all, but would like to figure out how to solve the current URI/API problem of hyperlinked user-created text in a way that allows either storage in the clear, or encrypted, without penalty either way, since I figure that should be possible. Or if not, to eliminate the possibility from the search-path of potential algorithms to adopt :) then I suppose it is best to define the URI schema that best serves the purpose of user-generated hyperlink text stored in either form.

So we might consider changing the downloader design (and then the server API, and then the storage format) to fetch well-defined regions: fetch("metadata"), fetch("hashtrees"), fetch("block[N]").

Exactly what I have been imagining. :) If it can be well standardised and widely used, I would adopt this strategy too, but my priority remains that I want to make strongly archivable content easy for a user to use, and easy for them to verify. If a final file is expected to have hashsum XYZ then I would want that to be the part the user sees, then they can store a file in their homedir as a plain file, hashsum it, get XYZ and know it is what it says it is.

I think this can be easily done, if the filename of the metadata portion is exposed as hashsum:XYZ or equiv.

The other thing which is relevant is that users should be able to associate a suggested filename inline in the URI. RFC 6920 I think does not cater for this. This part of a URI should a) be optional, b) not really matter, c) should be able to handle storing more than one suggested filename for the object, although I think only one should be allowed in the URI itself.

Actually a server could ignore this part, perhaps even the client does not actually pass it to the server, but I guess a server should know to ignore it in case of user hacks assuming that it is used.

Going back to the encoding of the hashsum, RFC 6920 uses inefficient encodings, that make long URIs. Users typing content simply do not like long URIs. It doesn't matter that the RFC offers ways to help the user get it right, the fact is, nobody is going to use a scheme that is too clunky, unless they are made to, which is not something I am considering. The RFC authors perhaps, being large corporations in at least one case, can count on being able to make someone use their system.

So to keep URIs short, at least two features are needed, which the RFC doesn't really provide: the most efficient URI-friendly hashsum encoding format, and secondly, a flexible user-customisable mechanism for truncating a hashsum to a desired length.

The second part can be solved easily if a wildcard character is reserved or, probably better, a 'match' instruction is used, perhaps even as the default. Let users decide based on their context and the size of the db, how many characters of hashsum is enough for their confidence level. When a match hits multiple blocks, give some information to help the user decide which was the one they were after.

First part ... base64 is actually (very close to being?) completely URI-compatible but with some characters that are problems simply because they are stupid choices for a filename, eg slash '/'. Flickr have defined base58 which is analogous but uses only alpha-numeric characters. This would be one good choice. Another would be to use base64 and simply change disallowed characters to something else. I dunno what to call that, case64? hack64? There is actually something defined iirc, but I

The advantage to using base64 is that many hashsum generator binaries already can output base64 encoded hashsums. The disadvantage would be that if what we adopt is 'almost' base64, then when there is a small discrepancy (say '-' in place of '/') the user might abort thinking it indicates corruption, without doing further research to find the cause of the larger similarity.

So I think even if adopting something 'very similar to' base64, a strong effort would be needed to get the variation recognised as a standard form of hashsum. Which I think would actually be very slow, it has taken years for even base64 itself to become common in hashsum generators. :-\ Then again, perhaps since the bulk of the code is there, a change could happen quickly, if a strong case can be made for why.

Wikipedia has a neat table showing variants of base64, some are well-standardised in RFCs, even for the explicite purpose of making URIs, https://en.wikipedia.org/wiki/Base64#Implementations_and_history
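
The 'almost base64' variant alluded to above appears to be the URL-safe alphabet from RFC 4648 (one of the variants in that table), which substitutes '-' and '_' for '+' and '/'; Python exposes it directly:

    import base64, hashlib

    digest = hashlib.sha256(b"example share contents").digest()
    # URL-safe base64, with the trailing '=' padding stripped for brevity:
    print(base64.urlsafe_b64encode(digest).rstrip(b"=").decode())
    # A truncated prefix of a string like this could serve as the short hashsum discussed above.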

What I am not 100% confident about because right now my brain is too broken to think any more, is whether there are any characters left over for things like wildcards, separators, splitting etc. But perhaps none of those things are needed, if the URI schema is arranged well.

If we exposed those named regions as distinct files, then we wouldn't use HTTP Request-Range headers at all, we'd just fetch different filenames. The downside would be the filesystem overhead for storing separate small files instead of one big file, and the request overhead when you make multiple independent URL fetches instead of a single one (with a composite Request-Range header).

Of course, each file need not be just a single block. The server could make an abstraction to provide 'filenames' for sub-blocks of actual on-disk files. Which could be either actual files, or aggregated blocks from different files. These are the types of arrangements, which, in straying from common semantics, become worthwhile I think only if one is willing to code up a few different scenarios, and make some kind of tests that will indicate what performance differences there are, if any. All worthwhile only if a good case can be made that it might actually help something. Haven't thought enough about it to know. :-\

And the server would have to be more aware of the share contents, which makes upgrades and version-skew a more significant problem.

I don't think it needs to know anything really. At most, two types of files, as I described earlier, and these could be kept in two separate subdirs (or giving them different filename extensions, but I prefer the former), ie the actual distinction between them could also be a matter for the client. Yes?

We could also just keep the shares arranged as they are, but change the downloader to fetch larger chunks (i.e. grab the whole hash tree once, instead of grabbing individual hashes just before each block), and then use separate HTTP requests for the chunks. That would reduce our use of Request-Range to a single contiguous span. If we could still pipeline the requests (or at least share the connection), it should be nearly as efficient as the discontiguous-range approach.

Yup. I like the idea of clients pro-actively requesting and caching content that they are likely to use, however that is done. And of course dropping unused content after some user-configurable time. But I don't think it's crucial to do so. If there's a graceful way that happens to force 'related' data to a client, and particularly if a client is also a node, then I think it is acceptable. I guess though that perhaps for legal/idealogical reasons, such behaviour might need to be optional, not stuff that the entire system is built on top of. So actually, stuff I haven't solved even ideologically. I guess if I haven't, then many users will be ambivalent and it's best to ensure such behaviours are configurable.

comment:31 in reply to: ↑ 30 ; follow-up: Changed at 2013-10-26T12:16:40Z by simeon

"...everything else done at a client end, apart from exchanging objects between backend nodes and deprecation."

Perhaps since deprecation is handled by the server, then locking/blocking should also be, which seems to be easy to use rw user perms for, which would mean having an API switch for a user to say mark file as -w or -r ... and this I think seems a nice way to solve that, because it means an actual UNIX user rather than an interface user, can do something quick to clean out a cache without losing things they have marked as 'keep'. But it's not a feature that I am married to. :) But I think the API switch would be useful, and this is one of the few factors that I think speak one way or the other about whether to store whole-files or arrays of chunks and so forth. The problem can still be solved in the array-of-chunks system, if the clients know to make such markings in a metadata file, and the server knows to look in metadata files before deprecating files ...

I guess its a feature that would be too costly UNLESS it was implemented via the filesystem flags.

For my usage case, I see the idea as one way of reducing the number of admin requests for file deletion. If the policy is 'you can delete it yourself' and 'you can lock it yourself', ie 'fight it out with other users (of your node)', that might be one way to run a low-admin node. Since the db is distributed, the meanings of 'deleted', 'not deleted', 'blocked' and 'not blocked' are softened anyway, but I think people click 'complain' for irrational reasons, less than rational ones. A 'delete' button might assuage this. And it would be useful for the case where it's a private db. I see this as a configurable option in my system: you could set it up various ways via a config file, eg only allow certain users to use the flags, allow everyone, don't allow anyone, don't even respect the flags if they are set already, etc.

Also allows a UNIX user/admin to override the behaviours without having to learn or expose themselves to the interface (which may be desirable for paranoia reasons eg virus, or technical ones like no js browser etc).

comment:32 in reply to: ↑ 31 ; follow-up: Changed at 2013-10-26T17:03:26Z by simeon

Replying to simeon:

"...everything else done at a client end, apart from exchanging objects between backend nodes and deprecation."

Perhaps since deprecation is handled by the server, then locking/blocking should also be

Oh. And hashsumming of data, and making symlinks between a filename and a hashsum db blob. Your clients want to store data, so they want to know it was received intact, so you have to give them back a hashsum of the data. And often the data they want to upload already exists, so you can avoid the upload. But they want to use a different filename, so a symlink (or equivalent?).

But hashsumming on the server leads to a potential DoS. A malicious client can force the server to hashsum a lot by uploading arbitrary data, putting a compute-load on it. I suppose a hashcash system can be used to reduce the ability of clients to do this?

Perhaps this is an argument for finite-length blocks with the client being forced to at least provide hashsums as they are uploaded, but this adds complexity to the client and to the protocol, without really doing a lot to prevent DoS. Most clients are not CPU-bound, they are network-bound.

If so, a more effective way would be to force the client to download the data back, and not store it if they won't.

Combined with LRU deprecation these would reduce the ability of random clients to replace useful content with completely unuseful random crud.

In any case, I don't see any better way for the system to work, than for the server to hashsum the data ASAP after receipt, and report the value to the client prior to considering a transaction 'complete'.

Here is a picture of the pipeline I envisage

data->compress?->hashsum1->encrypt+hashsum2?->store->hashsum2+decrypt?->hashsum1->decompress?->data.

The compress stage is for two purposes, 1) to make the file small to decrease the cost of hashsum and encryption, and 2) to make the encryption as resilient as possible to deliberate attack (I am not a cryptographer, but it's my understanding that that's how things work). But the compress should probably only be done if it's useful. Either way, I guess leave that decision up to the client (which could be an intermediate transform web-application, rather than the endpoint, btw).
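
A literal client-side reading of that pipeline as a sketch; the encrypt step is left to a caller-supplied function, since which cipher (if any) to use is exactly what is being left open here:

    import gzip, hashlib

    def store_pipeline(data: bytes, encrypt):
        compressed = gzip.compress(data)                    # compress? stage
        hashsum1 = hashlib.sha256(compressed).hexdigest()   # hashsum1, over the plaintext object
        ciphertext = encrypt(compressed)                    # encrypt stage (caller-supplied)
        hashsum2 = hashlib.sha256(ciphertext).hexdigest()   # hashsum2, over what the server stores
        return hashsum1, hashsum2, ciphertext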

The current question is, which parts of this transform should be specified in the URI?

I think the URI must be capable of specifying the following (question-marks where I'm not sure the item is useful):

  • ? file size or size-class ?
  • ? checkdigit (a one-byte checksum of the hashsum, to detect user typos) ?
  • hash algorithm (if omitted, default to a pre-configured value)
  • truncatable hashsum (if omitted, query based on filename, if provided, and if allowed by config)
  • optional sequence number in case of hash collisions (not guaranteed to be consistent between nodes, or across time, server maintained, and when dups exist but one isn't specified, then return a list somehow to allow the client to choose)
  • optional filename

Certain metadata relating to files would be stored, in other ordinary db files requested using the same scheme.

Private metadata1

  • ? magic number ?
  • hashsum2->hashsum1
  • encryption parameters
  • compression algorithm

Private metadata2

  • ? magic number ?
  • mapping hashsum1->hashsum2[,hashsum2,hashsum2...]

btw I hate XML, so if you want to upset me, use XML for your metadata. I'll use something else for mine ;-)

The metadata files present a problem for keeping the encryption secure. If the file is to be constantly changed, the differing versions with differing content, but a lot of unchanged content and the same key each time, provides a way for an observer to break the encryption. Or if the key is changed often, a different pitfall presents, the encryptor potentially spends a lot of entropy, so the key is not very strong. I think.

So although it may seem optimal to aggregate metadata for many blobs into a single metadata file, perhaps not. Hard not to hit one of those two attack vectors, in this scenario. But if each file has a small metadata file, that small file could be encrypted using an expensive encryption algorithm, I guess. Not a cryptographer, so I could be very wrong.

For private use, the metadata need not be uploaded to the server at all. The user would need to be aware of how that works, and not lose the metadata, of course. Perhaps some combination of local caching and infrequent upload of metadata to the cloud would work?

Now a first guess (and now, later, a second one) is that to the backend, all of the information on which stages of the pipeline to apply, and what values to expect, is just cruft to be either ignored or stored without checking. That would make for a very simple and fast upload storage server.

There is no point having the client tell the server what hashsum to expect, because there is no defense against malicious clients in this case. A hashsum can only be calculated once the data is received. Or is this an argument for smallish blocks? Make the client provide a hashsum every 100k or whatever, and stop talking if they get one wrong? It still doesn't help much to prevent DoS, just means the client spends the same effort as the server. A DDoS still works.

Actually a malicious client would be best to upload spurious data WITH correct hashsum, so that the db fills up with rubbish. So there is no reason to worry overly much about checking hashsums on the server. Do or don't do, it doesn't affect the chance for a DoS. Am I right?

You may as well just store whatever you receive, hashsum it, label it so it can be retrieved by a client who asks using the hashsum, and leave it to the client to notice if the upload failed, ie if the data did not end up with the expected hashsum (if they even had calculated what to expect: a client could be dumb and just trust the server to return the correct value).

And this is why a LRU deprecation policy is needed. Because since there is no way for the server to know if the data offered will ever be used, it must know which data to drop if out-of-space. So drop oldest, least used data. More useful UNIX filesystem semantics: last-accessed, and archive-bit.

One could even accept uploads blindly, not even checksum on the server, just have client report one, and assume client tells truth about checksum, later LRU deprecate, leave it to other clients to integrity check what they receive? I doubt this method would make for happy users, there may be many DoS being passed on to other users. Hmm.

Perhaps modify this so that the server verifies the hashsum only of files that survive the LRU? Or just have a lazy thread hashsumming in the background and deleting those that don't match whatever a client specified?

So possibly in the API the POST needs to specify the expected hashsum. Or not. *TODO which is easiest, which is best?

If you trust clients to indicate good/bad of files they get, it seems to me like just another thing they could lie about, just shifts the DoS. Same with flagging files that get many dropped downloads: a malicious client could make partial downloads on purpose to cause particular files to be flagged as bad. Storage space is cheap, and even cheaper in a distributed net (we hope!). These feedback/metrics systems might work with a signature/authority scheme, but then, same diff, you may as well put the trust earlier in the piece and flag bad uploads soon-as.

The case of breaking up files I think is just a usage case where the client does that, and stores the results into the server. You could, if this is planned, optimise the server to work best for specific file sizes. And in this situation, you might choose to look at this client as a transform layer, below the transform layer that accepts data from the user, and riding on top of the base layer of the storage server API. I think the URI schema should be generic, and yet succinct enough to cater to the top (human user) layer, and lower-layer uses should be specific subsets of that syntax, or with certain parameters filled with robotically-derived content.

Again, apologies for lengthy and perhaps unreadable text, too tired now to do anything except hit Submit Changes. ;-) Thanks for listening if you read this far, and I hope it's helpful!

comment:33 in reply to: ↑ 32 ; follow-up: Changed at 2013-10-27T01:04:26Z by simeon

I've typed all this, but I'm getting tired again. Hopefully have transferred the important core points now, and I will attempt sometime to come back and clarify/summarise from this sketch. Let me know if you think I am beating a different path, and that summary can be put somewhere else where it won't clutter your system. :) For now, thanks for your thoughts and good luck with your project.

Replying to simeon:

Replying to simeon:

"...everything else done at a client end, apart from exchanging objects between backend nodes and deprecation."

Perhaps since deprecation is handled by the server, then locking/blocking should also be

Oh. And hashsumming of data, and making symlinks between a filename and a hashsum db blob.

I think the URI must be capable of specifying the following (question-marks where I'm not sure the item is useful):

I forgot the salt! ;-) ... and the user ID for mutable files, which should definitely be capable of being named the same as an immutable file in every other respect, so that a user can easily create a file in their directory with a name that they can see directly corresponds with a file in the cache.

So adding salt we have

  • optional salt
  • ? file size or size-class ?
  • ? checkdigit (a one-byte checksum of the hashsum, to detect user typos) ?
  • hash algorithm (if omitted, default to a pre-configured value)
  • truncatable hashsum (if omitted, query based on filename, if provided, and if allowed by config)
  • optional sequence number in case of hash collisions (not guaranteed to be consistent between nodes, or across time, server maintained, and when dups exist but one isn't specified, then return a list somehow to allow the client to choose)
  • optional filename

Salt has to be specified and stored where the server can find it easily (ie as part of the object label in my model) if the server checks hashsums. If it does not, it could be hidden in client-only metadata, but I think for users it's useful to keep it as an obvious part of the object id ... which we call the URI ... which they would see as 'filename'.

For usability I think this is too many fields; filesize and checkdigit probably have to be dropped. There are two ways to make the URI: either strict field-ordering, using field-counting and hoping there is no need to extend the schema later, or with explicit query-labels.

Lots of examples to ponder

So what might a worst-case bungle of a compact usable and intuitive implementation would look like? I'm just gonna use an example, cause I forget how to do RFC syntax:

lafs:secretsalt;4k;XYZQabc1230;1;My+theory+by+Anne+Elque.html.gz

Where secretsalt is a custom salt; 4 is a single-byte file-size class; k is a checkdigit for the complete hashsum; XYZQabc1230 is the first 11 chars of the hashsum; 1 is a sequence number, because the current db was DoSed with a manufactured item that happens to hit the same hashsum as the file we want; and the title is "My theory by Anne Elque".

With query labels, it goes something like

lafs:s=secretsalt;z=4;d=k;h=XYZQabc1230;f=My+theory+by+Anne+Elque;t=htmlgz

Or if we drop z and d, roll the sequence-number into the filename, and accept a default salt (perhaps for the normal case where the item does not have duplicate-hashsum siblings, we can use field counting to infer that the salt is the default), things look OK even with field identifiers:

lafs:h=XYZQabc1230;f=My+theory+by+Anne+Elque.html.gz

lafs:s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html.gz

For an automated transaction, the server need not pass a filename (nor the salt perhaps? depends on how we store items with different salts, and how much time we want to spend doing lookups), so a request could just be for

lafs:h=XYZQabc1230

Want to save two bytes? Make the default field h=

lafs:XYZQabc1230

Users who are familiar might find this more convenient than a URI that mentions a title.

If no h is provided, only an f, then optionally have the server do a lookup. This could be optimised by keeping a set of mutable, possibly non-public files with reverse-lookup mappings of filename to hashid.
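
As a concreteness check, here is a hedged sketch of how the field-labelled form of these example URIs could be parsed; it is purely illustrative of the schema being mused about here, not an existing Tahoe-LAFS format:

    def parse_lafs_uri(uri: str) -> dict:
        # e.g. lafs:s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html.gz
        # A bare value with no '=' is treated as the default field 'h' (the hashsum).
        assert uri.startswith("lafs:")
        fields = {}
        for part in uri[len("lafs:"):].split(";"):
            key, value = part.split("=", 1) if "=" in part else ("h", part)
            fields[key] = value.replace("+", " ")
        return fields

    print(parse_lafs_uri("lafs:h=XYZQabc1230;f=My+theory+by+Anne+Elque.html.gz"))
    # {'h': 'XYZQabc1230', 'f': 'My theory by Anne Elque.html.gz'}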

Maybe the gz is a separate transform rather than a file extension, removed when passing the object back to the user? Unsure which way is better; the implications are not that important, but can become complex. Eg a file could already be a gzip; when stored by the db, would it be gzipped again? Without a way to specify so, clients might do this. I don't think the server backend should add gzip, but as discussed in an earlier posting, the client should iff the content is compressible.

It's easy actually to test and compress if needed, so the client probably should test, but not testing might make the SCHEMA simpler, since the client would always assume it has/hasn't been gzipped. Not having a separate field means counting on the file extension as being correct, and this leads to problems if users misuse the file extension...

So this is a possible solution:

lafs:t=gz;s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html

on a web UI, this might be presented as

http://lafs-ui.com/htmlgz/secretsalt/XYZQabc1230;My+theory+by+Anne+Elque

and in the user's directory

http://lafs-ui.com/AElque@lafsmail.com/secretsalt/XYZQabc1230;My+theory+by+Anne+Elque

or probably they either use the default salt, or have a personal default configured somewhere in a client, so it can be shorter, like

http://lafs-ui.com/AElque@lafsmail.com/XYZQabc1230;My+theory+by+Anne+Elque

Does the lafs schema need to support user ids?

lafs:u=AElque@lafsmail.com;t=gz;s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html

Personally I think the email is preferable to a PGP ID, easier for a user to understand. But if using an ID, I guess this might be

lafs:u=0xA72B89345;t=gz;s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html

I'm probably misusing the request here, not sure how you intend to expose the user directory. I'm guessing it makes sense only for mutable files, so the h= and s= go away, if the user wants to represent them, they are stored in the f= field,

lafs:u=AElque@lafsmail.com;t=gz;f=My+theory+by+Anne+Elque.html

lafs:u=AElque@lafsmail.com;t=gz;f=s=secretsalt;h=XYZQabc1230;f=My+theory+by+Anne+Elque.html

Are we confused yet? ;-) I'm guessing it doesn't make sense to allow the schema to reference the mutable files, or else they need a specific schema. Certainly in any case, the user-definable field of the 'filename' needs to either be defined as potentially including strings that duplicate syntax elements, or the server does need logic to disallow or escape such entities. Myself, I prefer to say, filename always comes last, and can include any legal HTTP URI filename field characters. But a simple other way is to ensure that semi-colon is escaped within the field. I'm not gonna re-write the above URIs that way though, you have to use your imagination.

http://lafs-ui.com/AElque@lafsmail.com/XYZQabc1230;My+theory+by+Anne+Elque

http://lafs-ui.com/AElque@lafsmail.com/My+theory+by+Anne+Elque

Both of the above might be symlinks to the cache object item XYZQabc1230. The user might specifically include the truncated hashsum that would be seen on the cache item, but it's not needed because their homedir is a bunch of links/redirects to cache objects. Or is that too insecure? Again, my thoughts revolve around a public editing system, yours around a private data store. In this case, I would want to adopt the semantics that you would use.

I fear that having differing representations in web UI is counterproductive, so it's best if the syntax can be as brief, compact and intuitive and filename-friendly as possible. :)

Is it better then, to omit the (pretend) directory paths, and use fields instead? I think this is uglier, less clear, so which is actually better, ambiguity in the name of clarity?

http://lafs-ui.com/u=AElque@lafsmail.com;t=gz;h=XYZQabc1230;f=My+theory+by+Anne+Elque

I like the directory paths for some parameters, fields for others. Perhaps directory paths are OK where they would not be preserved in a save-file anyway? That means maybe more like this:

http://lafs-ui.com/AElque@lafsmail.com/My+theory+by+Anne+Elque

http://lafs-ui.com/gzhtml/h=XYZQabc1230;f=My+theory+by+Anne+Elque

Two views of the same object, with the 'client' being a web UI on the site lafs-ui.com, and one version being returned with mime-type html using HTTP header "Content-Encoding: gzip", the second returned using mime-type octet-stream and without the compression-encoding header, so the web browser would provide the user the raw blob.

http://lafs-ui.com/gzhtml/h=XYZQabc1230

http://lafs-ui.com/cache/h=XYZQabc1230

Do a lookup for files by name?

http://lafs-ui.com/search/f=My+theory+by+Anne+Elque

When the user presses 'save file', it's gonna depend on the context of their client how much of this gets passed through. I guess this means the server sets the Content-Disposition filename field to something comprehensive, and the client software filters it if the user prefers. Users who use things like lynx or telnet to grab a file have to whittle the fields down themselves. It's harder (unreliable) to reconstruct a field unless the server provides it, so I doubt that it's worth saving bytes by omitting or making the suggested filename minimalist. But perhaps it can be taken on context, returned filled with the fields the client mentioned? Then an internode-style transmission by hashsum-only has very little overhead, and the entire field can be omitted.

In a user client, it should be configurable, to anything from the full lafs:blah down to just the filename portion, "My theory by Anne Elque", but as mentioned, the client shouldn't try to reconstruct fields that the server has not described, since that is not going to be easy to make correct in every case.

On disk

On disk on the server, the files can be stored efficiently using the hashsum to distribute the objects equally across a bunch of subdirectories. This is the real benefit of using the hash in the filename, as I guess you were aware at least tangentially, since I guess you understand the load-balancing aspect wrt distributed nodes. But it makes it easy on the filesystem to do look-ups as well.

The server node makes a translation when the client requests hashsum XYZQblah, to X/Y/XYZQblah. Or for a larger installation, more subdirectories, perhaps XY/ZQ/blah, or X/Y/Z/XYZQblah. This is hidden from the user, and unless they are locally storing massive quantities of unsorted files in their homedir, they wouldn't want to or need to emulate it.
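
A sketch of that translation (the fan-out depth would be a tuning knob, as noted):

    import os

    def on_disk_path(root: str, hashsum: str, levels: int = 2) -> str:
        # levels=2 turns "XYZQabc1230..." into root/X/Y/XYZQabc1230...
        prefix_dirs = list(hashsum[:levels])
        return os.path.join(root, *prefix_dirs, hashsum)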

So I'm guessing if custom salts are allowed that could be implemented on the server path as a root directory or parent directory tree of a similar type ... specifics would depend on how many users, how many files?

And on the server the full hash should be stored in the filename. Optionally it would be nice but not important to enable the user to configure to see the full hash in the 'save as' dialogue too.

Human-readable portion of filename I think is made using a link to that original object, resolved internally by the server to avoid excessive lag, with appended desired/suggested filename field. POSIX hardlinks vs symlinks vs hackish text file redirects have differing implications for ease-of-management, cross-platform compatibility, and for consistency wrt can one version disappear, be locked or blocked, while another remain. Ultimately it might be nice to have the different mechanisms as configurable policy options, otherwise the most sensible approach is to take the simplest path I guess.

The other fields

Below I've ranted a bit about the various fields that ultimately above I decided were too problematic to bother the user with. This may be interesting if you want to understand why, and it may be worth having the API support the ideas, perhaps they have uses behind the scenes or in special use-cases.

Another reason might be to help keep the schema overall compatible with RFC 6920, even if it involves some translation that can be automated at least in one direction (it's obviously not trivial to reconstruct a hashsum in a URI that has been truncated to a length that 6920 doesn't support, for example). I don't want to even think about it right now.

Here is my analysis, also don't want to revisit this right now, so it may have errors and stuff.

It's only in the case of a hash collision with differing content that the sequence number field would get used. Probably this option could be left out of an implementation, but it's good I think to plan for perhaps supporting it, in case an efficient mechanism is found for this DoS attack vector. (My gaze hovers over the bitcoin generators at this point.)

The file-size classifier byte is basically for the same purpose. I see bitcoin spawning methods for quickly generating massive numbers of hashes, which who knows, some people might be able to leverage to use as a library for hash collisions. (They use hash256 but can I think fairly easily change.) But these items are of a small, perhaps even fixed size I think. Including a filesize classification byte MAY be a way to get such hash-collisions to not matter. On the other hand, I am not that sure it's worth it, and dunno if others have done a rigorous analysis of such a strategy.

Checkdigit is for the human-typo factor. If the checkdigit does not match then the user can be alerted. However ... it's only able to be checked when the full hash is known, or else it would vary depending on the truncation length of the hashsum. And since it can't easily be verified by hand anyway, I'm again not all that sure if it's useful enough to be worth using. If the bitspace of the hashsum truncation length is sufficient, there should not be two items who are close enough together to be only a mere typo apart. So a dud request leads to a lookup in either case, and an error in either case.

Only if the client end is given the full hashsum by the user typing it in, does the benefit arise, where eg a js could alert the user prior to the client making the query to the server. I think this is useless, really. :) On the other hand, if you did implement things that way, it might be considered a life-saver by the poor human!

The usage I envisage would primarily be people copy-pasting URIs, or using phones to capture QRCode style links, more often than they would manually type one in. The main reason for truncating the hashsum is not to make it easier to type, but to make it easier to read! Humans stop reading after a lot of incomprehensible cruft. ;-) I hope this is not one of those cases. ;-) The URI has to fit neatly into the address bar, including enough of the actual human-generated filename for the reader to notice that it has one.

So I think checkdigit and filesize are important options for the URI schema to support, but maybe not essential to a given implementation.

comment:34 Changed at 2013-10-27T01:30:00Z by simeon

Not uploading if the content exists

The upload should of course be done in a way that benefits from the hashsum id. The client should be clever and check if it needs to upload by calculating the hashsum and doing a GET headers first on that blob.
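
That check could be as simple as hashing locally and issuing a headers-only request against the content's hash before deciding to upload; the /cache/<hash> path here is made up for the example:

    import hashlib, http.client

    def needs_upload(host: str, data: bytes) -> bool:
        hashsum = hashlib.sha256(data).hexdigest()
        conn = http.client.HTTPConnection(host)
        conn.request("HEAD", "/cache/" + hashsum)   # headers only, no body transferred
        return conn.getresponse().status == 404     # 404: not stored yet, so upload it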

When an insufficiently clever client (eg a non-js browser) does upload a dup, the server figures that out and re-uses the existing object, discarding the new one, but using whatever filename the client wants to make a new link if needed.

So ... the other thing needed is a way for the client to make a new link, without providing the actual content. Looks to me like in HTTP this has to be done via a GET perhaps to a specific query URI, or since it's a cheap operation, perhaps just make new names any time a user tries to GET something if they provide a hashsum and a filename?

As alluded to earlier, filename doesn't really actually have to exist on the server at all for the cache files, the value could be filled from the request header. But storing these on-disk on the server allows searching (until users clutter it up too much with rubbish names, so how useful this is depends on the user context).

Certainly the user wants to create links in their homedir. One solution is that the client is what handles those, rather than them being symlinks or whatever that the server can read. The client could use some kind of syntax like in the metadata files proposed earlier, to say, this object should be resolved by looking up this hashsum id. Then the user is just creating or overwriting a small plaintext metadata using a POST.

The same mechanism could be used for in-cache objects filenames, but I think the extra network requests to resolve links at the client end makes it an unattractive choice.

OK, now I think that's everything! ;-) ciao

comment:35 in reply to: ↑ 33 Changed at 2013-10-28T01:46:29Z by zooko

Replying to simeon:

Let me know if you think I am beating a different path, and that summary can be put somewhere else where it won't clutter your system. :)

Um, hey man, thanks for your interest! I appreciate the time you are taking to share your thoughts about this topic.

So, I haven't taken the time to read all of your comments here, but I did skim through them briefly and I didn't notice anything obviously crazy or stupid in the random sampling of sentences that I saw as I skimmed.

Now, this ticket — about using HTTP for the LAFS storage server protocol — is definitely not the best place for people to find your ideas and give you feedback on them. They are apparently an alternative to LAFS or a possible future variant of LAFS, so maybe one good place would be to post the design as a letter to the tahoe-dev mailing list. Other possibilities: the p2p-hackers mailing list (http://lists.zooko.com/mailman/listinfo/p2p-hackers), or Jack's (http://lists.randombit.net/mailman/listinfo/cryptography), or Perry's (http://www.metzdowd.com/mailman/listinfo/cryptography) crypto mailing lists.

comment:36 Changed at 2014-01-05T23:59:24Z by daira

[deleted]

Last edited at 2014-01-06T00:02:49Z by daira (previous) (diff)

comment:37 Changed at 2015-02-10T18:56:54Z by zooko

This would also help by removing an identity/linkage leak: #2384.

comment:38 Changed at 2015-03-19T00:54:54Z by warner

I was thinking about this again yesterday, and re-invented the MAC-with-WE scheme from comment:11 . But this time, since Curve25519 is so cheap these days, I figured we could stick with the same WE as before (derived from the writecap and the server ID), and encrypt it with the create-mutable-slot message, instead of retaining any connection-like state between the client and each server. Something like:

  • all reads are just GETs
  • immutable writes are account-signed netstring-encoded POSTs
  • mutable create-slot puts a Curve25519/ElGamal-encrypted WE into the account-signed netstring-encoded POST
  • mutable modify uses the WE to HMAC the netstring-encoded modification message. The HMACed message is then account-signed.

Moreover, I think the server's (signed) introducer announcement can be hosted over HTTP too, at a well-known URL. Then the server can be correctly described with the server's Ed25519 pubkey (the same one that signs announcements), and the URL where it lives. These two pieces of information could be put in a client-side statically-configured "known-servers" file, and the rest (including the Curve25519 pubkey for this encrypted-WE thing) gets fetched when the node starts up.

If we use the same writecap-derived WE (instead of switching to an ed25519 pubkey), we don't need the WE-migration process.
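
A sketch of what such a statically configured "known-servers" entry might contain, per the paragraph above; the field names and the announcement path are assumptions, not a finalized format:

    # Each entry pairs the server's Ed25519 announcement-signing pubkey with the URL
    # where its signed announcement (including the Curve25519 pubkey used for the
    # encrypted-WE scheme) can be fetched when the node starts up.
    KNOWN_SERVERS = {
        "v0-exampleed25519pubkeyinbase32": {
            "announcement_url": "https://storage.example.net/tahoe-announcement",
        },
    }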

comment:39 Changed at 2015-07-24T01:33:05Z by warner

Note: to simplify tahoe's --listen configuration, we probably want the Foolscap listening port to handle both foolscap and HTTP (so we can just say tahoe --listen tcp:1234 instead of tahoe --listen-foolscap tcp:1234 --listen-http tcp:2345). foolscap#237 is about adding this feature.

comment:41 Changed at 2020-01-18T00:06:36Z by exarkun

  • Keywords leastauthority added

comment:42 Changed at 2020-01-18T00:06:47Z by exarkun

  • Owner changed from zooko to exarkun

comment:43 Changed at 2021-03-30T18:40:46Z by meejah

  • Milestone 2.0.0 deleted

Ticket retargeted after milestone closed (editing milestones)

comment:44 Changed at 2023-03-09T13:17:56Z by exarkun

  • Milestone set to HTTP Storage Protocol

comment:45 Changed at 2023-03-24T19:30:32Z by itamarst

  • Resolution set to fixed
  • Status changed from new to closed

This is ... done, I guess, or insofar as it isn't superseded by a bunch of other tickets.

Note: See TracTickets for help on using tickets.