[tahoe-dev] how to encrypt and integrity-check with only one value (was: Re: two reasons not to use semi-private keys in our new cap design)
Zooko Wilcox-O'Hearn
zooko at zooko.com
Thu Sep 10 09:13:49 PDT 2009
Dear David-Sarah and Brian:
Hello, I am slowly catching up on the burst of crypto cap creativity
that you two posted over the last few days.
On Monday, 2009-09-07, at 1:48, Brian Warner wrote:
> * we can't determine the storage-index until after we've encoded the
> entire file (which generally means after we've uploaded it). So we
> need a new uploader protocol that lets us upload to an as-yet-unnamed
> slot, and then provide the slot's storage-index at the very end of
> the process. This is more work, but it isn't a huge deal.
Remember that I really, really want this anyway, because this is
necessary to have "one-pass" == "on-line" upload. Imagine that you
are a tiny embedded machine with little RAM and little or no disk.
Your client opens an HTTP connection to you and starts uploading the
plaintext of a huge file, expecting you to store it on a Tahoe-LAFS
grid. You need to (a) pick a random encryption key, (b) perform
encryption, erasure-coding, and computation of the verification data,
(c) send the resulting encrypted shares and verification data to
storage servers. You have to do all of this in an "on-line" way,
i.e. you can't store a lot of intermediate data somewhere while
waiting to see the end of the plaintext. Then, (d) return the
resulting read-cap to the client as quickly as possible after the
client finishes sending you the plaintext. This is ticket #320.
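Here is a rough sketch of steps (a) through (d), only to show the
"on-line" shape of the thing: each chunk is processed and sent as it
arrives, and the only state kept is the key, a byte offset and a
running hash. Everything named here is made up for illustration
(xor_keystream, one_pass_upload, the send callback). The SHA-256
counter-mode keystream is a stand-in for a real cipher and must not be
used as one, erasure-coding is left out entirely, and the returned
(key, digest) pair merely stands in for whatever the real read-cap
would contain.

```python
import hashlib
import os
from typing import Callable, Iterable, Tuple

BLOCK = 32  # SHA-256 digest size in bytes


def xor_keystream(key: bytes, offset: int, data: bytes) -> bytes:
    """XOR data with a keystream positioned at byte `offset`.

    Illustrative stand-in for a real stream cipher; not vetted crypto.
    """
    first = offset // BLOCK
    last = (offset + len(data) + BLOCK - 1) // BLOCK
    stream = b"".join(
        hashlib.sha256(key + n.to_bytes(8, "big")).digest()
        for n in range(first, last)
    )
    start = offset - first * BLOCK
    return bytes(a ^ b for a, b in zip(data, stream[start:start + len(data)]))


def one_pass_upload(
    chunks: Iterable[bytes], send: Callable[[bytes], None]
) -> Tuple[bytes, bytes]:
    key = os.urandom(16)          # (a) pick a random encryption key
    verifier = hashlib.sha256()   # (b) running verification hash
    offset = 0
    for chunk in chunks:
        ciphertext = xor_keystream(key, offset, chunk)  # (b) encrypt
        offset += len(chunk)
        verifier.update(ciphertext)
        send(ciphertext)          # (c) ship it now, keep nothing around
    # (d) everything needed for the cap is known as soon as input ends
    return key, verifier.digest()
```

Memory use is bounded by the chunk size rather than the file size,
which is the property the tiny embedded machine needs.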
> * we wouldn't be able to directly use our permuted-list Tahoe2
> peer-selection protocol, since we won't know the storage-index (and
> thus the permuted list) until after we've uploaded all the shares. I
> think we'd have to go with the "server-selection-index" idea: a much
> shorter string (since it only needs to provide load-balancing, not
> collision resistance), either randomly generated or derived from a
> salted CHK hash (and thus computable before encoding/upload), used to
> permute the peerlist. This string must be included in the readcap,
> increasing its length, but we could probably get away with maybe 20
> bits or so.
Argh! You are right! Another few bits needed in the readcap! Boo
hoo. :-(
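To make the server-selection-index idea quoted above concrete, here is
a small sketch of permuting a peer list with a short random value. It
assumes the permutation is done by sorting on a hash of the SSI plus
the server id, in the spirit of the permuted-list scheme. The use of
SHA-256, the 3-byte encoding and the names new_ssi and permute_peers
are assumptions for illustration, not what Tahoe-LAFS actually does.

```python
import hashlib
import secrets
from typing import List

SSI_BITS = 20  # "maybe 20 bits or so"


def new_ssi() -> int:
    # Randomly generated variant; known before any encoding happens.
    return secrets.randbits(SSI_BITS)


def permute_peers(ssi: int, server_ids: List[bytes]) -> List[bytes]:
    # 20 bits fit in 3 bytes; the encoding is an arbitrary choice here.
    ssi_bytes = ssi.to_bytes(3, "big")
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(ssi_bytes + sid).digest(),
    )
```

Anyone holding the same SSI and the same server list computes the same
ordering, so uploader and downloader agree on where to look first
without knowing the storage-index.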
> So, while I like the one-cryptovalue trick, I'm unsatisfied with both
> the lack of server-side validation and offline readcap-to-verifycap
> attenuation, and the separate SSI value makes me slightly nervous.
Re: server-side validation, what do you think of my proposal in [1]?
It lets the server fully validate the verify-cap, and readers carry
around just enough of the verify cap to give themselves a massive
advantage (a million to one) over DoS'ers.
Re: offline diminishing readcap-to-verifycap, I liked your and David-
Sarah's comments about storing the verifycap with the readcap
sometimes. In general, each kind of cap could have a base part --
the minimal information which is necessary and sufficient to be a cap
(assuming full access to servers) -- plus it could have an "extended"
part -- pieces that you can always get from the servers if you have
the base part, but you can save round-trips if you have the extended
part. For read-caps, the minimal part could be the crypto value, the
server-selection-index (boo hoo) and a 20-bit prefix of the
verifycap. The extended part could be the full verify-cap and the
k_enc. Or maybe the extended part could be the full public key and
the read key!
Then it would be up to the user of the cap to decide whether to use
the smallest possible cap or to use the extended cap in order to save
round-trips when dereferencing or diminishing it.
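A sketch of what such a base-plus-extended read-cap might look like
as a data structure. The field names, the ReadCap class and the
prefix20 helper are all hypothetical. The optional k_enc is omitted
for brevity, and the prefix comparison is only a cheap first filter,
not a replacement for full verification.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX_BITS = 20  # 2**20 = 1,048,576, hence "a million to one"


def prefix20(verifycap: bytes) -> int:
    # Top 20 bits of the first 3 bytes (needs at least 3 bytes of input).
    return int.from_bytes(verifycap[:3], "big") >> 4


@dataclass(frozen=True)
class ReadCap:
    # Base part: necessary and sufficient, given access to servers.
    crypto_value: bytes
    ssi: int
    verifycap_prefix: int
    # Extended part: fetchable from servers, carried to save round-trips.
    full_verifycap: Optional[bytes] = None

    def accepts(self, candidate_verifycap: bytes) -> bool:
        # With the extended part we can compare exactly; otherwise the
        # 20-bit prefix filters out all but about 1 in 2**20 bogus values.
        if self.full_verifycap is not None:
            return candidate_verifycap == self.full_verifycap
        return prefix20(candidate_verifycap) == self.verifycap_prefix

    def minimal(self) -> "ReadCap":
        # Strip the extended part to get the smallest possible cap.
        return ReadCap(self.crypto_value, self.ssi, self.verifycap_prefix)
```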
Re: separate SSI (server-selection-index) value, what makes you
nervous about it? Personally, I like the idea of separating the data
(crypto) layer from the network (server-selection) layer. Some grids
might have a server-selection policy that you always query the
servers in increasing order of network round trip time, regardless of
which cap you are looking for. Those grids wouldn't need a server-
selection-index at all. Others might accompany each of their caps
with a description of which servers each share was last seen on.
That would be in a sense a very large, optional SSI. (Hm, and it
would act a bit like a slow, persistent BitTorrent tracker. :-))
Is the fact that people might eventually use such crazy server-
selection policies (that we haven't yet vetted) one of the things
that makes you nervous about separating out the SSI? :-)
Regards,
Zooko
[1] http://allmydata.org/pipermail/tahoe-dev/2009-September/002829.html
tickets mentioned in this letter:
http://allmydata.org/trac/tahoe/ticket/320 # add streaming (on-line) upload to HTTP interface