[tahoe-dev] how to encrypt and integrity-check with only one value
Brian Warner
warner at lothar.com
Mon Sep 7 00:48:53 PDT 2009
Zooko Wilcox-O'Hearn wrote:
> Now, convergent encryption could do both jobs with one value! If you
> let the symmetric key be the secure hash of the plaintext, then the
> reader could use the symmetric key to decrypt, then verify that the
> key was the hash of the plaintext.
In addition to the other reasons you listed, you might not be able to
use this because of alacrity: a CHK hash can't be validated until the
entire plaintext has been downloaded. OTOH, it's conceivable that you
could build up a plaintext merkle tree with about the same effort as the
normal CHK flat hash, and use the root of that as your encryption key,
and safely encrypt the plaintext hash tree in a way that lets you grab
it quickly (one node at a time). It'd be kinda complex, but that might
let you use CHK-like encryption keys that also gave you low-alacrity
integrity properties.
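A minimal sketch of the Merkle-root-as-key idea: the encryption key is the root of a hash tree over the plaintext blocks, so a reader can decrypt block i and check it against the key using only that block's sibling path, instead of waiting for the whole file as with a flat CHK hash. The helper names are hypothetical, not Tahoe code. The SHA-256 counter-mode keystream is a toy stand-in for a real cipher (a real system would use something like AES-CTR), and the sketch skips encrypting the hash-tree nodes, handing the reader the sibling path directly.

```python
import hashlib

def h(tag: bytes, *parts: bytes) -> bytes:
    """Domain-separated SHA-256 over length-prefixed parts."""
    m = hashlib.sha256()
    for p in (tag, *parts):
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def merkle_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the plaintext hash tree, leaves first, root last."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b"leaf", b) for b in blocks]
    # Pad to a power of two with a fixed filler hash (an illustrative choice).
    while len(level) & (len(level) - 1):
        level.append(h(b"pad"))
    levels = [level]
    while len(level) > 1:
        level = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes needed to check leaf `index` against the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify_block(root: bytes, block: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from one plaintext block plus its sibling path."""
    node = h(b"leaf", block)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"node", node, sibling)
        else:
            node = h(b"node", sibling, node)
        index //= 2
    return node == root

def toy_xor_stream(key: bytes, index: int, data: bytes) -> bytes:
    """NOT a real cipher: SHA-256 counter-mode keystream, for illustration only.
    Encryption and decryption are the same operation."""
    stream = b"".join(
        h(b"ks", key, index.to_bytes(8, "big"), c.to_bytes(8, "big"))
        for c in range((len(data) + 31) // 32)
    )
    return bytes(a ^ b for a, b in zip(data, stream))
```

The uploader uses `key = merkle_levels(blocks)[-1][0]` and encrypts block i with `toy_xor_stream(key, i, block)`. A reader holding only the key can decrypt one block and call `verify_block(key, plaintext_block, i, path)` right away, which is the low-alacrity property the flat CHK hash lacks.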
> Here's my idea about ensuring both confidentiality and integrity with
> a single crypto value.
Ah, good, thanks for writing this up. I certainly like your scheme
better than the fragments of your scheme that I was able to reconstruct
from a memory of a vague conversation :-). I'll try to update
NewImmutableEncodingDesign in the next few days with your algorithm.
Some observations:
* obviously the "v = H(ciphertext)" could+should be expanded to include
our usual UEB scheme, with all integrity information (merkle trees,
share hash trees, ideally even an encrypted form of the plaintext
hash data) going into the UEB, and "v" being the hash of the UEB.
David-Sarah's point about making verifycap=H(v,K1enc) is spot-on.
* verifycap cannot be offline-derived from readcap: you have to run
through part of the download process, fetch at least "v" and the
K1enc value, derive K1, hash K1+v together to confirm that you really
do get the readcap, then emit H(v+K1enc) as the verifycap. This makes
manifest/repaircap generation really expensive (a network trip per
file). One mitigation strategy would be to store both readcap and
verifycap in dirnodes, effectively caching the verifycap computation.
* what should the storage-index be? It clearly must be the hash of the
readcap, otherwise readers cannot find the shares (or must carry
around some extra value, negating the shortness of the readcap).
* but since storage-index != verifycap (i.e. H(UEBhash+K1enc)), servers
will be unable to completely validate their shares. They can confirm
that everything (including K1enc, thanks to David-Sarah's suggestion)
matches the verifycap, but they can't tell that the verifycap matches
the storage-index under which the share is stored (i.e. they'd be
unable to detect two swapped sharefiles). This permits the
"roadblock" attack and generally misses our goals of allowing full
server-side validation.
* we can't determine the storage-index until after we've encoded the
entire file (which generally means after we've uploaded it). So we
need a new uploader protocol that lets us upload to an as-yet-unnamed
slot, and then provide the slot's storage-index at the very end of
the process. This is more work, but it isn't a huge deal.
* we wouldn't be able to directly use our permuted-list Tahoe2
peer-selection protocol, since we won't know the storage-index (and
thus the permuted list) until after we've uploaded all the shares. I
think we'd have to go with the "server-selection-index" idea: a much
shorter string (since it only needs to provide load-balancing, not
collision resistance), either randomly generated or derived from a
salted CHK hash (and thus computable before encoding/upload), used to
permute the peerlist. This string must be included in the readcap,
increasing its length, but we could probably get away with maybe 20
bits or so.
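A sketch of the server-selection-index idea, in the spirit of Tahoe's permuted peer list (servers ordered by a hash of an index combined with each server id). The 20-bit width follows the estimate above; the function names, the use of SHA-256 for the permutation, and the salted-hash derivation are assumptions for illustration, not Tahoe's actual code.

```python
import hashlib
import secrets

SSI_BITS = 20  # rough size suggested in the post; illustrative only

def random_ssi() -> int:
    """Option 1: a randomly generated server-selection-index."""
    return secrets.randbits(SSI_BITS)

def derived_ssi(plaintext: bytes, salt: bytes) -> int:
    """Option 2: top SSI_BITS bits of a salted CHK-style hash.
    Computable before encoding/upload, unlike the storage-index."""
    digest = hashlib.sha256(salt + plaintext).digest()
    return int.from_bytes(digest, "big") >> (256 - SSI_BITS)

def permuted_peers(ssi: int, server_ids: list[bytes]) -> list[bytes]:
    """Order servers by H(ssi + server_id). Only load-balancing is needed,
    so a short ssi suffices; collisions between files are harmless here."""
    ssi_bytes = ssi.to_bytes((SSI_BITS + 7) // 8, "big")
    return sorted(server_ids, key=lambda sid: hashlib.sha256(ssi_bytes + sid).digest())
```

Because the reader recovers the same `ssi` from the readcap, both uploader and reader walk the same server order without knowing the storage-index up front; the cost is the extra bits carried in every readcap.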
So, while I like the one-cryptovalue trick, I'm unsatisfied with the lack
of both server-side validation and offline readcap-to-verifycap
attenuation, and the separate SSI value makes me slightly nervous.
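To make the validation gap concrete, here is a toy model of the cap relationships as we read them from the bullets above: readcap = H(K1, v), K1enc = K1 encrypted under the readcap, storage-index = H(readcap), verifycap = H(v, K1enc). These formulas, the XOR-pad "encryption" of K1, and v = H(ciphertext) standing in for the UEB hash are all assumptions for illustration, not Zooko's exact algorithm. The model shows both complaints: the verifycap needs fetched share data, and a server cannot tie a share to the storage-index it sits under.

```python
import hashlib
from dataclasses import dataclass

def h(tag: bytes, *parts: bytes) -> bytes:
    """Domain-separated SHA-256 over length-prefixed parts."""
    m = hashlib.sha256()
    for p in (tag, *parts):
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

@dataclass(frozen=True)
class Share:
    v: bytes           # stand-in for the UEB hash
    k1enc: bytes       # K1 encrypted under the readcap (toy XOR pad)
    ciphertext: bytes  # file already encrypted with K1 (step omitted)

def upload(ciphertext: bytes, k1: bytes):
    """Assumed derivations; k1 must be 32 bytes to match the pad."""
    v = h(b"v", ciphertext)
    readcap = h(b"readcap", k1, v)
    k1enc = xor(k1, h(b"k1pad", readcap))
    storage_index = h(b"si", readcap)
    verifycap = h(b"verifycap", v, k1enc)
    return readcap, storage_index, verifycap, Share(v, k1enc, ciphertext)

def reader_derive_verifycap(readcap: bytes, share: Share):
    """Needs a fetched share (a network trip): no offline attenuation."""
    k1 = xor(share.k1enc, h(b"k1pad", readcap))
    if h(b"readcap", k1, share.v) != readcap:
        raise ValueError("share does not match readcap")
    return k1, h(b"verifycap", share.v, share.k1enc)

def server_check(share: Share):
    """All a server can do: check internal consistency and compute *some*
    verifycap. It cannot relate that verifycap to the storage-index,
    because storage-index = H(readcap) and the readcap depends on K1."""
    if h(b"v", share.ciphertext) != share.v:
        return None
    return h(b"verifycap", share.v, share.k1enc)
```

If file B's share is stored under file A's storage-index, `server_check` still accepts it, and only a reader holding A's readcap notices the swap, which is the opening for the roadblock attack.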
Incidentally, I kind of suspect that we could get away with longer
immutable readcaps if we had short directory readcaps, since I imagine
that people are more likely to share with dircaps (which get you
filenames) than with the raw filecaps. On the other hand, I fear that we
have even fewer tricks available for mutable encoding schemes, unless
semiprivate keys work out.
cheers,
-Brian