[tahoe-dev] Question about convergence keys
Jeremy Fitzhardinge
jeremy at goop.org
Tue Aug 12 19:19:24 PDT 2008
Brian Warner wrote:
> However, the null key is pretty guessable, so you're effectively allowing the
> whole world to participate in your "convergence domain".
>
> As zooko described elsewhere, the convergence domain is the set of people
> with whom you share two properties:
>
> 1: your uploads will converge with theirs, allowing you to save backend
> storage space and bandwidth when uploading identical files
> 2: the other people will be able to mount a partial-information guessing
> attack against your files: the public information about your uploaded
> file (like the storage index)[1] will reduce the work they need to do
>
Yes, that's acceptable for the use case I'm considering. In the
population of users we'd be dealing with, there is likely to be a large
amount of shared content: OS installs, common media, etc. That is, the
data itself is probably public anyway.

I guess if you want to store a mixture of small, really confidential data
and large semi-confidential/public data, then you'd create two nodes
with distinct convergence keys. Or is there some more subtle way of
achieving the same result?
> Also note that convergence is not necessarily as big a win as you might want.
> If both Alice and Bob have a bunch of identical files on their disk and are
> uploading them, then yeah, but in some quick tests on allmydata customer data
> we found the space savings to be less than 1%. You might want to do some
> tests first (hash all your files, have your friends do the same, measure the
> overlap) before worrying about sharing convergence secrets.
>
Yes, that would be an interesting experiment to perform anyway.
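
A rough sketch of that experiment, assuming each participant hashes a
directory tree locally and only the digest-to-size maps are compared. The
function names (file_digests, overlap_fraction) are made up for
illustration and are not part of Tahoe:

```python
import hashlib
import os


def file_digests(root):
    """Map SHA-256 hex digest -> size in bytes for regular files under root.

    Files with identical content collapse to one entry, so the result
    describes unique content rather than the raw file count.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            size = 0
            try:
                with open(path, "rb") as f:
                    # Read in 1 MiB chunks to keep memory use bounded.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                        size += len(chunk)
            except OSError:
                continue  # unreadable file: leave it out of the measurement
            digests[h.hexdigest()] = size
    return digests


def overlap_fraction(mine, theirs):
    """Fraction of my unique bytes whose content also appears in theirs."""
    total = sum(mine.values())
    if total == 0:
        return 0.0
    shared = sum(size for digest, size in mine.items() if digest in theirs)
    return shared / total
```

The fraction is only an upper bound on what convergence could save, since
it ignores per-file overhead and any files that are never uploaded.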
> [2]: the actual specification is in allmydata/util/hashutil.py:132, in the
> convergence_hash() function, and is:
>
> t = "allmydata_immutable_content_to_key_with_added_secret_v1+"
> t += netstring(convergence_secret)
> t += netstring("%d,%d,%d" % (k,N,segsize))
> return SHA256d(netstring(t) + file)
>
> We use netstrings and SHA256d (instead of plain SHA256) to avoid "chosen
> protocol attacks", which would allow two different files to wind up with
> the same hash.
>
OK, that's what I was hoping. The key isn't exactly the file hash, so
knowing the bare file hash doesn't let you decrypt the file.
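
To make that concrete, here is a Python 3 sketch that follows the formula
quoted above. It is an illustration only: the real function in
allmydata/util/hashutil.py may differ in details such as argument order
or output truncation, and the secrets and the k, N, segsize values used
below are arbitrary assumptions.

```python
import hashlib


def netstring(s: bytes) -> bytes:
    """Length-prefixed encoding: b"abc" -> b"3:abc,"."""
    return b"%d:%s," % (len(s), s)


def sha256d(data: bytes) -> bytes:
    """Double SHA-256: SHA256(SHA256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def convergence_hash(k, n, segsize, data, convergence_secret):
    """Sketch of the quoted formula; not the actual Tahoe implementation."""
    t = b"allmydata_immutable_content_to_key_with_added_secret_v1+"
    t += netstring(convergence_secret)
    t += netstring(b"%d,%d,%d" % (k, n, segsize))
    return sha256d(netstring(t) + data)


# Illustrative inputs (assumptions, not values taken from this thread).
data = b"contents of some widely shared file"
key_null = convergence_hash(3, 10, 131072, data, b"")
key_mine = convergence_hash(3, 10, 131072, data, b"my private secret")

# The derived value is not the bare SHA-256 of the file contents.
print(key_null != hashlib.sha256(data).digest())
# Different convergence secrets give different values for the same file.
print(key_null != key_mine)
```

With the null secret anyone who already has the file can recompute the
same value, which is the "whole world" convergence domain described
above; a private secret limits that to people who know the secret.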
Thanks,
J