[tahoe-dev] convergent encryption reconsidered -- salting and key-strengthening
zooko
zooko at zooko.com
Sun Mar 30 18:37:15 PDT 2008
[This conversation is spanning three mailing lists --
cryptography at metzdowd.com, p2p-hackers at lists.zooko.com, and tahoe-
dev at allmydata.org . Some of the posts have not reached all three of
those lists. I've manually added Jerry Leichter and Ivan Krstić to
the approved-senders set for p2p-hackers and tahoe-dev, so that
further posts by them will appear on those lists.]
> On Mar 30, 2008, at 3:13 PM, Ivan Krstić wrote:
>
> Unless I'm misunderstanding Zooko's writeup, he's worried about an
> attacker going from a partially-known plaintext (e.g. a form bank
> letter) to a completely-known plaintext by repeating the following
> process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible,
> about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash
> to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.
That's right. Your comparison to normal old brute-force/dictionary
attack on passwords is a good one, and one that Jim McCoy and Jerry
Leichter have alluded to as well.
Think of it like this:
Passwords are susceptible to brute-force and/or dictionary attack.
We can't, in general, prevent attackers from trying guesses at our
passwords without also preventing users from using them, so instead
we employ various techniques:
* salts (to break up the space of targets into subspaces, of which
at most one can be targeted by a given brute-force attack)
* key strengthening (to increase by a constant factor the cost of
checking a password)
* rate-limits for on-line tries (i.e., you get only a small fixed
number of wrong guesses in a row before you are locked out for a time-
out period)
However, secrets other than passwords are not usually susceptible to
such attacks. You can store your True Name, credit card number, bank
account number, mother's maiden name, and so forth, on the same
server as your password, but you don't have to worry about using
salts or key strengthening on those latter secrets, because the
server doesn't run a service that allows unauthenticated remote
people to connect, submit a guess as to their value, and receive
confirmation, the way it does for your password. (In other words,
for such data we generally use an extreme form of the third defense,
rate-limiting tries -- the number of guesses that an attacker gets is
limited to none!)
Likewise, if you are going to store or transmit those kinds of
secrets in encrypted form using a traditional randomly-generated
encryption key, then you don't have to worry about brute-force/
dictionary attacks because your strong randomly-selected symmetric
encryption key defies them.
The Key Point:
*** Convergent encryption exposes whatever data is put into it to
the sorts of attacks that already apply to passwords.
Convergent encryption had been invented, analyzed and used for many
years, but to the best of my knowledge the first time that anyone
noticed this issue was March 16 of this year (at 3 AM Chicago Time),
when Drew Perttula and Brian Warner made that observation. (Although
to be fair some of the best-known uses of convergent encryption
during these years have been sharing publicly-known files with
strangers, in which case I suppose it is assumed that the cleartext
does not contain secrets.)
Now PBKDF2 is a combination of the first two defenses -- salting and
key strengthening. When you first suggested PBKDF2, I -- and
apparently Jerry Leichter -- thought that you were suggesting its
salting feature as a solution. The solution that we've come up with
for Tahoe (described in my original note) is much like salting,
except that the added value that gets mixed in is secret and
unguessable, where I normally think of salt as non-secret.
Now I see that you are also emphasizing the key strengthening feature
of PBKDF2.
"k" denotes symmetric encryption key, "p" denotes plaintext, "c"
denotes ciphertext, "s" denotes salt, "E(key, plaintext)" is
encryption, "H()" is secure hashing, "H^1000()" is secure hashing a
thousand times over, i.e."H(H(H(H(...))))" a thousand times.
Traditional encryption:
k = random()
c = E(k, p)
Traditional convergent encryption:
k = H(p)
c = E(k, p)
Tahoe-style convergent encryption with added secret ("s"); "s" can
be re-used for any number of files, but it is kept secret from
everyone except those with whom you wish to converge storage.
s = random()
k = H(s, p)
c = E(k, p)
PBKDF2 (simplified); "s" can be re-used but is generally not, and it
is not secret.
s = random()
k = H^1000(s, password)
c = E(k, p)
Now, one could imagine a variant of traditional convergent encryption
which added key strengthening:
k = H^1000(p)
c = E(k, p)
This would have a performance impact on normal everyday use of Tahoe
without, in my current estimation, making a brute-force/dictionary
attack infeasible. Key strengthening allows you to choose an amount
of wasted CPU that you are willing to impose on your users during
normal use, and multiply the attacker's costs by exactly that
amount. If the attacker has 2^64 computational capacity, and the
users are willing to waste 2^10 extra computrons on each file access,
then the attacker's effective capacity is reduced to 2^54.
The trade-off is actually worse than it appears since the attacker is
attacking multiple users at once (in traditional convergent
encryption, he is attacking *all* users at once), so he gains an
economy of scale, and can profitably invest in specialized tools,
even specialized hardware such as a COPACOBANA [1]. At the very
least he can profitably devote many CPU cores to churning out new
guesses 24/7, while for normal users it is not profitable to allocate
a 24/7 CPU load to strengthening their keys. The reason for this
disparity is that the attacker gets to attack everyone at once for
the same cost as attacking only one target, where the defenders have
to pay each for his own defense.
Next, one could imagine a variant of Tahoe's convergent encryption
with added secret which adds key strengthening:
s = random()
k = H^1000(s, p)
c = E(k, p)
This would likewise be costly to normal users, but moreover it is not
needed because the "s = random()" part of the algorithm locks out all
attackers except those with whom s is shared from mounting such an
attack at all.
Thank you for your comments on this issue. If you have further
ideas, especially as would be relevant to the Tahoe Least-Authority
Filesystem, I would love to hear them.
Regards,
Zooko O'Whielacronx
[1] http://copacobana.org/
More information about the tahoe-dev
mailing list