[tahoe-dev] convergent encryption reconsidered -- salting and key-strengthening

Sun Mar 30 18:37:15 PDT 2008

[This conversation is spanning three mailing lists --  
cryptography at metzdowd.com, p2p-hackers at lists.zooko.com, and tahoe- 
dev at allmydata.org .  Some of the posts have not reached all three of  
those lists.  I've manually added Jerry Leichter and Ivan Krstić to  
the approved-senders set for p2p-hackers and tahoe-dev, so that  
further posts by them will appear on those lists.]

> On Mar 30, 2008, at 3:13 PM, Ivan Krstić wrote:
>
> Unless I'm misunderstanding Zooko's writeup, he's worried about an
> attacker going from a partially-known plaintext (e.g. a form bank
> letter) to a completely-known plaintext by repeating the following
> process:
>
> 1. take partially known plaintext
> 2. make a guess, randomly or more intelligently where possible,
>     about the unknown parts
> 3. take the current integrated partial+guessed plaintext, hash
>     to obtain convergence key
> 4. verify whether that key exists in the storage index
> 5. if yes, you've found the full plaintext. if not, repeat from '2'.
>
> That's a brute force search.

That's right.  Your comparison to normal old brute-force/dictionary  
attack on passwords is a good one, and one that Jim McCoy and Jerry  
Leichter have alluded to as well.

Think of it like this:

Passwords are susceptible to brute-force and/or dictionary attack.   
We can't, in general, prevent attackers from trying guesses at our  
passwords without also preventing users from using them, so instead  
we employ various techniques:

  * salts (to break up the space of targets into subspaces, of which  
at most one can be targeted by a given brute-force attack)
  * key strengthening (to increase by a constant factor the cost of  
checking a password)
  * rate-limits for on-line tries (i.e., you get only a small fixed  
number of wrong guesses in a row before you are locked out for a time- 
out period)

However, secrets other than passwords are not usually susceptible to  
such attacks.  You can store your True Name, credit card number, bank  
account number, mother's maiden name, and so forth, on the same  
server as your password, but you don't have to worry about using  
salts or key strengthening on those latter secrets, because the  
server doesn't run a service that allows unauthenticated remote  
people to connect, submit a guess as to their value, and receive  
confirmation, the way it does for your password.  (In other words,  
for such data we generally use an extreme form of the third defense,  
rate-limiting tries -- the number of guesses that an attacker gets is  
limited to none!)

Likewise, if you are going to store or transmit those kinds of  
secrets in encrypted form using a traditional randomly-generated  
encryption key, then you don't have to worry about brute-force/ 
dictionary attacks because your strong randomly-selected symmetric  
encryption key defies them.

The Key Point:

  *** Convergent encryption exposes whatever data is put into it to  
the sorts of attacks that already apply to passwords.

Convergent encryption had been invented, analyzed and used for many  
years, but to the best of my knowledge the first time that anyone  
noticed this issue was March 16 of this year (at 3 AM Chicago Time),  
when Drew Perttula and Brian Warner made that observation.  (Although  
to be fair some of the best-known uses of convergent encryption  
during these years have been sharing publicly-known files with  
strangers, in which case I suppose it is assumed that the cleartext  
does not contain secrets.)

Now PBKDF2 is a combination of the first two defenses -- salting and  
key strengthening.  When you first suggested PBKDF2, I -- and  
apparently Jerry Leichter -- thought that you were suggesting its  
salting feature as a solution.  The solution that we've come up with  
for Tahoe (described in my original note) is much like salting,  
except that the added value that gets mixed in is secret and  
unguessable, where I normally think of salt as non-secret.

Now I see that you are also emphasizing the key strengthening feature  
of PBKDF2.

"k" denotes symmetric encryption key, "p" denotes plaintext, "c"  
denotes ciphertext, "s" denotes salt, "E(key, plaintext)" is  
encryption, "H()" is secure hashing, "H^1000()" is secure hashing a  
thousand times over, i.e."H(H(H(H(...))))" a thousand times.

Traditional encryption:

k = random()
c = E(k, p)

Traditional convergent encryption:

k = H(p)
c = E(k, p)

Tahoe-style convergent encryption with added secret ("s");  "s" can  
be re-used for any number of files, but it is kept secret from  
everyone except those with whom you wish to converge storage.

s = random()
k = H(s, p)
c = E(k, p)

PBKDF2 (simplified);  "s" can be re-used but is generally not, and it  
is not secret.

s = random()
k = H^1000(s, password)
c = E(k, p)

Now, one could imagine a variant of traditional convergent encryption  
which added key strengthening:

k = H^1000(p)
c = E(k, p)

This would have a performance impact on normal everyday use of Tahoe  
without, in my current estimation, making a brute-force/dictionary  
attack infeasible.  Key strengthening allows you to choose an amount  
of wasted CPU that you are willing to impose on your users during  
normal use, and multiply the attacker's costs by exactly that  
amount.  If the attacker has 2^64 computational capacity, and the  
users are willing to waste 2^10 extra computrons on each file access,  
then the attacker's effective capacity is reduced to 2^54.

The trade-off is actually worse than it appears since the attacker is  
attacking multiple users at once (in traditional convergent  
encryption, he is attacking *all* users at once), so he gains an  
economy of scale, and can profitably invest in specialized tools,  
even specialized hardware such as a COPACOBANA [1].  At the very  
least he can profitably devote many CPU cores to churning out new  
guesses 24/7, while for normal users it is not profitable to allocate  
a 24/7 CPU load to strengthening their keys.  The reason for this  
disparity is that the attacker gets to attack everyone at once for  
the same cost as attacking only one target, where the defenders have  
to pay each for his own defense.

Next, one could imagine a variant of Tahoe's convergent encryption  
with added secret which adds key strengthening:

s = random()
k = H^1000(s, p)
c = E(k, p)

This would likewise be costly to normal users, but moreover it is not  
needed because the "s = random()" part of the algorithm locks out all  
attackers except those with whom s is shared from mounting such an  
attack at all.

Thank you for your comments on this issue.  If you have further  
ideas, especially as would be relevant to the Tahoe Least-Authority  
Filesystem, I would love to hear them.

Regards,

Zooko O'Whielacronx

[1] http://copacobana.org/