[tahoe-dev] [p2p-hackers] convergent encryption reconsidered
zooko
zooko at zooko.com
Sat Mar 22 18:48:13 PDT 2008
Jim:
Thanks for your detailed response on the convergent encryption issue.
In this post, I'll just focus on one very interesting question that
you raise: "When does either of these attacks on convergent encryption
apply?".
In my original note I was thinking about the allmydata.org "Tahoe"
Least Authority Filesystem. In this post I will attempt to follow
your lead in widening the scope. In particular, GNUnet and Freenet
are currently active projects that use convergent encryption. The
learn-partial-information attack would apply to either system if a
user were using it with files that she intended not to divulge, but
that were susceptible to being brute-forced in this way by an attacker.
On Mar 20, 2008, at 10:56 PM, Jim McCoy wrote:
>
> On Mar 20, 2008, at 12:42 PM, zooko wrote:
>
>> Security engineers have always appreciated that convergent
>> encryption allows an attacker to perform a
>> confirmation-of-a-file attack -- if the attacker already knows
>> the full plaintext of a file, then they can check whether a
>> given user has a copy of that file.
>
> The truth of this depends on implementation details, and is an
> assertion that cannot be said to cover all or even most of the
> potential use-cases for this technique.
You're right. I was writing the above in the context of Tahoe,
where, as Brian Warner explained, we do not attempt to hide the
linkage between users and ciphertexts. What I wrote above doesn't
apply in the general case.
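To make the confirmation-of-a-file attack concrete, here is a toy
sketch in Python. This is not Tahoe's real construction (Tahoe uses
tagged SHA-256-based hashes and AES in CTR mode); I've collapsed key
derivation, encryption, and indexing into bare hash calls, because all
that matters for this attack is that every stage is a deterministic
function of the plaintext alone:

    import hashlib

    def storage_index(plaintext):
        # stand-in for: key = H(plaintext); ciphertext = E_key(plaintext);
        # index = H(ciphertext) -- every step is deterministic, so the
        # index is a pure function of the plaintext
        key = hashlib.sha256(b"key:" + plaintext).digest()
        return hashlib.sha256(b"index:" + key).hexdigest()

    # indexes that the attacker has observed this user storing
    observed = {storage_index(b"some file we don't care about"),
                storage_index(b"the leaked memo, final version")}

    # the attacker already has the full plaintext he is curious about,
    # so he recomputes its index and tests for membership
    if storage_index(b"the leaked memo, final version") in observed:
        print("confirmed: this user has a copy of that exact file")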
However, there is a very general way to think about the applicability
of these attacks, which is to ask: "Why encrypt?".
If your system has strong anonymity properties, preventing people
from learning which files are associated with which users, then you
can just store the files in plaintext.
Ah, but of course you don't want to do that, because even without
being linked to users, files may contain sensitive information that
the users didn't intend to disclose. But if the files contain such
information, then it might be acquired by the learn-partial-
information attack.
When designing such a system, you should ask yourself "Why
encrypt?". You encrypt in order to conceal the plaintext from
someone, but if you use convergent encryption, and they can use the
learn-partial-information attack, then you fail to conceal the
plaintext from them.
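And here is the learn-partial-information attack in the same toy
notation. Suppose (hypothetically) that the attacker knows everything
about a form letter except a four-digit PIN; since the stored
identifier is a deterministic function of the plaintext, he simply
enumerates the ten thousand possibilities:

    import hashlib

    def identifier(plaintext):
        # collapsed stand-in for the whole deterministic pipeline:
        # plaintext -> convergent key -> ciphertext -> storage index
        key = hashlib.sha256(b"key:" + plaintext).digest()
        return hashlib.sha256(b"index:" + key).hexdigest()

    # what the victim stored; the attacker can see the identifier but
    # not the "1234"
    observed = identifier(b"Dear customer, your PIN is 1234. Regards, The Bank")

    # the attacker knows the template and brute-forces the one
    # low-entropy field
    for pin in range(10000):
        guess = b"Dear customer, your PIN is %04d. Regards, The Bank" % pin
        if identifier(guess) == observed:
            print("recovered the PIN:", pin)
            break

That is the whole attack; the only question is whether the parts of
the file that the attacker doesn't already know add up to more
entropy than he can afford to search.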
You should use traditional convergent encryption (without an added
secret; there is a sketch of it just after this list) if:
1. You want to encrypt the plaintext, and
2. You want convergence, and
3. You don't mind exposing the existence of that file (ignoring the
confirmation-of-a-file attack), and
4. You are willing to bet that the file has more entropy, from the
attacker's perspective, than his computational capacity can search
through (defeating the learn-partial-information attack).
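For reference, "traditional convergent encryption" here means
something shaped like the following, with a toy hash-based keystream
standing in for a real cipher such as AES in CTR mode (a real system
would use a real cipher and derive or fix the IV carefully; the only
point here is that key and ciphertext depend on nothing but the
plaintext):

    import hashlib

    def convergent_key(plaintext):
        # the key is derived from nothing but the plaintext itself
        return hashlib.sha256(plaintext).digest()

    def toy_encrypt(key, plaintext):
        # toy stand-in for AES-CTR with a fixed IV: XOR the plaintext
        # with a keystream generated by hashing (key, counter)
        stream = b""
        counter = 0
        while len(stream) < len(plaintext):
            stream += hashlib.sha256(
                key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(p ^ s for p, s in zip(plaintext, stream))

    # two users who encrypt the same file independently produce the
    # same ciphertext -- that is the convergence that lets the storage
    # be deduplicated
    f = b"some popular file that many users have"
    ciphertext_from_alice = toy_encrypt(convergent_key(f), f)
    ciphertext_from_bob = toy_encrypt(convergent_key(f), f)
    assert ciphertext_from_alice == ciphertext_from_bob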
You should use convergent encryption with an added secret (as
recently implemented for the Tahoe Least Authority Filesystem, and
sketched after this list) if:
1. You want to encrypt the plaintext, and
2. You want convergence within the set of people who know the added
secret, and
3. You don't mind exposing the existence of that file to people in
that set, and
4. You are willing to disclose the file to everyone in that set, or
else you think that the people in that set to whom you do not wish to
disclose the file will not try the learn-partial-information attack,
or that, if they do, the file has more entropy from their perspective
than their computational capacity can search through.
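With an added secret the only change is that the key derivation mixes
in the secret, so convergence -- and therefore these attacks -- are
confined to the set of people who hold that secret. This is just the
shape of the idea, not Tahoe's actual code:

    import hashlib

    def convergent_key(added_secret, plaintext):
        # only people who hold added_secret derive this key, so only
        # they converge on, or can confirm or brute-force, the
        # resulting ciphertext
        return hashlib.sha256(added_secret + plaintext).digest()

    f = b"some popular file"
    # same file, same secret: same key, so the group still deduplicates
    assert (convergent_key(b"our group's secret", f) ==
            convergent_key(b"our group's secret", f))
    # same file, different secret: different key, different ciphertext
    assert (convergent_key(b"our group's secret", f) !=
            convergent_key(b"another group's secret", f))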
I guess the property of unlinkability between user and file addresses
issue 3 in the above list -- the existence of a file is a much less
sensitive bit of information than the existence of a file in a
particular user's collection.
It could also affect issue 4 by increasing the entropy the file has
from an attacker's perspective. If he knows that the ciphertext
belongs to you then he can try filling in the fields with information
that he knows about you. Without that linkage, he has to try filling
in the fields with information selected from what he knows about all
users. But hiding this linkage doesn't actually help in the case
where the attacker is already using everything he knows about all
users to attack all files in parallel.
Note that using an added secret does help in the parallel attack
case, because (just like salting passwords) it breaks the space of
targets up into separate spaces which can't all be attacked with the
same computation.
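To spell out the salting analogy: a precomputing attacker builds one
big table mapping identifiers back to candidate plaintexts. Without
an added secret that single table can be checked against every
ciphertext in the system; with per-group secrets he has to rebuild it
once per secret. Again a toy illustration, with the same collapsed
stand-in for the deterministic pipeline:

    import hashlib

    def identifier(added_secret, plaintext):
        # collapsed stand-in for key derivation + encryption + indexing
        return hashlib.sha256(added_secret + plaintext).hexdigest()

    candidates = [b"candidate file #1", b"candidate file #2"]

    # no added secret: one precomputed table works against every
    # user's ciphertexts at once
    global_table = {identifier(b"", p): p for p in candidates}

    # added secrets: one table per secret, just as salted password
    # hashes force per-salt dictionaries
    per_group_tables = {
        secret: {identifier(secret, p): p for p in candidates}
        for secret in (b"group A's secret", b"group B's secret")
    }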
> The first problem is isolating the original
> ciphertext in the pool of storage. If a file is encrypted using
> convergent encryption and then run through an error-correction
> mechanism to generate a number of shares that make up the file, an
> attacker first needs to be able to isolate these shares to generate
> the original ciphertext. FEC decoding speeds may be reasonably fast,
> but they are not without some cost. If the storage pool is
> sufficiently large and you are doing your job to limit the ability of
> an attacker to see which blocks are linked to the same FEC operation
> then the computational complexity of this attack is significantly
> higher than you suggest.
The attacker's job is easier than you suggest, in two ways (both are
sketched in code after point 2):
1. He doesn't need to erasure-decode in order to check whether a
given erasure-coded share was generated from a given plaintext. He
can work forward from guessed plaintext to encryption key to partial
ciphertext to partial erasure-coded share, and check that. (Brian
Warner already explained that this is actually even easier for an
attacker in Tahoe, because he can then go from encryption key to
storage index and check that, but in this post I'm trying to address
the more general case.)
2. In the "parallel attack", he doesn't need to figure out which
erasure-coded shares correspond to which files. For example, suppose
he collects the first 32 bytes of many "blocks", where a block is the
output of erasure coding after encryption. Then he tries plausible
plaintexts, generates the encryption key, generates the first few
bytes of ciphertext, and then generates the first 32 bytes of the
erasure-coded share. (The details vary here depending on the
encryption and
erasure coding, but you see that for typical encryption and typical
erasure coding, this is much less work than you might think.) Now he
checks if the 32 bytes that he generated appear in the set of 32-byte
block headers that he collected. If he gets a match, then he has
learned the full contents of a file in the system, although he
doesn't know which file or which user.
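Here are both of those in toy code. To be explicit about the
assumptions: this is not Tahoe's actual key derivation or encoding
(Tahoe uses tagged SHA-256-based hashes and AES in CTR mode); the
hash-based keystream below stands in for a real cipher; and I'm
assuming a systematic erasure code (zfec is one), in which the first
shares are just the ciphertext cut into pieces, so the first 32 bytes
of a block can be computed from the first 32 bytes of ciphertext with
no FEC arithmetic at all:

    import hashlib

    def block_prefix(plaintext, nbytes=32):
        # point 1: work forward from a guessed plaintext.  The
        # convergent key is a hash of the plaintext, the keystream is
        # a toy stand-in for AES-CTR with a fixed IV, and for a
        # systematic code the first bytes of the first share are just
        # the first bytes of the ciphertext -- no erasure decoding
        # happens anywhere.
        key = hashlib.sha256(plaintext).digest()
        stream = b""
        counter = 0
        while len(stream) < nbytes:
            stream += hashlib.sha256(
                key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(p ^ s for p, s in zip(plaintext[:nbytes], stream))

    # point 2: harvest 32-byte prefixes from many stored blocks,
    # without knowing which block belongs to which file or which user
    harvested = {
        block_prefix(b"an unremarkable file that nobody is attacking"),
        block_prefix(b"Dear customer, your PIN is 1234. Regards, The Bank"),
    }

    # one key derivation and 32 bytes of keystream per guess tests
    # that guess against every harvested block at once
    for pin in range(10000):
        guess = b"Dear customer, your PIN is %04d. Regards, The Bank" % pin
        if block_prefix(guess) in harvested:
            print("learned the entire contents of some file:", guess)
            break

As above, an added secret forces the attacker to redo this work once
per secret rather than once for the whole system.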
Regards,
Zooko