[tahoe-dev] [p2p-hackers] convergent encryption reconsidered

Sat Mar 22 23:36:35 PDT 2008

On Sat, Mar 22, 2008 at 6:48 PM, zooko <zooko at zooko.com> wrote:
> [...]
>  However, there is a very general argument about the applicability of
>  these attacks, which is: "Why encrypt?".
>
>  If your system has strong anonymity properties, preventing people
>  from learning which files are associated with which users, then you
>  can just store the files in plaintext.
>
>  Ah, but of course you don't want to do that, because even without
>  being linked to users, files may contain sensitive information that
>  the users didn't intend to disclose.  But if the files contain such
>  information, then it might be acquired by the learn-partial-
>  information attack.

I can't really agree with this.  The contents of the files themselves
contain contextual information which can be used to link the disparate
pieces together.  Here is the simple counter-example:  would you
rather hand me a copy of your unencrypted disk with the inode tables
removed or a copy of your encrypted disk with the inode tables
removed?  In the former case I know that it is not too difficult to
recover a huge portion of the original files (having learned during
many all-night sessions recovering disks over twenty years that it not
as difficult as it first seems when your disk fails its fsck) but in
the latter case I am face a much more difficult task even if you use
the same key for AES-CTR on each of the files.

>  > [...] FEC decoding speeds may be reasonably fast,
>  > but they are not without some cost.  If the storage pool is
>  > sufficiently large and you are doing your job to limit the ability of
>  > an attacker to see which blocks are linked to the same FEC operation
>  > then the computational complexity of this attack is significantly
>  > higher than you suggest.
>
>  The attacker can do this job more easily, in two ways:
>
>  1.  He doesn't need to erasure-decode in order to check whether a
>  given erasure-coded share was generated from a given plaintext.  He
>  can work forward from guessed-plaintext to encryption key to partial
>  ciphertext to partial erasure coded share, and check that.  [...]
>
>  2.  In the "parallel attack", he doesn't need to figure out which
>  erasure-coded shares correspond to which files. [...]

As you just stated, the partial erasure coded share needs to be
generated.  Last I checked your erasure coding mechanism is a lot
slower than your encryption method.  You are also increasing the
memory/storage requirements for this attack.  I was not doubting that
it was possible to confirm the existence of a file nor that in some
cases this rather far-fetched attack was possible, I was stating that
your suggestion that this particular chosen plaintext variant is a
serious risk which makes convergent encryption somehow untrustworthy
was an overstatement that was easier to fix using mechanisms which did
not throw out the benefits of convergent encryption.

This flaw in convergent encryption is one that needs to be
acknowledged and addressed by those designing systems using this
property, but it appears to be a problem that a future designer should
consider and solve early so that they do not need to face the
either/or choice you appear to have run into given your current
design.

jim