[tahoe-dev] notes from the Tahoe-LAFS Weekly Dev Call, 2012-09-11

Brian Warner warner at lothar.com
Tue Sep 11 20:57:03 UTC 2012


On 9/11/12 9:59 AM, Zooko Wilcox-O'Hearn wrote:

> • topic: The compression attack on HTTPS.

> The drewp defense -- the Added Convergence Secret -- is exactly the
> thing that creates independent compression contexts in order to limit
> the scope for attack.

Nope. Drewp's defense increases the total entropy of the input to the
convergence function, but that only helps if you're limited to guessing
that input all-at-once. The compression attack lets you guess the secret
input incrementally (just like a padding-oracle attack), so the
attacker's job is linear, not exponential. Adding 256 bits of
unguessable secret merely adds about 256 extra guesses to their
workload.
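
To make that incremental-guessing point concrete, here's a toy sketch in
Python. Everything in it is hypothetical (the secret, the charset, the
oracle); it's just the shape of a CRIME-style attack, not anything from
Tahoe:

    import zlib

    # Hypothetical values, purely for illustration.
    SECRET = b"convergence=1f3870be274f6c49b3e31a0c6728957f"
    CHARSET = b"0123456789abcdef"

    def oracle(attacker_bytes):
        # The attacker only learns the compressed length of a stream that
        # mixes their own bytes with the secret in one compression context.
        return len(zlib.compress(attacker_bytes + b"\n" + SECRET, 9))

    def recover(known=b"convergence="):
        recovered = known
        while len(recovered) < len(SECRET):
            # One byte at a time: the candidate that lets the compressor
            # find the longest back-reference tends to compress shortest.
            lengths = {c: oracle(recovered + bytes([c])) for c in CHARSET}
            recovered += bytes([min(lengths, key=lengths.get)])
        return recovered

    # Each byte costs at most len(CHARSET) oracle queries, so the total
    # work is linear in the secret's length, not exponential. (A real
    # attack also has to break ties and filter Huffman-coding noise, but
    # that only changes the constant factor.)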

The defense is either to prevent the mixing of secret and adaptive
attacker-supplied data in the same compression context (i.e. the same
file), or to prevent the attacker from measuring the length of the
resulting compressed data.
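
A minimal sketch of the first option, using the same made-up names: give
the secret and the attacker-supplied bytes separate compression contexts
(here, separate zlib streams), so a correct guess can no longer shorten
the secret's encoding and the total length tells the attacker nothing:

    import zlib

    def length_with_separate_contexts(attacker_bytes, secret):
        # Neither term depends on whether attacker_bytes matches secret.
        return (len(zlib.compress(attacker_bytes, 9))
                + len(zlib.compress(secret, 9)))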

> Zooko and (perhaps to a lesser degree) Brian are uncomfortable with
> the fact that LAFS currently exposes the length of your plaintext, to
> the byte level of precision. Zooko wants to add padding.

I'm vaguely uncomfortable with it, but I'm more uncomfortable with some
of the alternatives. Exposing the exact byte-length of the plaintext is
easy to implement (the alternatives are harder to implement) and easy to
explain to users ("we expose the exact byte-length of your plaintext,
and if you append N bytes of attacker-supplied data, we'll expose
length+N"). Compressing the data first might have value (for large
fluffy data, but we think most large data is already mp3/jpg
compressed), but is harder to explain: "we'll leak the length of a
gzipped form of your data, which exposes some obscured combination of
the actual length of your file and the fluffiness of its contents, which
will vary in hard-to-predict way if you append attacker-supplied data to
it".

Padding isn't too hard to explain ("we expose 8*ceil(len/8)"), but the
privacy value it provides is dubious: an active attacker can still
detect single-byte variations if they can get you to start close to an
edge of the block size, and 8 bytes may not be enough to thwart the
would-be file-correlator (who's just on the lookout for a file exactly
4834263 bytes long, but knows there aren't any other files close to that
length, so the rounded-up 4834264-byte file is probably the same). For
larger files, even 4096-byte chunks might not be enough. So the benefit
depends on the block size you pick versus the distribution of file
sizes, meaning we'd have to pick a block size out of a hat, and
unjustified ad-hoc constants always make me think we're doing something
wrong. (It might end up being a good idea, but it makes me nervous.)
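
For the record, the arithmetic behind that example:

    import math

    def padded_len(n, block=8):
        # "we expose 8*ceil(len/8)"
        return block * math.ceil(n / block)

    print(padded_len(4834263))  # 4834264
    # Every length from 4834257 through 4834264 lands in the same bucket,
    # so the correlator only needs that 8-byte window to be otherwise
    # empty of plausible candidates for the match to stick.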

You could add random padding (in a convergent fashion, e.g. append
H(file)%8 bytes of zeros, record the original length in the encrypted
data somewhere). But as we've learned from anonymous remailers, random
padding merely lowers the signal-to-noise ratio, and only increases the
cost of statistical correlation by a linear factor. So you'd have to be
clear on what sort of protection you were earning before taking the
complexity hit of random padding.
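
Concretely, that convergent variant might look something like the
following sketch (a hypothetical layout, not an actual Tahoe format, with
H() taken to be SHA-256 of the plaintext):

    import hashlib
    import struct

    def pad_convergently(plaintext):
        # Derive the pad length from a hash of the file, so the same
        # plaintext always pads the same way (convergent), and keep the
        # true length inside the (eventually encrypted) data.
        pad = hashlib.sha256(plaintext).digest()[0] % 8
        return struct.pack(">Q", len(plaintext)) + plaintext + b"\x00" * pad

    def unpad(padded):
        (true_len,) = struct.unpack(">Q", padded[:8])
        return padded[8:8 + true_len]

    assert unpad(pad_convergently(b"some plaintext")) == b"some plaintext"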


cheers,
 -Brian

