#2018 new enhancement

padding to hide the size of plaintexts

Reported by: zooko Owned by:
Priority: normal Milestone: undecided
Component: code-encoding Version: 1.10.0
Keywords: confidentiality privacy compression newcaps research Cc: nejucomo@…, me+tahoetrac@…
Launchpad Bug:

Description (last modified by daira)

Even though LAFS keeps the contents of files secret from attackers, it currently exposes the length (in bytes) of that content. An attacker can use this information in various ways. For one thing, an attacker might be able to "recognize" specific files or kinds of files from a pattern of file sizes. More subtle dangers may also exist, depending on the circumstances: for example, the famous "CRIME" attack on SSL (http://security.stackexchange.com/questions/19911/crime-how-to-beat-the-beast-successor/19914#19914) depends crucially on the attacker being able to measure the exact size of certain encrypted output. Ticket #925 is about how potentially interesting metadata about the LAFS filesystem itself can be inferred from the byte-level granularity of exposed sizes.

I propose that LAFS automatically add a randomized number of padding bytes to files when encrypting. Concretely, something like the following, with F as the file size in bytes:

  1. Let the "max padding", X, be 32 * log₂(F), rounded up to the nearest multiple of 32 (equivalently, 32 * ceil(log₂(F))).
  1. Choose a number of padding bytes, P, uniformly from [0..X), deterministically derived from the encryption key. Note: it is important that the number be deterministic from the key, so that multiple encryptions of the same-keyed file do not pick different random numbers and thereby let an attacker statistically observe the padding's size.
  1. Append P bytes of padding (zero-valued bytes) to the plaintext before encryption. (This does not affect how the key is derived from the plaintext in the case of convergent encryption.)
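The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Tahoe-LAFS code: the functions `max_padding`, `pad_length` and `pad_plaintext` and the label `PAD_LABEL` are hypothetical, HMAC-SHA256 keyed with the encryption key stands in for the one-way derivation of P from the key, and files of 0 or 1 bytes (for which log₂(F) is undefined or zero) get no padding.

```python
import hashlib
import hmac

# Hypothetical domain-separation label; not part of any Tahoe-LAFS format.
PAD_LABEL = b"illustrative padding-length label"


def max_padding(file_size: int) -> int:
    """X = 32 * log2(F) rounded up to a multiple of 32, i.e. 32 * ceil(log2(F))."""
    # log2 is undefined for F == 0 and zero for F == 1; this sketch pads neither.
    if file_size <= 1:
        return 0
    # For integers F >= 1, (F - 1).bit_length() is exactly ceil(log2(F)),
    # which avoids floating-point rounding near powers of two.
    return 32 * (file_size - 1).bit_length()


def pad_length(key: bytes, file_size: int) -> int:
    """Derive P in [0, X) deterministically from the encryption key."""
    x = max_padding(file_size)
    if x == 0:
        return 0
    # One-way step (assumed construction): HMAC-SHA256 keyed with the encryption key.
    digest = hmac.new(key, PAD_LABEL, hashlib.sha256).digest()
    # Reducing a 256-bit value mod X leaves a bias on the order of X / 2**256.
    return int.from_bytes(digest, "big") % x


def pad_plaintext(key: bytes, plaintext: bytes) -> bytes:
    """Append P zero bytes to the plaintext; encryption happens afterwards."""
    return plaintext + b"\x00" * pad_length(key, len(plaintext))


key = bytes(16)  # all-zero placeholder key, for illustration only
padded = pad_plaintext(key, b"hello world")
print(len(padded))  # somewhere in [11, 139), since X = 128 for an 11-byte file
```

Because P depends only on the key and F, re-encrypting the same file under the same key always yields the same padded length, which is the property step 2 asks for.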

Change History (5)

comment:2 Changed at 2013-07-08T18:12:56Z by leif

Is there an advantage to random padding instead of just padding up to some fixed interval?

If someone uploads many different files which are all the exact same size, random padding will not stop attackers from inferring what that size is.
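leif's point can be checked with a small simulation. The sketch below is illustrative only: it re-derives the description's padding length with HMAC-SHA256 (an assumed construction, as is the made-up `PAD_LABEL`), uses an arbitrary 4096-byte interval for the fixed-interval alternative, and uses deterministic stand-in keys so the run is reproducible. The function names are hypothetical.

```python
import hashlib
import hmac

# Assumed construction, mirroring the description's scheme; the label is made up.
PAD_LABEL = b"illustrative padding-length label"


def random_padded_size(key: bytes, size: int) -> int:
    """F + P, with P derived from the key and drawn from [0, 32 * ceil(log2(F)))."""
    x = 32 * (size - 1).bit_length() if size > 1 else 0
    if x == 0:
        return size
    digest = hmac.new(key, PAD_LABEL, hashlib.sha256).digest()
    return size + int.from_bytes(digest, "big") % x


def interval_padded_size(size: int, interval: int = 4096) -> int:
    """Pad up to the next multiple of a fixed interval (ceiling division)."""
    return -(-size // interval) * interval


# Many different files, hence many different keys, sharing one true size.
TRUE_SIZE = 10_000
keys = [i.to_bytes(16, "big") for i in range(1000)]  # stand-ins for real keys

random_sizes = [random_padded_size(k, TRUE_SIZE) for k in keys]
# The smallest observation approaches TRUE_SIZE as more files are seen.
print("random padding, smallest observed size:", min(random_sizes))

# Interval padding maps every size in (8192, 12288] to 12288, so repeated
# observations reveal only the bucket, not the exact size.
bucket = {interval_padded_size(s) for s in (8193, TRUE_SIZE, 12_288)}
print("interval padding, observed sizes:", sorted(bucket))  # [12288]
```

With a thousand samples the smallest randomly padded size is expected to land very close to the true 10,000 bytes, while every interval-padded observation is 12,288 bytes. The trade-off is overhead: the description's scheme adds fewer than 32 * ceil(log₂(F)) bytes, whereas a fixed interval can add up to interval - 1 bytes, which dominates for small files.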

comment:3 Changed at 2013-07-08T20:01:05Z by nickm

Here's how to do the analysis.

Look at what information the attacker sees over time, and what the attacker is trying to learn. Consider how fast they can learn what they want as Tahoe stands today. Then consider how fast they can learn that with the proposed padding scheme.

Generally, padding many things to the same size tends to work better than adding random amounts of padding to a lot of things. In the "pad to same size" case, the attacker learns less from seeing the size of a single object.

Don't forget object linkability in your analysis. That is, if certain messages are likelier to be received together than two messages chosen at random, then the attacker can make inferences over time, so you can't just look at single-object probabilities in isolation.

Feel free to share this message wherever it will do good.

comment:4 Changed at 2013-07-12T14:14:18Z by zooko

Comments from Marsh Ray:

  • I like the length hiding idea. Be sure the pad length gets derived from the key via a strongly one-way path.
  • Would file access patterns reveal the amount of padding? Would it ever make sense to distribute padding over the whole file?
  • "you might use 32*ceil(log₂(F)) to hide F a little better"

from https://twitter.com/marshray/status/354028204446584832

comment:5 Changed at 2013-07-16T04:36:03Z by daira

  • Description modified