[tahoe-dev] collision-resistance vs. second-pre-image resistance (was: Tahoe performance)

Wed Feb 25 08:44:31 PST 2009

Dear Jack Lloyd:

I enjoyed your example of how collisions in rsync's hashes could be  
exploited (quoted below).  Devious!

I've been trying to articulate a general principle about this, and  
your example has helped.  The general principle is that if you have  
an immutable mapping from identifier to static object (i.e., to a  
specific sequence of bytes), then users will assume that the  
important property is that the object you get is always *that one  
object*.  They will assume this property even if you as the  
cryptographer specify the weaker property that the object you get  
will always be an object which was chosen by the person who produced  
the identifier.

The practical significance of this distinction is that you have to  
double the size of the identifier.  For secure-hash-based  
identifiers, an identifier of size K bits is sufficient to give a K- 
bit-security level against second-pre-image attacks (which provides  
the 'was intended by the creator of the identifier' property), but an  
identifier size of 2K bits is required to give K-bit-security against  
collision attacks (which provides the 'that one object' property).

The current state of the art of security engineering often errs by  
specifying and enforcing second-pre-image resistance and not thinking  
that someone might rely on it for collision-resistance.  I think Jack  
Lloyd's devious exploit of the rsync hashes is another example of  
this pattern.  The recent widely publicized Rogue CA which exploits  
collisions in MD5 is a beautiful example of this.

Ferguson and Schneier have argued, both in their book "Practical  
Cryptography" and in recent SHA-3 discussions, that this pattern is  
so common that the wise thing to do is simply to *always* use 2K-bit  
identifiers and try to resist collisions -- not just second-pre-images.

However, I am just now reminded of Tyler Close's argument that users  
will *not* assume that a mutable file cap -- an identifier of a  
mutable file -- will stably point to the same sequence of bytes over  
time.  So we might be able to get away with having only K-bit  
identifiers for those.

It would have to be carefully engineered to make sure that a sneaky  
person couldn't screw up the system by generating multiple public  
keys whose hashes collided, even though that same person knows the  
signing key and has the option of just using it to sign new file  
contents, as normal.

Thanks to Tyler Close for arguing that K-bit secure hashes were good  
enough for the case that the creator has complete control over the  
identified object anyway.  (At Zeitgeist Bar a few years ago after a  
CodeCon.)

Regards,

Zooko

On Feb 20, 2009, at 19:03 PM, Jack Lloyd wrote:

> On Thu, Feb 19, 2009 at 02:19:42PM -0800, Brian Warner wrote:
>
>> the same weak checksums), and put A1 on the destination machine  
>> and A2
>> on the source, then 'rsync ./A2 remote:A1' would fail to properly
>> replace A1 with A2. But they're both your own files anyways.
>
> Not necessarily: I can think of many scenarios where it might be
> advantageous to ensure that rsync (or more generally, the system
> backup) would skip backing up a new version of a file. Such situations
> can occur anytime the creator of the file has interests that do not
> align with the person performing the backup operation.
>
> For instance, say I wish to hide a logic bomb on a corporate server,
> that six months after I left the company would activate and perform
> some unfriendly action. However I don't want to leave evidence behind,
> and the machine in question is backed up nightly to an incorruptible
> offsite system. If I simply uploaded the malicious executable, after
> it triggered one could go back and find said malicious code in the
> backups, providing forensic evidence that could be used against me. So
> instead I could prepare two files, the logic bomb executable plus an
> innocouous file which rsync would consider to be identical. Since most
> binary formats (eg ELF binary files, Office documents) are quite
> flexible, I could perhaps even create my dummy file to be something
> meaningful. Then I place the innocouous version where it will be
> backed up, after which I replace it with the malicious version. Then
> numerous backup operations would occur, and (assuming the malicious
> version deleted or replaced itself after triggering) leave no evidence
> behind, without having to corrupt the backup server or process itself.
>
> -Jack