[tahoe-dev] Interesting hashing result

zooko zooko at zooko.com
Sun Feb 15 08:04:33 PST 2009


Hi Shawn:

Thanks for the information.  Could you give me a couple more details  
-- what sizes of files are in your test set, how long does it take  
with cold cache, and how long with warm cache?

Also, what is this signature that you are generating?  Is it  
something that is generated and used just by your backup tool?


Tahoe uses a strong hash function to generate the immutable file caps  
from the ciphertext so that we get this property:

    The "At Most One File Per Immutable File Cap" Property:

    There can exist (in this universe, in the foreseeable future) at
    most one file matching any given immutable file capability.

In cryptographic terms, this is called "collision-resistance".
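
To make the idea concrete, here is a minimal sketch of deriving an
identifier from ciphertext with SHA-256d, taken here to mean SHA-256
applied twice.  It is an illustration only, not Tahoe's actual cap
derivation: the names sha256d and toy_cap and the "toy-cap:" prefix
are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA-256d: hash the data with SHA-256, then hash that digest again."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_cap(ciphertext: bytes) -> str:
    """Hypothetical stand-in for an immutable file cap.

    This only illustrates the idea that the identifier is a function of
    the ciphertext alone; it is not the real cap format.
    """
    return "toy-cap:" + sha256d(ciphertext).hex()


if __name__ == "__main__":
    # Identical ciphertext always yields the identical identifier.
    print(toy_cap(b"example ciphertext"))
```

If the hash is collision-resistant, nobody (including whoever created
the file) can produce two different ciphertexts for which toy_cap
returns the same string, which is what the property above asks for.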

This property might not be needed for some applications, but it is  
very tricky for application writers to know when they can safely rely  
on a weaker property, such as the property that cryptographers call  
"second-pre-image-resistance".

(You, Shawn, might already be familiar with all this, but I'm  
spelling it out for the benefit of other readers also.)


Unfortunately the SHA-256d secure hash function imposes a significant
CPU overhead.  Currently I think that all actual uses of Tahoe are
I/O-bound and run on ridiculously overpowered CPUs, so I don't think
this CPU overhead is causing an actual performance problem for anyone
in practice.  But hopefully Tahoe will move into more and more use
cases, and as it does this might become a problem.
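
One rough way to check whether hashing or I/O is the bottleneck on a
given machine is to time the hash on an in-memory buffer and compare
the result with the disk or network throughput.  The sketch below
assumes Python's standard hashlib; the function name
hash_throughput_mb_s and the 16 MB default buffer are arbitrary
choices for illustration.

```python
import hashlib
import time


def hash_throughput_mb_s(name: str, size_mb: int = 16) -> float:
    """Return approximate hashing throughput in MB/s for the named algorithm.

    `name` is any algorithm name accepted by hashlib.new(), e.g. "sha256".
    """
    data = b"\0" * (size_mb * 1024 * 1024)
    start = time.perf_counter()
    hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


if __name__ == "__main__":
    print("sha256: %.1f MB/s" % hash_throughput_mb_s("sha256"))
```

For large inputs the second pass of SHA-256d only hashes a 32-byte
digest, so timing plain SHA-256 is a reasonable approximation of the
SHA-256d cost.  A single run is noisy; repeat it a few times before
drawing conclusions.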

In the future, we could switch to a faster hash function which is
still secure.  I've been eyeing the Tiger hash function for a long
time -- it takes about 1/3 as many CPU cycles to hash things as
SHA-256 does, and its output size (192 bits) is a better fit for the
rest of our system than SHA-256's 256-bit output size.  However,
there is a good chance that Tiger could be shown to lack
collision-resistance in the foreseeable future, and I don't think
taking that risk is currently worth saving those CPU cycles.

In the year 2012 (hey, we're living in the future!), the new SHA-3  
hash function will be chosen.  That function will also, I hope,  
require about 1/3 as many CPU cycles as SHA-256 does while being a  
safer long-term bet.

Regards,

Zooko

