[tahoe-dev] Interesting hashing result

Shawn Willden shawn-tahoe at willden.org
Sat Feb 14 12:25:58 PST 2009


Actually not all that surprising, but perhaps interesting enough to be worth 
pointing out...

While looking a little at the performance of my file system scanning and 
change detection code I noticed that the librsync signature generation is 
about 4 times faster than SHAD-256 hashing.  Since the signatures are about 
1% of the size of the whole file, I get a 4-5x speedup by generating the 
signature first and then computing the SHAD-256 hash of that for use as 
my "content hash", as compared to hashing the content and separately 
generating the signature.  That is:

	H(SIG(content)) is 5x faster than

	H(content); SIG(content)
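
A minimal sketch of the H(SIG(content)) idea, for illustration only. It
does not call librsync: toy_signature() is a hypothetical stand-in that
emits a weak checksum (Adler-32) and a truncated strong checksum (BLAKE2b
here, standing in for MD4) per block. The block size and the function names
are assumptions. sha256d() is SHA-256 applied twice, which is what SHAD-256
refers to.

```python
import hashlib
import zlib

BLOCK_SIZE = 2048  # assumed block size for this sketch


def toy_signature(content: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Stand-in for SIG(content); NOT the real librsync signature format.

    Emits 12 bytes per block: a 4-byte weak checksum followed by an
    8-byte truncated strong checksum.
    """
    parts = []
    for offset in range(0, len(content), block_size):
        block = content[offset:offset + block_size]
        weak = zlib.adler32(block).to_bytes(4, "big")
        strong = hashlib.blake2b(block, digest_size=8).digest()
        parts.append(weak + strong)
    return b"".join(parts)


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice (SHAD-256)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def content_hash(content: bytes) -> bytes:
    """H(SIG(content)): hash the small signature, not the whole file."""
    return sha256d(toy_signature(content))
```

With these assumed sizes the signature is 12 bytes per 2048-byte block, so
the final hash runs over well under 1% of the input, and the signature pass
accounts for nearly all of the cost.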

In most cases the hashing and signature operations are I/O bound anyway, so 
this doesn't matter that much in terms of reducing scanning time.  The only 
reason I noticed it was because my testing was operating repeatedly on the 
same set of files which ended up cached in memory.

Still, it seems worthwhile to use the more efficient method just to avoid 
spending cycles that could be used elsewhere (or just avoided to reduce power 
consumption).

The only possible concern here would be if the librsync signature algorithm 
were to somehow fail to detect changes that SHA-256 alone would detect, or if 
there were some way the cryptographic weaknesses of MD4 (the "strong" of 
the two checksums used by librsync -- I believe the "weak" is a CRC) could be 
exploited.

On the first issue, rsync is very widely used and has proved itself very 
reliable, so I'm not concerned about that.

On the second issue, I don't think there would be any security concerns 
anyway, given the application here, but certainly any issues that could arise 
should be addressed by the application of SHAD-256.

Anyway, thought this might be of interest to someone.

	Shawn.

