[tahoe-dev] Interesting hashing result
Shawn Willden
shawn-tahoe at willden.org
Sat Feb 14 12:25:58 PST 2009
Actually not all that surprising, but perhaps interesting enough to be worth
pointing out...
While looking a little at the performance of my file system scanning and
change detection code I noticed that the librsync signature generation is
about 4 times faster than SHAD-256 hashing. Since the signatures are about
1% of the size of the whole file, I get a 4-5x speedup by generating the
signature first and then computing the SHAD-256 hash of that for use as
my "content hash", as compared to hashing the content and separately
generating the signature. That is:
H(SIG(content)) is 5x faster than
H(content); SIG(content)
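A rough sketch of the two approaches, in Python. Everything named here is hypothetical: toy_signature is only a stand-in for the librsync signature (a weak checksum plus a truncated strong hash per block, with BLAKE2b substituted for MD4 and an assumed block size of 2048 bytes), not librsync's actual format or API, and hashlib's plain SHA-256 stands in for the content hash. estimated_speedup just does the arithmetic behind the figure above, assuming signature generation costs 1/4 of a full hash and the signature is 1% of the file size.

```python
import hashlib
import zlib

BLOCK_SIZE = 2048  # assumed block length, chosen for illustration only


def toy_signature(content: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Stand-in for a librsync signature (NOT librsync's real format).

    For each block, emit a 4-byte weak checksum followed by an 8-byte
    truncated strong hash.
    """
    out = bytearray()
    for start in range(0, len(content), block_size):
        block = content[start:start + block_size]
        weak = zlib.adler32(block).to_bytes(4, "big")
        # librsync used MD4 as the strong checksum; BLAKE2b is a stand-in.
        strong = hashlib.blake2b(block, digest_size=8).digest()
        out += weak + strong
    return bytes(out)


def content_hash(content: bytes) -> str:
    """H(SIG(content)): hash the small signature, not the full content."""
    return hashlib.sha256(toy_signature(content)).hexdigest()


def estimated_speedup(sig_speedup: float = 4.0,
                      sig_fraction: float = 0.01) -> float:
    """Cost of 'H(content); SIG(content)' over cost of 'H(SIG(content))'.

    The cost of hashing the full content is taken as 1.
    """
    separate = 1.0 + 1.0 / sig_speedup           # full hash + signature
    combined = 1.0 / sig_speedup + sig_fraction  # signature + hash of it
    return separate / combined
```

With the default assumptions, estimated_speedup() comes out between 4.5 and 5, which agrees with the 4-5x I measured. The toy signature is 12 bytes per 2048-byte block, so it is also on the order of 1% of the content.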
In most cases the hashing and signature operations are I/O bound anyway, so
this doesn't matter that much in terms of reducing scanning time. The only
reason I noticed it was because my testing was operating repeatedly on the
same set of files which ended up cached in memory.
Still, it seems worthwhile to use the more efficient method just to avoid
spending cycles that could be used elsewhere (or just avoided to reduce power
consumption).
The only possible concern here would be if the librsync signature algorithm
were to somehow fail to detect changes that SHA-256 alone would detect, or if
there were some way the cryptographic weaknesses of MD4 (the "strong" of
the two checksums used by librsync -- I believe the "weak" is a CRC) could be
exploited.
On the first issue, rsync is very widely used and has proved itself very
reliable, so I'm not concerned about that.
On the second issue, I don't think there would be any security concerns
anyway, given the application here, but certainly any issues that could arise
should be addressed by the application of SHAD-256.
Anyway, thought this might be of interest to someone.
Shawn.