[tahoe-dev] hashlib-vs-pycryptopp benchmarks

Brian Warner warner at lothar.com
Tue Aug 10 05:59:03 UTC 2010


A few weeks ago we were discussing whether to stick with the
pycryptopp/Crypto++ versions of certain hash functions, or switch to
the hashlib module in Python's standard library. On many (most?)
platforms, hashlib uses OpenSSL's implementations. Out of curiosity, I
ran some quick benchmarks. On my 2010 MacBook Pro laptop, here's what
I got:


# % python bench_hashlib.py
# hashlib uses openssl
# hashlib 1 bytes: 100000 loops, best of 3: 2.46 usec per loop
# hashlib 4 bytes: 100000 loops, best of 3: 2.5 usec per loop
# hashlib 16 bytes: 100000 loops, best of 3: 2.47 usec per loop
# hashlib 64 bytes: 100000 loops, best of 3: 2.98 usec per loop
# hashlib 256 bytes: 100000 loops, best of 3: 4.58 usec per loop
# hashlib 1024 bytes: 100000 loops, best of 3: 11 usec per loop
# hashlib 4096 bytes: 10000 loops, best of 3: 36.5 usec per loop
# hashlib 16384 bytes: 10000 loops, best of 3: 139 usec per loop
# hashlib 65536 bytes: 1000 loops, best of 3: 547 usec per loop
# hashlib 262144 bytes: 100 loops, best of 3: 2.18 msec per loop
# hashlib 1048576 bytes: 100 loops, best of 3: 8.72 msec per loop
# pycryptopp 1 bytes: 100000 loops, best of 3: 14 usec per loop
# pycryptopp 4 bytes: 100000 loops, best of 3: 14 usec per loop
# pycryptopp 16 bytes: 100000 loops, best of 3: 14.1 usec per loop
# pycryptopp 64 bytes: 100000 loops, best of 3: 14.7 usec per loop
# pycryptopp 256 bytes: 100000 loops, best of 3: 16.3 usec per loop
# pycryptopp 1024 bytes: 10000 loops, best of 3: 22.4 usec per loop
# pycryptopp 4096 bytes: 10000 loops, best of 3: 46.7 usec per loop
# pycryptopp 16384 bytes: 10000 loops, best of 3: 144 usec per loop
# pycryptopp 65536 bytes: 1000 loops, best of 3: 533 usec per loop
# pycryptopp 262144 bytes: 100 loops, best of 3: 2.09 msec per loop
# pycryptopp 1048576 bytes: 100 loops, best of 3: 8.32 msec per loop

So at 1 MiB, hashlib/openssl gets about 120 MBps, while
pycryptopp/Crypto++ gets about 126 MBps (roughly 5% faster). But
hashlib/openssl has much lower per-call overhead (2.5us vs 14us on tiny
inputs).
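
(A quick sanity check of that throughput arithmetic, plugging in the
1 MiB rows from the output above; this snippet is mine, not part of the
attached script:)

size = 1048576                 # 1 MiB, in bytes
hashlib_time = 8.72e-3         # sec per loop, from the 1048576-byte hashlib row
pycryptopp_time = 8.32e-3      # sec per loop, from the 1048576-byte pycryptopp row

print("hashlib:    %.0f MBps" % (size / hashlib_time / 1e6))     # ~120 MBps
print("pycryptopp: %.0f MBps" % (size / pycryptopp_time / 1e6))  # ~126 MBps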

What does that mean in practical terms? To compute the Merkle hash tree
over a 1GiB file, with 128KiB segments, we have to bulk-hash 8192 large
segments of 128KiB each, and then build a hash tree over those 8192
leaves: roughly 16384 nodes in all, each computed with SHA256d (two
SHA-256 invocations per node), so about 32768 small hashes. The large
hashes will take about 8.962s for hashlib and 8.732s for pycryptopp,
and the small ones will be about 90ms for hashlib vs 472ms for
pycryptopp. So we can expect the total to be about 9.051s for hashlib
and 9.205s for pycryptopp.
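
(If you want to redo that estimate with other segment sizes or timings,
the arithmetic fits in a few lines. The per-hash times below are read
off the benchmark rows above, with the 128KiB figure taken as twice the
64KiB row, so treat this as my approximation rather than part of the
original script:)

filesize = 1024 ** 3                        # 1 GiB
segsize = 128 * 1024                        # 128 KiB segments
num_segments = filesize // segsize          # 8192 large hashes
num_nodes = 2 * num_segments                # ~16384 nodes in the binary hash tree
num_small_hashes = 2 * num_nodes            # SHA256d = two SHA-256 calls per node

large = {"hashlib": 2 * 547e-6, "pycryptopp": 2 * 533e-6}   # 128KiB ~ twice the 64KiB row
small = {"hashlib": 2.75e-6, "pycryptopp": 14.4e-6}         # small-input rows, approx

for name in ("hashlib", "pycryptopp"):
    total = num_segments * large[name] + num_small_hashes * small[name]
    print("%s: %.3f sec total" % (name, total))

That reproduces the ~9.05s and ~9.2s totals quoted above to within a
millisecond or two.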

In other words: not a significant difference, at least for large files.

The benchmark script is attached.
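
For anyone who can't grab the attachment, here's a rough reconstruction
of the timing loop. The real script produced the "best of 3" output
above (the timeit format), but this is only a sketch; the pycryptopp
usage (pycryptopp.hash.sha256.SHA256 with update/digest) is the same
API Tahoe itself calls, not copied from the attachment:

# sketch only: times each message size with stdlib timeit, as the
# attached bench_hashlib.py presumably does
import timeit

SIZES = [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576]

for size in SIZES:
    setup = "import hashlib; data = b'x' * %d" % size
    best = min(timeit.repeat("hashlib.sha256(data).digest()",
                             setup=setup, repeat=3, number=1000)) / 1000
    print("hashlib %d bytes: %.3g usec per loop" % (size, best * 1e6))

for size in SIZES:
    setup = "from pycryptopp.hash.sha256 import SHA256; data = b'x' * %d" % size
    best = min(timeit.repeat("h = SHA256(); h.update(data); h.digest()",
                             setup=setup, repeat=3, number=1000)) / 1000
    print("pycryptopp %d bytes: %.3g usec per loop" % (size, best * 1e6))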

cheers,
 -Brian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bench_hashlib.py
Type: text/x-python-script
Size: 2565 bytes
Desc: not available
URL: <http://tahoe-lafs.org/pipermail/tahoe-dev/attachments/20100809/a88819c3/attachment.bin>

