[tahoe-dev] Experimenting with SHA-3 candidates in Tahoe-LAFS
Zooko O'Whielacronx
zookog at gmail.com
Thu Apr 15 22:04:13 PDT 2010
Dear Prof. Danilo Gligoroski:
Thank you very much for the offer of helping evaluate the performance
of SHA-3 candidates in the context of Tahoe-LAFS. In this letter I
sketch out how we evaluate Tahoe-LAFS performance and what we care
about for the future.
There are two implementations of Tahoe-LAFS to consider: Tahoe-LAFS v2
as it will be implemented in 2011 and Tahoe-LAFS v2 as it will be
implemented in 2021.
The first implementation to consider is Tahoe-LAFS v2 as it will be
implemented in the next year or so. This implementation will very
likely continue to be written primarily in Python, except for the
crypto and erasure-coding libraries which will probably continue to be
written in C or C++. The deployment platforms will probably be
desktops, laptops, servers, netbooks, smartbooks, handhelds, and
embedded devices like the SheevaPlug [1] or Beagleboard [2]. The CPU
architectures it is used on will probably be mostly amd64, ARM, x86,
and perhaps a few deployments on some other architectures such as
UltraSparc. The deployments might sometimes be able to take advantage
of hardware acceleration which is available in the UltraSparc T2 [3],
the next generation of Intel chips [4], the Via Padlock [5], or add-in
cards; such acceleration usually covers AES and SHA-256.
Based on a few ad hoc measurements we've made so far, I expect that
the performance burden of interpreting Python, waiting for network
round trips, and waiting for disk seeks will usually be larger than
the cost of hash function computation. There may currently be some
operations that are significantly affected by the efficiency of the
hash function; in particular, an immutable file upload makes a full
pass over the file to compute its secure hash before it begins
encrypting and uploading it. That pass may be somewhat limited by the
speed of the hash function, and other parts of Tahoe-LAFS behavior may
be affected as well, but I'm not sure.
Measuring and documenting the concrete performance of the current
version of Tahoe-LAFS in various deployment scenarios with various
hash function implementations would help guide our design process.
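To make that concrete, here is a rough sketch (not Tahoe-LAFS code;
just Python's standard hashlib, with SHA-256 standing in for a SHA-3
candidate, and a hypothetical file path) of the kind of pre-upload
hashing pass and timing I mean:

import hashlib
import sys
import time

def hash_file(path, hash_name="sha256", segment_size=128 * 1024):
    # One full pass over the file, hashing it segment by segment,
    # before any encryption or upload work begins.
    h = hashlib.new(hash_name)
    with open(path, "rb") as f:
        while True:
            segment = f.read(segment_size)
            if not segment:
                break
            h.update(segment)
    return h.hexdigest()

if __name__ == "__main__":
    start = time.time()
    digest = hash_file(sys.argv[1])
    print("hashed in %.2f s: %s" % (time.time() - start, digest))

Timing that loop with different hash_name values (or with a binding to
a SHA-3 candidate) on the platforms listed above would tell us how much
of an upload's wall-clock time this pass can account for.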
We will be improving the known limitations over time. The Python
interpreter will become more efficient due to projects like PyPy [6].
New download code (in Tahoe-LAFS v1.7 in May of this year) will reduce
the time spent waiting for network traffic, and so on. Also, different
deployment scenarios may become common. The typical deployment
currently is a powerful amd64 laptop, desktop, or server serving just
one user and relying on a high-latency spinning disk, but people could
start using Tahoe-LAFS in a situation where a small ARM chip (such as
in the PogoPlug) serves a large number of users simultaneously over a
very fast, low-latency network with SSD storage. This would make the
overall performance of the system less
dependent on network and disk latency and more dependent on the crypto
and Python-interpretation tasks that the CPU is doing. François
Deppierraz recently ran a simple measurement on his embedded ARM
device and reported that hashing and encryption are potentially
significant performance costs on that system [7].
Now, what about "Tahoe-LAFS v2 as it will be implemented in 2021"?
Well, Tahoe-LAFS is useful for long-term storage and has a strong
commitment to backwards compatibility. People who store data,
potentially petabytes of data or even more, using the version of
Tahoe-LAFS v2 available in 2011 will then have to use the same crypto
algorithms and data structures when they test the integrity of their
files and read or update the contents of their files in 2021. The
implementation of Tahoe-LAFS v2 that they use in 2021 may be
substantially different. It might be written entirely in a compiled
language like C or C++, or it might be written entirely in a
high-level language such as Python or Javascript. It might rely on new
Just-in-Time compilation techniques that reduce the cost of
interpreting Python or Javascript. The network, storage, memory, and
CPU architectures may be different. The deployment scenarios may be
different too—perhaps people will want to rely on tiny computers
embedded into their eyeglasses or something to do the necessary crypto
computations to assure them of the integrity of the "augmented
reality" images that they are looking at.
Therefore, in designing Tahoe-LAFS v2, I don't just say "It doesn't
matter how fast the crypto algorithms are because the Python
interpreter is slow." I pay attention to factors which could affect
future implementations such as how compactly and memory-efficiently an
algorithm can be implemented in ASIC (for those eyeglasses that I
want), how parallelizable our crypto algorithms and data structures
are (for those thousand-core devices that people will be carrying
around in 2021), and so on. On the other hand, we don't focus
exclusively on such far-future performance issues—we also measure and
improve current performance, choosing the "low hanging fruit" and then
iterating. (At the moment, I think the low-hanging fruit is: 1.
optimize server-selection, pipelining, and fail-over of download, and
2. replace RSA with ECDSA.)
I would love your help! I think the most important thing we need next
is a reproducible, automated benchmark which shows how the performance
of operations that we care about is affected by the performance of the
hash function. We can then run this automated benchmark on our
buildbot farm [8] and get an idea about how the performance of the
current implementation varies across platforms.
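For concreteness, the following is a minimal sketch of the kind of
micro-benchmark I have in mind (the hash names, buffer size, and
iteration count are placeholders, and hashlib only exposes the built-in
algorithms; a SHA-3 candidate would need its own Python binding):

import hashlib
import os
import time

def benchmark_hash(hash_name, data, iterations=20):
    # Time repeated hashing of one buffer and report throughput in MB/s,
    # so that results from different buildslaves can be compared directly.
    start = time.time()
    for _ in range(iterations):
        hashlib.new(hash_name, data).digest()
    elapsed = time.time() - start
    return (len(data) * iterations) / elapsed / 1e6

if __name__ == "__main__":
    data = os.urandom(1024 * 1024)  # 1 MiB of pseudo-random input
    for name in ("md5", "sha1", "sha256", "sha512"):
        print("%-8s %8.1f MB/s" % (name, benchmark_hash(name, data)))

A script along these lines, run as a buildbot step, would give us
per-platform throughput numbers to compare against the whole-operation
timings above.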
Also note that we are about to initiate the "100 Year Cryptography"
project which will include combining two secure hashes using the
Comb4P combiner from [9]. We will want to measure the resulting
combined hash function in the same way that we measure individual hash
functions.
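Until the real Comb4P code exists, one way to estimate its cost is with
a stand-in combiner. The sketch below is NOT Comb4P (Lehmann's
construction adds extra mixing); it just runs two hashlib hashes over
the same input and concatenates the digests, which approximates the
dominant cost of evaluating two underlying hash functions over every
byte:

import hashlib

def combined_hash(data, name1="sha256", name2="sha512"):
    # Placeholder combiner, not Comb4P: evaluate both underlying hashes
    # over the full input and concatenate the digests. The dominant
    # cost -- two full passes of compression-function work per byte --
    # should be roughly the same as for the real combiner.
    return hashlib.new(name1, data).digest() + hashlib.new(name2, data).digest()

Such a function drops straight into the same benchmark harness as a
single hash function.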
I have more to say about these topics, including how to analyze
François's measurements on his ARM box, and how we have to do a
tree-hashing mode of operation and would need help measuring it. But,
now I must go to sleep. :-)
Regards,
Zooko
[1] http://en.wikipedia.org/wiki/SheevaPlug
[2] http://beagleboard.org/
[3] http://en.wikipedia.org/wiki/UltraSPARC_T2
[4] http://en.wikipedia.org/wiki/AES_instruction_set
[5] http://www.via.com.tw/en/initiatives/padlock/hardware.jsp
[6] http://speed.pypy.org/
[7] http://allmydata.org/pipermail/tahoe-dev/2010-March/004150.html
[8] http://allmydata.org/buildbot
[9] http://tuprints.ulb.tu-darmstadt.de/2094/1/thesis.lehmann.pdf