[tahoe-dev] Fwd: [hash-forum] bulk data use cases -- SHA-256 is too slow
zooko
zooko at zooko.com
Tue Feb 5 11:29:07 PST 2008
Folks:
I just sent this letter to the mailing list where the process of
inventing SHA-3 is discussed. My hope is that SHA-3 will be much
faster than SHA-2, while also being more secure, just as AES is much
faster than 3DES and state-of-the-art MACs are much faster than HMAC.
Regards,
Zooko
Begin forwarded message:
> From: zooko <zooko at zooko.com>
> Date: February 5, 2008 12:12:39 PM MST
> To: hash-forum at nist.gov
> Subject: bulk data use cases -- SHA-256 is too slow
>
> Folks:
>
> Cryptographic hash functions were invented for hashing small
> variable-length strings, such as human-readable text documents,
> public keys, or certificates, into tiny fixed-length strings in
> order to sign them. When considering such usage, the inputs to the
> hash function are short -- often only hundreds or thousands of
> bytes, rarely as much as a million bytes. Also, the computational
> cost of the hash function is likely to be swamped by the
> computational cost of the public key operation.
>
> Later, hash functions were pressed into service in MACs as
> exemplified by HMAC. In that usage, the inputs to the hash
> function tend to be small -- typically hundreds of bytes in a
> network packet. Also, the network is often the limiting factor on
> performance, in which case the time to compute the MAC is not the
> performance bottleneck.
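>
> For concreteness, here is a minimal sketch of that second use case
> (my own illustration, not code from any particular system), using
> Python's standard hmac and hashlib modules to authenticate a small,
> made-up network packet with HMAC-SHA-256:
>
>     import hashlib
>     import hmac
>
>     key = b"shared secret key"                       # hypothetical shared key
>     packet = b"a few hundred bytes of payload " * 8  # small, short-lived input
>
>     # Sender computes an HMAC-SHA-256 tag over the packet; the hash
>     # only ever sees a few hundred bytes, so its raw speed is rarely
>     # the bottleneck in this use case.
>     tag = hmac.new(key, packet, hashlib.sha256).digest()
>
>     # Receiver recomputes the tag and compares in constant time.
>     assert hmac.compare_digest(
>         tag, hmac.new(key, packet, hashlib.sha256).digest())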
>
> I would like to draw your attention to another way that
> cryptographic hash functions have been pressed into service -- as
> core security mechanisms in a myriad of bulk data systems.
> Examples include local filesystems (e.g. ZFS [1]), decentralized
> filesystems (e.g. a project that I hack on: allmydata.org [2]), p2p
> file-sharing tools (e.g. BitTorrent [3], Bitzi [4]), decentralized
> revision control tools (e.g. monotone [5], git [6], mercurial [7],
> darcs [8]), intrusion detection systems (e.g. Samhain [9]), and
> software package tools (e.g. Microsoft CLR strong names [10],
> Python setuptools [11], Debian control files [12], Ubuntu
> system-integrity-check [13]).
>
> Commonly, in this third category of uses, the size of the data being
> hashed can be large -- millions, billions or even trillions of
> bytes at once -- and there is no public key operation or network
> delay to hide the cost of the hash function. The hash function
> typically sits squarely on the critical path of certain operations,
> and the speed of the hash function is the limiting factor for the
> speed of those operations.
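>
> To make that concrete, here is a rough benchmark sketch (my own, not
> from any of the projects cited above) that measures bulk hashing
> throughput with Python's hashlib, using zlib.crc32 as a stand-in for
> a non-cryptographic checksum; the absolute numbers will of course
> vary with the CPU and the library build:
>
>     import hashlib
>     import time
>     import zlib
>
>     data = b"\x00" * (64 * 1024 * 1024)   # 64 MiB of bulk data
>
>     def throughput(label, fn):
>         start = time.perf_counter()
>         fn(data)
>         elapsed = time.perf_counter() - start
>         print("%-8s %7.1f MB/s" % (label, len(data) / elapsed / 1e6))
>
>     throughput("crc32", zlib.crc32)   # non-cryptographic baseline
>     throughput("md5", lambda d: hashlib.md5(d).digest())
>     throughput("sha1", lambda d: hashlib.sha1(d).digest())
>     throughput("sha256", lambda d: hashlib.sha256(d).digest())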
>
> Something else these applications have in common is that their
> designers are cryptographically unsophisticated compared to the
> designers in the earlier two use cases. It is not uncommon within
> those communities for the designers to believe that hash collisions
> are not a problem as long as second pre-image attacks are
> impossible, or to believe that the natural redundancy and structure
> of their formats protect them ("only meaningless files can have
> hash collisions", they say).
>
> A consequence of these conditions is that the raw speed of a hash
> function is very important for adoption in these systems. If you
> browse the references I've given above, you'll find that SHA-1,
> Tiger, and MD5 (!!) are commonly used, and SHA-256 is rare. In
> fact, of all the examples listed above, SHA-256 is used only in my
> own project -- allmydata.org. It is available in ZFS, but it is
> never turned on because it is too slow compared to the alternative
> non-cryptographic checksum.
>
> I should emphasize that this is not just a matter of legacy -- it
> is not just that these older hash functions have been
> "grandfathered in". Legacy is certainly a very important part of
> it, but newly designed and deployed systems often use SHA-1. Linus
> Torvalds chose to use SHA-1 in his newly designed "git"
> decentralized revision control tool, *after* the original
> 2005-02-15 Wang et al. attack was announced, and roundly mocked
> people who suggested that he choose a more secure alternative [7].
> I recently pleaded with the developers of the "darcs" revision
> control tool that they should not use SHA-1 for their new,
> backwards-incompatible design. (The issue currently hangs on
> whether I can find a sufficiently fast implementation of SHA-256 or
> Tiger with Haskell bindings.)
>
> Because of my exposure to these systems, I was surprised to see a
> few comments recently on this mailing list that SHA-256 is fast
> enough. My surprise abated when I realized that the commenters are
> coming from a background where the first two use cases -- public
> key signatures and MACs -- are common, and they may not be aware
> that SHA-256 is potentially too slow for some other use cases.
>
> Regards,
>
> Zooko O'Whielacronx
>
> [1] http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums
> [2] http://allmydata.org
> [3] http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29
> [4] http://bitzi.com/developer/bitprint
> [5] http://www.venge.net/mtn-wiki/FutureCryptography
> [6] http://www.gelato.unsw.edu.au/archives/git/0506/5299.html
> [7] http://www.selenic.com/pipermail/mercurial/2005-August/003832.html
> [8] http://www.nabble.com/announcing-darcs-2.0.0pre3-tt15027931.html#a15048993
> [9] http://la-samhna.de/samhain/manual/hash-function.html
> [10] http://blogs.msdn.com/shawnfa/archive/2005/02/28/382027.aspx
> [11] http://peak.telecommunity.com/DevCenter/setuptools
> [12] http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Files
> [13] https://wiki.ubuntu.com/IntegrityCheck