[tahoe-dev] Fwd: [hash-forum] bulk data use cases -- SHA-256 is too slow

zooko zooko at zooko.com
Tue Feb 5 11:29:07 PST 2008


Folks:

I just sent this letter to the mailing list where the process of  
inventing SHA-3 is discussed.  My hope is that SHA-3 will be much  
faster than SHA-2, while also more secure, just as AES is much faster
than 3DES, and state-of-the-art MACs are much faster than HMAC.

Regards,

Zooko

Begin forwarded message:

> From: zooko <zooko at zooko.com>
> Date: February 5, 2008 12:12:39 PM MST
> To: hash-forum at nist.gov
> Subject: bulk data use cases -- SHA-256 is too slow
>
> Folks:
>
> Cryptographic hash functions were invented for hashing small  
> variable-length strings, such as human-readable text documents,  
> public keys, or certificates, into tiny fixed-length strings in  
> order to sign them.  When considering such usage, the inputs to the  
> hash function are short -- often only hundreds or thousands of  
> bytes, rarely as much as a million bytes.  Also, the computational  
> cost of the hash function is likely to be swamped by the  
> computational cost of the public key operation.
>
> Later, hash functions were pressed into service in MACs as  
> exemplified by HMAC.  In that usage, the inputs to the hash  
> function tend to be small -- typically hundreds of bytes in a  
> network packet.  Also, the network is often the limiting factor on  
> performance, in which case the time to compute the MAC is not the  
> performance bottleneck.
>
> I would like to draw your attention to another way that  
> cryptographic hash functions have been pressed into service -- as  
> core security mechanisms in a myriad of bulk data systems.   
> Examples include local filesystems (e.g. ZFS [1]), decentralized  
> filesystems (e.g. a project that I hack on: allmydata.org [2]), p2p  
> file-sharing tools (e.g. BitTorrent [3], Bitzi [4]), decentralized  
> revision control tools (e.g. monotone [5], git [6], mercurial [7],  
> darcs [8]), intrusion detection systems (e.g. Samhain [9]), and  
> software package tools (e.g. Microsoft CLR strong names [10],  
> Python setuptools [11], Debian control files [12], Ubuntu
> system-integrity-check [13]).
>
> Commonly in this third category of uses the size of the data being  
> hashed can be large -- millions, billions or even trillions of  
> bytes at once -- and there is no public key operation or network  
> delay to hide the cost of the hash function.  The hash function  
> typically sits squarely on the critical path of certain operations,  
> and the speed of the hash function is the limiting factor for the  
> speed of those operations.
>
> Something else common about these applications is that the  
> designers are cryptographically unsophisticated, compared to  
> designers in the earlier two use cases.  It is not uncommon within  
> those communities for the designers to believe that hash collisions  
> are not a problem as long as second pre-image attacks are  
> impossible, or to believe that the natural redundancy and structure  
> of their formats protect them ("only meaningless files can have  
> hash collisions", they say).
>
> A consequence of these conditions is that raw speed of a hash  
> function is very important for adoption in these systems.  If you  
> browse the references I've given above, you'll find that SHA-1,  
> Tiger, and MD5 (!!) are commonly used, and SHA-256 is rare.  In  
> fact, of all the examples listed above, SHA-256 is used only in my  
> own project -- allmydata.org.  It is available in ZFS, but it is  
> never turned on because it is too slow compared to the alternative  
> non-cryptographic checksum.
>
> I should emphasize that this is not just a matter of legacy -- it  
> is not just that these older hash functions have been  
> "grandfathered in".  Legacy is certainly a very important part of  
> it, but newly designed and deployed systems often use SHA-1.  Linus  
> Torvalds chose to use SHA-1 in his newly designed "git"  
> decentralized revision control tool, *after* the original  
> 2005-02-15 Wang et al. attack was announced, and roundly mocked  
> people who suggested that he choose a more secure alternative [6].   
> I recently pleaded with the developers of the "darcs" revision  
> control tool that they should not use SHA-1 for their new,  
> backwards-incompatible design.  (The issue currently hangs on  
> whether I can find a sufficiently fast implementation of SHA-256 or  
> Tiger with Haskell bindings.)
>
> Because of my exposure to these systems, I was surprised to see a  
> few comments recently on this mailing list that SHA-256 is fast  
> enough.  My surprise abated when I decided that the commenters are  
> coming from a background where the first two use cases -- public  
> key signatures and MACs -- are common, and they may not be aware  
> that SHA-256 is potentially too slow for some other use cases.
>
> Regards,
>
> Zooko O'Whielacronx
>
> [1] http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums
> [2] http://allmydata.org
> [3] http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29
> [4] http://bitzi.com/developer/bitprint
> [5] http://www.venge.net/mtn-wiki/FutureCryptography
> [6] http://www.gelato.unsw.edu.au/archives/git/0506/5299.html
> [7] http://www.selenic.com/pipermail/mercurial/2005-August/003832.html
> [8] http://www.nabble.com/announcing-darcs-2.0.0pre3-tt15027931.html#a15048993
> [9] http://la-samhna.de/samhain/manual/hash-function.html
> [10] http://blogs.msdn.com/shawnfa/archive/2005/02/28/382027.aspx
> [11] http://peak.telecommunity.com/DevCenter/setuptools
> [12] http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Files
> [13] https://wiki.ubuntu.com/IntegrityCheck
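
The speed claim in the forwarded message is easy to check on your own
machine.  What follows is only a minimal sketch, not a rigorous benchmark:
it assumes Python's standard hashlib module, and the function names
(throughput_mb_per_s, compare), the 16 MiB buffer size, and the repeat
count are all hypothetical choices made for illustration.  Tiger is left
out because hashlib does not guarantee it.  Results depend heavily on the
CPU and on how the underlying hash library was built, so the ordering you
see may differ from what the message describes.

```python
import hashlib
import time


def throughput_mb_per_s(name, data, repeats=3):
    """Return the best observed throughput (MiB/s) for one hash algorithm."""
    best = float("inf")
    for _ in range(repeats):
        h = hashlib.new(name)
        start = time.perf_counter()
        h.update(data)
        h.digest()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse clock.
    best = max(best, 1e-9)
    return len(data) / (1024 * 1024) / best


def compare(names=("md5", "sha1", "sha256"), size=16 * 1024 * 1024):
    """Hash one zero-filled in-memory buffer with each algorithm."""
    data = bytes(size)
    return {name: throughput_mb_per_s(name, data) for name in names}


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name:8s} {rate:10.1f} MiB/s")
```

Hashing an in-memory buffer keeps disk and network out of the
measurement, which matches the bulk-data case described above where
nothing else hides the cost of the hash function.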


