[tahoe-dev] BackupDB proposal

zooko zooko at zooko.com
Thu May 29 11:35:27 PDT 2008


On May 29, 2008, at 2:45 AM, Ben Laurie wrote:
>
> Rather than messing around with a database, I would store hashes
> alongside each file and check whether the hash has changed. Obviously
> you incur the cost of rehashing the local file each time, but, well,
> who cares?

Secure hashes are cheap when measured in seconds per byte hashed  
(e.g. 81 MiB/sec on Wei Dai's benchmarks [1]), but the technique of  
reading and hashing an entire file when considering whether to back  
the file up is not cheap when measured in time to do a backup run, or  
when measured in CPU and disk load imposed on a user's machine (which
they might want to use for other stuff at the same time).

Reading all of the contents of a file from disk is not cheap in the  
first place, and then doing a secure hash on those contents is not  
cheap.
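
For anyone who wants to measure both costs on their own machine, here
is a minimal sketch in Python.  It is not code from Tahoe or from the
BackupDB proposal; the function name hash_file and the 1 MiB chunk size
are assumptions made for illustration.  It reads a file in chunks,
feeds each chunk to SHA-256, and reports wall-clock time separately
from CPU time, so that time spent waiting on the disk can be told apart
from time spent hashing.

```python
import hashlib
import time


def hash_file(path, chunk_size=1024 * 1024):
    """Hash one file with SHA-256, reading it in fixed-size chunks.

    Returns (hex_digest, wall_seconds, cpu_seconds).  The name and the
    1 MiB chunk size are illustrative choices, not taken from Tahoe.
    """
    hasher = hashlib.sha256()
    wall_start = time.perf_counter()   # elapsed time, includes disk waits
    cpu_start = time.process_time()    # user + system CPU of this process
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:              # empty bytes means end of file
                break
            hasher.update(chunk)
    wall_seconds = time.perf_counter() - wall_start
    cpu_seconds = time.process_time() - cpu_start
    return hasher.hexdigest(), wall_seconds, cpu_seconds
```

Calling hash_file on a large file twice in a row will usually show a
much faster second run, because the operating system caches the file
contents after the first read, so a fair measurement needs a file
larger than RAM or a cold cache.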

On my Macbook Pro (which has a faster version of the same CPU model
that Wei Dai's benchmarks used) I have a home movie that I would like
to back up, which is 12.7 GiB in size.  Just reading it from disk
takes between 7.5 and 8 minutes.  If I read it in with "cat" it takes  
about 4% of my CPU -- if I use a tiny Python script to do the same  
thing it takes 14%.  Either way, it draws down my laptop's battery  
level.

If I do a sha256sum of the file, then it takes 10 minutes and uses  
about 45% of the CPU during those ten minutes, if I'm interpreting  
these numbers correctly:

real    9m58.678s
user    3m48.451s
sys     0m44.169s
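
The arithmetic behind that CPU figure can be checked with a short
Python sketch.  It assumes the three values come from the shell's
"time" output in the usual NmS.SSSs form; the helper name parse_time is
made up for this example.  CPU share is (user + sys) / real, and the
effective throughput is the 12.7 GiB file size divided by the real
time.

```python
import re


def parse_time(field):
    """Convert a 'time'-style field such as '9m58.678s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if match is None:
        raise ValueError("unrecognized time field: %r" % (field,))
    return int(match.group(1)) * 60 + float(match.group(2))


real_seconds = parse_time("9m58.678s")
user_seconds = parse_time("3m48.451s")
sys_seconds = parse_time("0m44.169s")

# Fraction of one CPU that was busy during the run.
cpu_share = (user_seconds + sys_seconds) / real_seconds

# End-to-end rate for the 12.7 GiB file, in MiB per second.
file_size_mib = 12.7 * 1024
throughput_mib_per_sec = file_size_mib / real_seconds

print("CPU share: %.1f%%" % (cpu_share * 100))
print("Throughput: %.1f MiB/sec" % throughput_mib_per_sec)
```

The throughput that comes out of this is well below the 81 MiB/sec
benchmark figure, which is consistent with the disk read being the
larger part of the ten minutes.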

(Perhaps Ben's intuitions about the cost of secure hashes are
informed by his extensive experience in secure networking -- OpenSSL
and Apache and so on.  It seems to me that most crypto algorithms
have been developed and measured in the context of network apps,
where the network is the bottleneck and 81 MiB/sec is better than
the Internet can normally deliver anyway.  That performance analysis
does not apply directly to storage applications.  See also a post I
made to the NIST SHA-3 mailing list: [2].)

Regards,

Zooko

[1] http://cryptopp.com/benchmarks-amd64.html
[2] https://zooko.com/sha256_is_too_slow.html

