[tahoe-dev] BackupDB proposal
zooko
zooko at zooko.com
Thu May 29 11:35:27 PDT 2008
On May 29, 2008, at 2:45 AM, Ben Laurie wrote:
>
> Rather than messing around with a database, I would store hashes
> alongside each file and check whether the hash has changed. Obviously
> you incur the cost of rehashing the local file each time, but, well,
> who cares?
Secure hashes are cheap when measured in seconds per byte hashed
(e.g. 81 MiB/sec on Wei Dai's benchmarks [1]), but the technique of
reading and hashing an entire file when considering whether to back
the file up is not cheap when measured in time to do a backup run, or
when measured in CPU and disk load imposed on a user's machine (which
they might want to use for other stuff at the same time).
Reading the entire contents of a file from disk is not cheap in the
first place, and computing a secure hash over those contents adds
further cost on top of that.
On my Macbook Pro (which has a faster version of the same CPU model
that Wei Dai's benchmarks used) I have a home movie that I would like
to back up, which is 12.7 GiB in size. Just reading it from disk
takes between 7.5 and 8 minutes. If I read it in with "cat" it takes
about 4% of my CPU -- if I use a tiny Python script to do the same
thing it takes 14%. Either way, it draws down my laptop's battery
level.
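For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a chunked read in Python. It is not the script used for the numbers above; the function name read_file, the 1 MiB chunk size, and the optional hasher argument are assumptions made for illustration.

```python
import hashlib
import time

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary, assumed value


def read_file(path, hasher=None):
    """Read the whole file in chunks, optionally feeding each chunk to hasher.

    Returns (bytes_read, wall_seconds, cpu_seconds). The name and signature
    are hypothetical, chosen only for this sketch.
    """
    total = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break  # end of file
            total += len(chunk)
            if hasher is not None:
                hasher.update(chunk)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return total, wall, cpu
```

Calling read_file(path) times the plain read; passing hashlib.sha256() as the hasher times read plus hash in the same loop, so the two runs can be compared directly.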
If I do a sha256sum of the file, then it takes 10 minutes and uses
about 45% of the CPU during those ten minutes, if I'm interpreting
these numbers correctly:
real 9m58.678s
user 3m48.451s
sys 0m44.169s
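The 45% figure can be checked from the timings above: CPU utilization is (user + sys) / real. The sketch below does that arithmetic and also derives the effective end-to-end throughput from the 12.7 GiB file size. The helper name to_seconds is an assumption made for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmY.YYYs' reading from time(1) into seconds."""
    return minutes * 60 + seconds


real = to_seconds(9, 58.678)      # wall-clock time
user = to_seconds(3, 48.451)      # CPU time in user mode
sys_time = to_seconds(0, 44.169)  # CPU time in kernel mode

# Fraction of one CPU kept busy over the run.
cpu_fraction = (user + sys_time) / real

# Effective read-plus-hash throughput for the 12.7 GiB file, in MiB/sec.
throughput = 12.7 * 1024 / real

print(f"CPU utilization: {cpu_fraction:.1%}")    # about 45.5%
print(f"Throughput: {throughput:.1f} MiB/sec")   # about 21.7 MiB/sec
```

So the end-to-end rate here is roughly 21.7 MiB/sec, well under the 81 MiB/sec hash-only benchmark figure, which is consistent with the disk read taking most of the wall-clock time.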
(Perhaps Ben's intuitions about the cost of secure hashes are
informed by his extensive experience in secure networking -- OpenSSL
and Apache and so on. It seems to me that most crypto algorithms
have been developed and measured in the context of network apps,
where the network is the bottleneck, and 81 MiB/sec is better than
the Internet can normally deliver anyway, but this performance
analysis does not apply directly to storage applications. See also a
post I made to the NIST SHA-3 mailing list: [2].)
Regards,
Zooko
[1] http://cryptopp.com/benchmarks-amd64.html
[2] https://zooko.com/sha256_is_too_slow.html