[tahoe-dev] Hashes and keys
Shawn Willden
shawn-tahoe at willden.org
Mon Feb 9 12:51:07 PST 2009
Reading hashutil.py has pointed out a problem with my scheme for minimizing
the size of the backup read cap mapping tables.
To recap: I need a way to link backuplog entries, which contain path and
metadata information, to content, stored in immutable files. To do that, I
decided to create tables that map from the file content hash to the read cap
(and also to the read cap of a signature file).
To keep the mapping tables small, I want to exploit the fact that read caps
are sort of "self-keying", since the first section of a CHK is the encryption
key, which is derived from the file content and encoding parameters.
To make that work, I need to embed the encryption key (or part of it) in the
backuplog. I can almost do that, because I have to hash the file content
anyway to determine if the content has changed. I say "almost", because I
don't want the backup program to have to know anything about encoding
parameters, and those clearly need to be factored into the final storage
index.
Here's my proposed solution:
I want to calculate the encryption key like this:
K = H( params || conv ) XOR H(content)
rather than:
K = H( params || conv || content )
where H is a tagged hash, and || is concatenation. That way given the content
hash the encryption key can be calculated inexpensively. The reason for
using XOR rather than hashing is because I want to be able to store only,
say, 12 bytes of H(content) in the backuplog[1] and be able to generate the
first 12 bytes of K for lookup in the read cap mapping files.
I'm not proposing to modify how Tahoe generates CHK keys. Instead, I'm
extending the webapi to allow clients to specify their own keys, generated
however they like, and I'm going to generate them this way.
See any problems?
Thanks,
Shawn.
[1] I need to do find some Birthday Problem work I did a few years ago to work
out just how many bytes I need to ensure a sufficiently-low risk of
collision. I'm hoping I can go with even less than 12.
More information about the tahoe-dev
mailing list