[tahoe-dev] Hashes and keys

Shawn Willden shawn-tahoe at willden.org
Mon Feb 9 12:51:07 PST 2009


Reading hashutil.py has pointed out a problem with my scheme for minimizing 
the size of the backup read cap mapping tables.

To recap:  I need a way to link backuplog entries, which contain path and 
metadata information, to content, stored in immutable files.  To do that, I 
decided to create tables that map from the file content hash to the read cap 
(and also to the read cap of a signature file).

To keep the mapping tables small, I want to exploit the fact that read caps 
are sort of "self-keying", since the first section of a CHK is the encryption 
key, which is derived from the file content and encoding parameters.

To make that work, I need to embed the encryption key (or part of it) in the 
backuplog.  I can almost do that, because I have to hash the file content 
anyway to determine if the content has changed.  I say "almost", because I 
don't want the backup program to have to know anything about encoding 
parameters, and those clearly need to be factored into the final storage 
index.

Here's my proposed solution:

I want to calculate the encryption key like this:

	K = H( params || conv ) XOR H(content)

rather than:

	K = H( params || conv || content )

where H is a tagged hash, and || is concatenation.  That way given the content 
hash the encryption key can be calculated inexpensively.  The reason for 
using XOR rather than hashing is because I want to be able to store only, 
say, 12 bytes of H(content) in the backuplog[1] and be able to generate the 
first 12 bytes of K for lookup in the read cap mapping files.

I'm not proposing to modify how Tahoe generates CHK keys.  Instead, I'm 
extending the webapi to allow clients to specify their own keys, generated 
however they like, and I'm going to generate them this way.

See any problems?

Thanks,

	Shawn.

[1] I need to do find some Birthday Problem work I did a few years ago to work 
out just how many bytes I need to ensure a sufficiently-low risk of 
collision.  I'm hoping I can go with even less than 12.


More information about the tahoe-dev mailing list