[tahoe-dev] Tahoe-LAFS key management, part 2: Tahoe-LAFS is like encrypted git
Zooko Wilcox-O'Hearn
zooko at zooko.com
Wed Aug 19 08:28:45 PDT 2009
Okay, in today's installment I'll reply to my friend Kris Nuttycombe,
who read yesterday's installment and then asked how the storage
service provider could provide access to the files without being able
to see their filehandles and thus decrypt them.
I replied that the handle could be stored in another file on the
server, and therefore encrypted so that the server couldn't see it.
You could imagine taking a bunch of these handles -- capabilities to
read an immutable file -- and putting them into a new file and
uploading to the Tahoe-LAFS grid. Uploading it would encrypt it and
give you a capability to that new file. The storage service provider
wouldn't be able to read the contents of that file, so it wouldn't be
able to read the files that it references. This forms a
"Cryptographic Hash Function Directed Acyclic Graph" structure, which
should be familiar to many readers as the underlying structure in git
[*]. Git uses this same technique of combining identification and
integrity-checking into one handle.
From this perspective, Tahoe-LAFS can be seen as "like git, but using
the handle for encryption in addition to integrity-checking and
identification".
(There are many other differences. For starters, git has a
high-performance compression scheme and a decentralized revision
control tool built on top. Tahoe-LAFS has erasure-coding and a
distributed key-value store for a backend.)
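The structure described above, a file full of capabilities that is
itself encrypted and uploaded, can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
Tahoe-LAFS's actual code or cap format: ToyGrid, upload, download and
the "key:hash" cap layout are hypothetical names, the key is chosen at
random, and the XOR keystream is a toy stand-in for a real cipher.

```python
import hashlib
import json
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 over key + counter). Illustration only;
    # a real system would use a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class ToyGrid:
    """Hypothetical storage provider: it only ever holds ciphertext."""

    def __init__(self) -> None:
        self.blobs = {}  # hex SHA-256 of ciphertext -> ciphertext


def upload(grid: ToyGrid, plaintext: bytes) -> str:
    key = secrets.token_bytes(16)  # assumption: random per-file key
    ciphertext = _xor(plaintext, key)
    ct_hash = hashlib.sha256(ciphertext).hexdigest()
    grid.blobs[ct_hash] = ciphertext
    # The cap combines the decryption key with the identifying hash.
    return f"{key.hex()}:{ct_hash}"


def download(grid: ToyGrid, cap: str) -> bytes:
    key_hex, ct_hash = cap.split(":")
    ciphertext = grid.blobs[ct_hash]
    # The same hash that identifies the blob also checks its integrity.
    if hashlib.sha256(ciphertext).hexdigest() != ct_hash:
        raise ValueError("integrity check failed")
    return _xor(ciphertext, bytes.fromhex(key_hex))


grid = ToyGrid()
cap_a = upload(grid, b"contents of file A")
cap_b = upload(grid, b"contents of file B")

# A "directory" is just another file whose contents are caps.
directory = json.dumps({"a.txt": cap_a, "b.txt": cap_b}).encode()
root_cap = upload(grid, directory)

listing = json.loads(download(grid, root_cap))
assert download(grid, listing["a.txt"]) == b"contents of file A"
```

The grid ends up holding three opaque blobs. Without root_cap it
cannot read the directory, and so it never learns cap_a or cap_b
either.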
Okay, the bus is arriving at work.
Oh, so then Kris asked "But what about the root of that tree?". The
answer is that the capability to the root of that tree is not stored
on the servers. It is held by the client, and never transmitted to
the storage servers. It turns out that storage servers don't need
the capability to the file in order to serve up the ciphertext.
(Technically, they *do* need an identifier, and ideally they would
also have the integrity-checking part so that they could perform
integrity checks on the file contents (in addition to clients
performing that integrity check for themselves). So the capability
gets split into its component parts during the download protocol,
when the client sends the identification and integrity-checking bits
to the server but not the decryption key, and receives the ciphertext
in reply.)
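The split of a capability into its parts during download can be
sketched as follows. Again this is a hedged illustration, not the real
Tahoe-LAFS download protocol: ReadCap, ToyServer, client_upload and
client_download are hypothetical names, the storage identifier is just
a random token, and the XOR keystream is a toy stand-in for a real
cipher. The point is only that the server's get() never receives the
key.

```python
import hashlib
import secrets
from dataclasses import dataclass


def _toy_cipher(data: bytes, key: bytes) -> bytes:
    # Toy XOR keystream from SHA-256(key + counter); illustration only.
    # The same call encrypts and decrypts.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass(frozen=True)
class ReadCap:
    storage_id: str      # identification part (sent to the server)
    integrity_hash: str  # integrity-checking part (sent to the server)
    key: bytes           # decryption key (stays with the client)


class ToyServer:
    """Hypothetical storage server: sees identifiers and ciphertext only."""

    def __init__(self) -> None:
        self._shares = {}

    def put(self, storage_id: str, ciphertext: bytes) -> None:
        self._shares[storage_id] = ciphertext

    def get(self, storage_id: str, integrity_hash: str) -> bytes:
        ciphertext = self._shares[storage_id]
        # Server-side check, possible because it has the hash.
        if hashlib.sha256(ciphertext).hexdigest() != integrity_hash:
            raise ValueError("stored ciphertext is corrupt")
        return ciphertext


def client_upload(server: ToyServer, plaintext: bytes) -> ReadCap:
    key = secrets.token_bytes(16)
    ciphertext = _toy_cipher(plaintext, key)
    storage_id = secrets.token_hex(16)  # assumption: random identifier
    server.put(storage_id, ciphertext)
    return ReadCap(storage_id, hashlib.sha256(ciphertext).hexdigest(), key)


def client_download(server: ToyServer, cap: ReadCap) -> bytes:
    # Only two of the three parts of the cap go over the wire.
    ciphertext = server.get(cap.storage_id, cap.integrity_hash)
    # The client repeats the integrity check for itself.
    if hashlib.sha256(ciphertext).hexdigest() != cap.integrity_hash:
        raise ValueError("server returned bad ciphertext")
    return _toy_cipher(ciphertext, cap.key)


server = ToyServer()
cap = client_upload(server, b"attack at dawn")
assert client_download(server, cap) == b"attack at dawn"
```

If the stored ciphertext is altered, both the server-side and the
client-side check fail, because each compares the bytes against the
integrity hash carried in the cap.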
Therefore the next layer up, whether another program or a human user,
needs to manage this single capability to the root of a tree. Here
the abstraction-piercing problem of availability versus
confidentiality remains in force, and different programs and
different human users have different ways to manage their caps. I
personally keep mine in my bookmarks in my web browser. This is
risky -- they could be stolen by malicious JavaScript (probably) or I
might accidentally leak them in an HTTP Referer header. But it is
very convenient. For the files in question I value that convenience
more than an extra degree of safety. I know of other people who keep
their Tahoe-LAFS caps more securely, on Unix filesystems, on
encrypted USB keys, etc.
Regards,
Zooko
[*] Linus Torvalds got the idea of a Cryptographic Hash Function
Directed Acyclic Graph structure from an earlier distributed revision
control tool named Monotone. He didn't go out of his way to give
credit to Monotone, and many people mistakenly think that he invented
the idea.