[tahoe-dev] overview of allmydata.org "Tahoe" security properties (confidentiality, integrity, availability)

zooko zooko at zooko.com
Mon Jan 21 13:50:37 PST 2008


Dear tahoe-dev, Norm Hardy, and anonymous security experts:

I've just updated the Tahoe docs to have a "top-level overview" which  
attempts to answer Norm Hardy's "initial questions" of November 2007  
[1].

This overview document is also, I hope, the best starting point for  
those security experts out there who are interested in attacking  
Tahoe's security design in return for fame and gratitude and handsome  
items of swag that say "I'm [INSERT NAME HERE] and I broke the Tahoe  
secure filesystem design!  Thank you from [HEART] allmydata.org".   
I've Bcc:'ed two such security experts on this message.

The overview document explains what security properties we think we  
have achieved, and leaves it up to architecture.txt [3] to explain  
how we think that we have achieved them.

The overview is now on-line [2], and it is probably best viewed in  
its original HTML format, but appended is a text version of it for  
the record.

Regards,

Zooko

[1] http://allmydata.org/pipermail/tahoe-dev/2007-November/000222.html
[2] http://allmydata.org/source/tahoe/trunk/docs/about.html
[3] http://allmydata.org/trac/tahoe/browser/docs/architecture.txt

------- begin appended file about.html, mashed into text
Overview

A "storage grid" comprises a number of storage servers. A storage  
server has local attached storage (typically one or more SATA hard  
disks). A "gateway" uses the storage servers and provides to its  
clients a filesystem over a standard protocol such as HTTP(S), FUSE,  
or SMB.

Users do not rely on storage servers to provide confidentiality or
integrity for the data -- instead all of the data is encrypted and
integrity-checked by the gateway, so that the servers are not able to
read the data or to alter it undetected.

Users do rely on the storage servers for availability -- the  
ciphertext is erasure-coded and distributed across N different  
storage servers (the default value for N is 12) so that it can be  
recovered from any K of these servers (the default value of K is 3).  
Therefore only the simultaneous failure of N-K+1 (with the defaults,
10) servers can make the data unavailable. Phrasing this in terms of  
reliance, we say that the users rely on the gateway for the  
confidentiality and integrity of the data, and on any 3 of the 12  
servers for the availability of the data.
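
The arithmetic above can be checked with a small sketch. The function
names are hypothetical and are not part of Tahoe. The last function
additionally assumes that servers fail independently, each being up with
probability p, which is an illustrative assumption and not a claim made
by this overview.

```python
from math import comb


def min_failures_for_loss(n: int, k: int) -> int:
    """Smallest number of simultaneous server failures that loses the data."""
    # With fewer than k servers left, the ciphertext cannot be recovered.
    return n - k + 1


def is_recoverable(n: int, k: int, failed: int) -> bool:
    """True if at least k of the n servers are still up."""
    return n - failed >= k


def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n servers are up.

    Assumes (hypothetically) that each server is up independently
    with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n, k = 12, 3  # the defaults quoted above
    print(min_failures_for_loss(n, k))
    print(is_recoverable(n, k, failed=9), is_recoverable(n, k, failed=10))
    print(availability(n, k, 0.9))
```

With the defaults, nine failed servers still leave three shares, which is
enough; the tenth failure is the one that makes the data unavailable.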

The typical deployment mode is that each user runs her own gateway on  
her own machine. This way she needs to rely only on her own machine  
for the confidentiality and integrity of the data, and she can take
advantage of more tightly integrated filesystem interfaces such as FUSE
and SMB.

An alternate deployment mode is that the gateway runs on a remote  
machine and the user connects to it over HTTPS. This means that the  
operator of the gateway can view and modify the user's data (the user  
relies on the gateway for confidentiality and integrity), but it  
means that the user can access the filesystem with a client that  
doesn't have the gateway software installed, such as an Internet  
kiosk or cell phone.

A user who has read-write access to a file or directory can give  
another user read-write access to that file or directory, or can give  
another user read-only access to that file or directory. A user who  
has read-only access to a file or directory can give another user  
read-only access to it.
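
The sharing rule above amounts to "you can hand out the access you hold,
or less". A minimal sketch of that rule follows; the Access type and
can_grant function are hypothetical names used only for illustration and
say nothing about how Tahoe implements it.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical access levels, ordered from weaker to stronger."""
    READ_ONLY = 1
    READ_WRITE = 2


def can_grant(holder: Access, granted: Access) -> bool:
    """A user may give away the access she holds, or a weaker one."""
    return granted.value <= holder.value


if __name__ == "__main__":
    print(can_grant(Access.READ_WRITE, Access.READ_ONLY))
    print(can_grant(Access.READ_ONLY, Access.READ_WRITE))
```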

When linking a file or directory into a parent directory, you can use  
a read-write link or a read-only link. If you use a read-write link,  
then anyone who has read-write access to the parent directory can  
gain read-write access to the child, but anyone who has read-only  
access to the parent directory can gain only read-only access to the  
child. If you use a read-only link, then anyone who has either
read-write or read-only access to the parent directory can gain read-only  
access to the child.
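
Put another way, the access gained to the child is the weaker of the
access held on the parent and the kind of link. The sketch below encodes
the four cases from the paragraph above; child_access and the string
labels are hypothetical names chosen for illustration.

```python
def child_access(parent_access: str, link_kind: str) -> str:
    """Access gained to a child through a link in a parent directory.

    Both arguments are either "read-write" or "read-only".
    """
    valid = {"read-write", "read-only"}
    if parent_access not in valid or link_kind not in valid:
        raise ValueError("expected 'read-write' or 'read-only'")
    # Read-write access to the child requires both read-write access to
    # the parent and a read-write link; every other case is read-only.
    if parent_access == "read-write" and link_kind == "read-write":
        return "read-write"
    return "read-only"


if __name__ == "__main__":
    for parent in ("read-write", "read-only"):
        for link in ("read-write", "read-only"):
            print(parent, "parent +", link, "link ->", child_access(parent, link))
```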

There are two kinds of files: immutable and mutable. Immutable files  
have the property that once they have been uploaded to the storage  
grid they can't be modified. Mutable ones can be modified.

For much more technical detail, please see The Doc Page on the Wiki,  
and the other files in the docs directory of the source tree.

