[tahoe-dev] question about sharing...

Zooko O'Whielacronx zooko at zooko.com
Tue Jun 28 19:56:20 PDT 2011


Dear toby cabot:

Thank you for writing this. I wonder if it would be good as a magazine
article, or at least a blog entry on someone's blog, to get it in
front of people who already understand the basics of
files-and-directories and account-names-and-passwords but haven't
really been exposed to capabilities.

I'm going to be very picky in the following comments, because the
whole topic of security in distributed open systems like the Internet
is thick with subtle misunderstandings and myths, so we have to be
extremely precise to help people understand things more clearly.

Hopefully, in addition to my comments below, other security experts who
read this list will chime in. (Even if only to say that they read it
through and didn't see any problems with it.)

Just so you know, though, I really like this write-up! I like your
conversational but simple writing style, and I am glad to see this
much explanation aimed at the "newcomer" who doesn't already
understand capability access control theory or cryptography. Some
documentation for that audience is much needed. :-)


On Mon, Jun 20, 2011 at 4:39 PM, toby cabot <toby at caboteria.org> wrote:
>
> As you saw on that page, Tahoe-LAFS
> provides a guarantee that you can store your data on servers that you
> don't trust, and the administrators of those servers won't be able to
> read your data.

Let us avoid the word "trust". It is a very common word in discussions
of security, and it sows much confusion because it combines at least
three different concepts into one word:

1. Whether you *rely* on a person (or a bot operating under that
person's orders) to do something.

2. Whether you think the person or bot is *likely* to do it.

3. Whether you think the person is good or evil.

Whenever I find myself using the word "trust" I try to reflect and
figure out which of these is closest to what I really mean. It is
almost always the first one.

For example, I might be tempted to write something like "I trust the
operators of Tahoe-LAFS storage servers that I use."  But this would
be a very confusing thing to say! I have pretty good confidence that
all of the operators of storage servers that I currently use
(volunteergrid1 -- hi folks!) are good and not evil. I have somewhat
lower confidence that their storage servers are going to operate
correctly at all times and under all circumstances. But most
importantly, I do not *need* them to refrain from peeking at my data
-- I do not *rely* on them for that -- since Tahoe-LAFS makes it
impossible for them to peek at the data without my authorization. On
the other hand I do rely on them (at least, any 3 out of 10 of them)
to keep their server running and reachable over the Net.

Rewriting the text above to use the words "rely" or "need" or
"require" instead of the word "trust" is left as an exercise for the
writer. I am curious how it will turn out.

By the way, I learned this notion of "reliance analysis" from a paper
named "Paradigm Regained" by Mark S. Miller. I strongly recommend it:

http://www.erights.org/talks/asian03/


>   It does this by encrypting the data before it stores
> it on those servers, so that all they see is random-looking bits and
> they can't recover the actual content of your files.  Tahoe-LAFS also
> guards against the failure of the storage servers by storing the same
> data on more than one of them.  Of course, this will use more disk
> storage than simply storing the file once, but you can decide how
> you'd like to trade off extra storage for fault-tolerance.

A typical reader would probably assume, after reading the above, that
what Tahoe-LAFS offers is replication, in which you spend X times as
much storage space so that you can tolerate the loss of X-1 servers
out of X servers. E.g. you could have 3 servers, using 3 times as much
storage space so that you can tolerate the loss of any 2 of the 3. The
fact that erasure coding offers more tradeoffs should be communicated.
E.g. 3-of-10 coding uses 3.3 times as much storage space so that you
can tolerate the loss of 7 servers out of 10.
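
The arithmetic behind these tradeoffs can be sketched in a few lines.
This is only an illustration, assuming one share per server; the
function name erasure_tradeoff is made up for this example and is not
part of Tahoe-LAFS.

```python
from fractions import Fraction


def erasure_tradeoff(k, n):
    """For k-of-n coding, return (storage expansion, losses tolerated).

    Any k of the n shares are enough to recover the file, so the data
    survives the loss of up to n - k shares and costs n/k times the
    original size in total storage.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return Fraction(n, k), n - k


# 1-of-3 is plain replication; 3-of-10 is the example from the text.
for k, n in [(1, 3), (3, 10), (7, 10)]:
    expansion, tolerated = erasure_tradeoff(k, n)
    print(f"{k}-of-{n}: {float(expansion):.2f}x storage, "
          f"tolerates the loss of {tolerated} of {n} servers")
```

Plain replication is just the special case k = 1. Raising k lowers
the storage cost at the price of tolerating fewer losses.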


> Capabilities (vs. Access Control)

Nit-pick: capabilities are a way to control access. What you meant
above is "vs. Access Control Lists". Perhaps something like "vs.
account-names-and-passwords" or "vs. traditional filesystem
permissions" would work too and be less jargony.


> things they can figure out how to do, but are not permitted to do.  In
> other words, I can discover a directory's existence, and learn its

   ^-- replace "in other words" with "for example", since this is only
one way to get ACCESS DENIED in a traditional access control system
:-)


> Tahoe-LAFS does away with the complexity inherent in the ACL approach
> and uses a much simpler approach, called "capabilities".  Access to
> each file (and directory) in Tahoe-LAFS is allowed by a "capability"
> which is a string of characters that looks something like
> =URI:CHK:riplmjitnwh25ur3jomzyxrww4:et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472=.

Right here is a good opportunity to lay out the critical fact about
capability access control: that there is a single thing which serves
as both identifier and authorizer. This key fact about capabilities is
summarized in the aphorism: "Unify authority and designation!".

So perhaps you could write:

Each file and directory in Tahoe-LAFS is identified by a "capability"
which is a string of characters that looks something like
URI:CHK:riplmjitnwh25ur3jomzyxrww4:et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472
. This capability serves as both the identifier of the file or
directory and as the authorization code necessary to get access to it.


> Each capability contains the
> two things that you need to access the file: how to find the encrypted
> bits (the "storage index"), and how to decrypt them (the "encryption
> key").

Perhaps this fact isn't needed in this description? Up to you.
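
If the article does keep this detail, a small sketch could show that
the capability string is just colon-separated fields. The field names
below are my own labels and are assumptions, not official terminology.
In particular, reading the last three numbers as the erasure-coding
parameters and the file size is an inference from the 3-of-10 example.

```python
from typing import NamedTuple


class ChkCap(NamedTuple):
    key: str        # first base32 field (assumed: the decryption key)
    integrity: str  # second base32 field (assumed: an integrity hash)
    needed: int     # assumed: shares needed to recover the file (k)
    total: int      # assumed: total shares produced (n)
    size: int       # assumed: file size in bytes


def parse_chk_cap(cap: str) -> ChkCap:
    """Split a URI:CHK capability string into its fields."""
    parts = cap.strip().split(":")
    if len(parts) != 7 or parts[0] != "URI" or parts[1] != "CHK":
        raise ValueError("not a URI:CHK capability string")
    _, _, key, integrity, needed, total, size = parts
    return ChkCap(key, integrity, int(needed), int(total), int(size))


cap = parse_chk_cap(
    "URI:CHK:riplmjitnwh25ur3jomzyxrww4:"
    "et4gkxykswl7lstw5q4g5suf6y2xyyphvid5nn2r3ktvhytbs5da:3:10:3472"
)
print(cap.needed, cap.total, cap.size)
```

The point for a newcomer is that nothing else is consulted: no
account name, no password, no permission table. Whoever holds the
string holds the access.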


> It's important to understand that a capability specifies the location
> of a file, but it's different than a traditional file system "path".
> Tahoe-LAFS has no well-known "root" so there's no way to poke around
> and try to discover things inside it.  Each directory and file can be
> found only by its capability and can't be discovered in any other way.
> (How many bits in a capability, i.e. how hard would it be to guess?)

As hard as guessing a random 256-bit secret key, so basically
impossible (unless we screwed up).
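
To give "basically impossible" a rough scale, here is a
back-of-the-envelope calculation. The attacker's guessing rate is a
made-up number chosen only for illustration.

```python
KEY_BITS = 256
GUESSES_PER_SECOND = 10**12  # hypothetical attacker, chosen for illustration
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

keyspace = 2 ** KEY_BITS
# On average an attacker finds the key after trying half the keyspace.
expected_guesses = keyspace // 2
years = expected_guesses // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

# Report only the order of magnitude (number of digits minus one).
print(f"roughly 10^{len(str(years)) - 1} years of guessing")
```

Even if the assumed guessing rate is off by many orders of magnitude,
the answer stays far beyond any practical time span.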

> A directory capability acts like a traditional file system directory
> in that users can browse down from it to see files in it and in the
> tree below it, but they can't browse "up" to see other directories
> within the same Tahoe-LAFS file system.  It's as if each directory in
> Tahoe-LAFS is a root directory.  Users cannot discover things that
> they're not supposed to know, so the in-line ACL checks implemented by
> traditional file systems are unnecessary.

The above two paragraphs are good, but I'm afraid that they
accidentally reinforce one of the most common misunderstandings about
Tahoe-LAFS: the idea that you can't navigate through files and
directories using normal paths and filenames. In fact, you can! A
directory has a set of children where each child is named by a
human-readable string of your choice. You can navigate from one
directory to its child and then to its child's child by names
separated by slashes such as "cd childname1/childname2". This is
exactly like directories of your local filesystem.
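
A toy model may make both points at once: you navigate downward by
ordinary names, yet there is no way to go "up". This is a sketch for
explanation only. ToyGrid, its methods, and the "toy-dir:" strings are
invented here and do not reflect how Tahoe-LAFS is implemented.

```python
import secrets


class ToyGrid:
    """Toy model: each directory maps child names to child capabilities."""

    def __init__(self):
        self._dirs = {}  # capability string -> {child name: child capability}

    def mkdir(self):
        # An unguessable random string stands in for a real capability.
        cap = "toy-dir:" + secrets.token_hex(32)
        self._dirs[cap] = {}
        return cap

    def link(self, parent_cap, name, child_cap):
        self._dirs[parent_cap][name] = child_cap

    def resolve(self, start_cap, path):
        """Walk a slash-separated path downward from start_cap."""
        cap = start_cap
        for name in path.split("/"):
            if name:
                cap = self._dirs[cap][name]  # KeyError if no such child
        return cap


grid = ToyGrid()
top, child1, child2 = grid.mkdir(), grid.mkdir(), grid.mkdir()
grid.link(top, "childname1", child1)
grid.link(child1, "childname2", child2)

# Ordinary path navigation works, starting from any directory you hold.
assert grid.resolve(top, "childname1/childname2") == child2
# A directory stores no link to its parent, so ".." is not a child.
assert ".." not in grid._dirs[child1]
```

Someone who is given only child1 can reach child2 by name but has no
route back to top, which is the sense in which each directory acts
like a root.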


> If you're curious about the capability model, it's worth taking some
> time to learn more about it:
> http://en.wikipedia.org/wiki/Capability-based_security

Ugh, I'm not impressed with that page.  I wonder if our readers would
be more confused rather than less if we refer them to that page. But
is there a better page to which we can refer people? :-(


Okay, otherwise, I think your write-up was really great! Let's iterate
on it a time or two -- I see that Patrick "marlowe" McDonald has also
offered to format it into restructured text -- and then let's try to
get some widespread readership of it, such as submitting it to a
magazine like Linux Weekly News (http://lwn.net) or Linux Journal or
something. Any other suggestions from people here about how to get
this article to the right audience?

Regards,

Zooko

