[tahoe-dev] Access control and permissions on a tahoe grid

Fri Jun 12 19:19:27 PDT 2009

On Fri, 12 Jun 2009 18:59:08 +0100
Rufus Pollock <rufus.pollock at okfn.org> wrote:

> First off, I wanted to say a big thank-you for developing Tahoe --
> it's a great piece of software serving a really important function.

Thanks!

> 1. Can you have a "Grid Administrator" (with root-style permissions)?

As Kevin pointed out, the answer is generally "no", but you'll want to
distinguish "permission to read/write files" from "permission to consume
storage space". Tahoe explicitly denies the first sort of permission: to see
the contents of a file you must either create it or receive the filecap from
someone who already knew it.

The second sort of permission is still being developed, but it will derive
from the whims of the admin of each storage server. Each admin will be free to
delegate their grant-space-or-not decisions to some other party, and we
expect that a common mode will be for all participants to grant that control
to a centralized "account manager", who will then grant access to specific
users as desired.

> In our setup we want people to be able to "donate" nodes to the grid.
> At the same time there needs to be some way to monitor/control what
> people upload (the aim is to store open data of general interest not
> someone's personal backups or their CD collection) and we also want to
> ensure not just anyone can come and delete objects.

We should also define "deletion" more carefully than we have been so far.
There are two things that you might want to happen when you push the "delete
foo.jpg" button in some Tahoe directory. The first is that you remove the
link which associates the name "foo.jpg" with some particular filecap. The
second is that you might like the space consumed by foo.jpg to be released
and made available for other purposes.

In Tahoe, the first is implemented by modifying the mutable file which makes
up the parent directory. Tahoe directories are just tables of name+filecap,
serialized into bytes, and stored in a mutable file. Therefore anyone with a
writecap to the mutable file will be able to unlink its children.
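
To make the "table of name+filecap in a mutable file" idea concrete, here is
a minimal sketch. It is not Tahoe's code: the MutableFile class, the JSON
encoding, and the cap strings are all assumptions made for illustration, and
the real serialization format and encryption are not shown.

```python
import json


class MutableFile:
    """Toy stand-in for a mutable file: a byte slot guarded by a writecap."""

    def __init__(self, writecap):
        self._writecap = writecap
        self._data = b"{}"  # an empty directory table

    def read(self):
        return self._data

    def write(self, writecap, data):
        # Only holders of the writecap may replace the contents.
        if writecap != self._writecap:
            raise PermissionError("writecap required to modify this file")
        self._data = data


def list_children(mfile):
    """Deserialize the directory table (name -> cap) from the file's bytes."""
    return json.loads(mfile.read().decode("utf-8"))


def link(mfile, writecap, name, cap):
    """Add or replace one name -> cap entry, then rewrite the whole table."""
    table = list_children(mfile)
    table[name] = cap
    mfile.write(writecap, json.dumps(table).encode("utf-8"))


def unlink(mfile, writecap, name):
    """Remove one entry. This drops the link only; no space is reclaimed."""
    table = list_children(mfile)
    del table[name]
    mfile.write(writecap, json.dumps(table).encode("utf-8"))
```

Note that unlink() touches only the parent directory: the child's shares are
untouched, which is why space reclamation has to be a separate mechanism.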

The reclaim-the-space part is trickier, and we implement it with
garbage-collection. The storage servers don't know which directories hold a
reference to any given file (because they aren't allowed to read the
directories), so the rule is that clients are responsible for updating
leases, and servers are supposed to keep a file's shares alive until all of
the leases on that share have expired. This is less immediate than an
explicit delete operation would be, but it avoids race conditions and removes
the danger that the server might delete a file which is still being
referenced by some other parent directory. (think reference-counting).
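
The lease rule can be sketched roughly like this. The ShareStore class, its
method names, and the use of plain numbers for time are hypothetical; the
sketch only shows the "keep the share until every lease has expired" logic,
not the real storage server.

```python
class ShareStore:
    """Toy storage server that keeps a share while any lease on it is live."""

    def __init__(self):
        # share_id -> {client_id: expiry_time}
        self._leases = {}

    def add_or_renew_lease(self, share_id, client_id, now, duration):
        """Clients call this periodically for every share they still want."""
        self._leases.setdefault(share_id, {})[client_id] = now + duration

    def has_share(self, share_id):
        return share_id in self._leases

    def collect_garbage(self, now):
        """Drop expired leases; delete shares that have no live lease left."""
        deleted = []
        for share_id in list(self._leases):
            live = {c: t for c, t in self._leases[share_id].items() if t > now}
            if live:
                self._leases[share_id] = live
            else:
                del self._leases[share_id]
                deleted.append(share_id)
        return sorted(deleted)


store = ShareStore()
store.add_or_renew_lease("share-1", "alice", now=0, duration=10)
store.add_or_renew_lease("share-1", "bob", now=0, duration=30)
# At time 20 alice's lease has lapsed, but bob's keeps share-1 alive.
assert store.collect_garbage(now=20) == []
assert store.has_share("share-1")
```

The server never needs to read a directory here: a share survives as long as
at least one client keeps renewing, which is the reference-counting flavor
mentioned above.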

So there isn't really an explicit "delete" permission, but write-permission
on all of the directories that currently contain a reference to foo.jpg is
pretty similar.

> What this suggests to us is we want a "Grid Administrator" role with
> root style permissions:
> 
> a) A "Grid Administrator" can see all objects (files/directories)
> created on the grid.
> 
> b) "Grid Administrator" has full access to all objects (in particular,
> can delete them if necessary)
> 
> c) We don't (always) make the writecap world-available.
> 
> As I understand it, to ensure a) we need every node owner to create
> objects within a designated root directory (otherwise their
> directories and files will be hidden from everyone else on the system
> -- as one would want for privacy ...).

Yup. If clients voluntarily give their filecaps/dircaps to somebody, then
that somebody can do anything they want.

> To get (b)+(c) requires that when objects are created on the grid
> (which may happen on a local node) that information is automatically
> passed to the "Grid Administrator"? AFAICT the only way to achieve
> this is to have all users only create objects on the grid via some
> central node/api/upload point. Is this correct?

Yes. Basically you're looking at hiding the Tahoe grid behind a proxy, and
that proxy limits the operations allowed to users: they can't just upload an
unlinked file (data->filecap), nor can they just create a new unlinked
directory, but they can (upload+link) a file into an existing directory, and
they can (mkdir+link) a new subdirectory of an existing directory.
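
As a rough illustration of such a proxy, the sketch below uses a made-up
in-memory ToyGrid in place of a real Tahoe client. Every class and method
name is hypothetical. The point is only the shape of the interface: the proxy
holds the root dircap and exposes linked operations, never bare upload or
mkdir.

```python
import itertools


class ToyGrid:
    """Hypothetical in-memory grid; caps are just opaque strings."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.files = {}  # filecap -> bytes
        self.dirs = {}   # dircap -> {name: cap}

    def upload(self, data):
        cap = "file-%d" % next(self._counter)
        self.files[cap] = data
        return cap

    def mkdir(self):
        cap = "dir-%d" % next(self._counter)
        self.dirs[cap] = {}
        return cap

    def link(self, dircap, name, childcap):
        self.dirs[dircap][name] = childcap


class GridProxy:
    """Exposes only upload+link and mkdir+link under a root it controls."""

    def __init__(self, grid):
        self._grid = grid
        self._root = grid.mkdir()  # the proxy keeps this dircap to itself

    def _resolve(self, path):
        """Walk a list of names from the root to an existing directory."""
        dircap = self._root
        for name in path:
            dircap = self._grid.dirs[dircap][name]
            if dircap not in self._grid.dirs:
                raise NotADirectoryError(name)
        return dircap

    def upload_and_link(self, path, name, data):
        parent = self._resolve(path)
        self._grid.link(parent, name, self._grid.upload(data))

    def mkdir_and_link(self, path, name):
        parent = self._resolve(path)
        self._grid.link(parent, name, self._grid.mkdir())

    def listing(self, path):
        return sorted(self._grid.dirs[self._resolve(path)])
```

Because users never receive a cap back from the proxy, everything they create
is reachable from the proxy's root, which is what the "Grid Administrator"
view requires.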

You could conceivably give out the readcap and let users download data on
their own, without the proxy, but the Tahoe storage server protocol doesn't
currently distinguish between read-authority and write-authority, so those
users would also be able to upload unlinked files and create unlinked
directories, which you want to be able to prevent.

As Kevin pointed out, once Accounting is in place (some day..), you'll be
able to explicitly control space-consumption as an orthogonal issue to which
files you can read or write. You could implement some other schemes on top of
this: only give consume-space permission to the proxy, only publish the root
directory's readcap. Then clients could read to their hearts' content, but no
server would give them space to upload new (unlinked) files. The proxy would
accept file data from clients, upload them (using its consume-space powers),
and link them into the root directory (using its writecap).

> 2. How do you control who can join a grid?
> 
> Is there any way to configure my node only to talk to these other
> nodes? Given that new nodes join a grid via an introducer I wondered
> if there were some way to use the introducer for this function. (E.g.
> I have to be given a token which I pass to the introducer in order
> to be "allowed in")

The answer depends upon how you'd answer Kevin's question about "why do you
want to do this". It's also strongly influenced by the current storage server
protocol, which (as described above) doesn't split out upload-shares
permission from download-shares permission.

One reason to control who can join a grid is so that a storage-server
operator can control who gets to consume their disk space. The Accounting
project is our plan for this: it doesn't matter who can connect, as long as
they can't consume space without some sort of authorization that you control.
(Our plan for Accounting involves authorized clients holding private DSA keys
which correspond to DSA public keys that have been added to the server's
tahoe.cfg.)

Another reason might be to control who can read certain files. We prefer
using the readcap for this: possession of the file/directory's readcap is both
necessary and sufficient to retrieve the contents. Two-factor authorization
at the file level (knowledge of the readcap PLUS membership in the grid) is
harder to delegate and reason about.

Yet another reason is to control which servers are used when you (or some
other "member" of the grid) upload a file. For example, you might want your
shares to be placed on 100%-genuine high-quality allmydata.com(TM!) servers,
not those shabby fly-by-night allmydata.org servers, because you've got
limited bandwidth and want to entrust your precious few shares to the most
reliable servers :-). So, regardless of who you meet through this Introducer,
you only want to use servers that are branded "100% allmydata.com". For this
goal, we're working on ticket #466 (signed introducer announcements), and
you'll express your preference by putting a DSA pubkey in your tahoe.cfg .
Your client will only use servers whose introducer-brokered announcements
were signed by the matching privkey.
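
The client-side filtering step might look something like the sketch below.
Big caveat: the standard library has no DSA, so this uses HMAC as a stand-in
"signature" purely so the example runs. HMAC is symmetric (the verifier holds
the same secret as the signer), which is NOT how the #466 scheme would work;
there the client would hold only a pubkey. The function names and the
(announcement, signature) pair format are also assumptions.

```python
import hashlib
import hmac


def sign(key, announcement):
    # Stand-in for a DSA signature. HMAC is symmetric, unlike the real scheme.
    return hmac.new(key, announcement, hashlib.sha256).digest()


def verify(key, announcement, signature):
    # Constant-time comparison of the expected and the presented signature.
    return hmac.compare_digest(sign(key, announcement), signature)


def select_servers(announcements, trusted_key):
    """Keep only announcements whose signature verifies under trusted_key.

    `announcements` is a list of (announcement_bytes, signature_bytes) pairs,
    as they might arrive from an introducer that does no filtering itself.
    """
    return [a for (a, sig) in announcements if verify(trusted_key, a, sig)]


trusted = b"key-configured-by-the-client"
other = b"some-other-signer"
received = [
    (b"server-A", sign(trusted, b"server-A")),
    (b"server-B", sign(other, b"server-B")),
]
assert select_servers(received, trusted) == [b"server-A"]
```

The introducer just relays everything; the decision about which servers to
use is made entirely by the client's select_servers() step.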

We considered using "secret introducers" to achieve this last goal, but we
decided to use signed-announcements instead. The main reason is that we want
to switch to gossip-based decentralized introduction at some point, which
just wouldn't work with a secret-introducer scheme. Another is that a
secret-introducer scheme unnecessarily conflates client access with server
access: since clients need to know the secret introducer FURL, they could
inject not-100%-allmydata.com server announcements. In the #466 scheme, the
introducer (and the client-side code which talks to it) has just one job:
distribute a list of likely servers. The introducer is *not* responsible for
making value judgements about those servers.. that job is left to the client,
who is the one who really cares about it anyways.

> 3. Is it ever possible to revoke capabilities?
> 
> For example, if I give you the writecap to directory X is there any
> way to rescind that later on (i.e. can I change the writecap for that
> directory without deleting it)?

Nope, not yet. In the current release, sharing filecaps and dircaps is an
irrevocable act. You'd need to introduce some out-of-band mechanism to
control access carefully enough to provide strong revocation properties.

Note, however, that dircaps reference mutable state, and there's nothing to
stop you from emptying out the directory and switching to using a different
one instead. The analogy we use is to change your phone number and tell
everybody your new one except that annoying guy that you don't want to talk
to anymore. They continue to have access to the old empty directory, but
nobody else is using it anymore, so who cares? (I suppose the analogy works
better if you've left an answering machine on the old number, but never check
the messages.. he can talk to himself all he wants, but nobody else is
listening).

If you're revoking access because you want to keep them from modifying some
state that you care about, then the move-and-don't-forward trick works just
fine, although it becomes a coordination issue if you've told lots of people
(i.e. you have to inform N-1 parties to revoke the remaining 1, and you need
some way to explain to them which directory it is you want to replace).
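
A toy version of the move-and-don't-forward trick, to show where the N-1
notifications come from. The dict-of-dicts "grid", the cap strings, and the
function name are all hypothetical; creating the new directory and actually
delivering the new cap to each party are left out.

```python
def move_and_dont_forward(directories, old_cap, new_cap, holders, revoked):
    """Copy a directory's entries to a new cap and empty the old one.

    Returns a mapping of who must be told about the new cap: everyone who
    held the old cap except the revoked parties.
    """
    directories[new_cap] = dict(directories[old_cap])  # copy the table
    directories[old_cap].clear()                       # leave the old one empty
    return {h: new_cap for h in holders if h not in revoked}


dirs = {"old-dircap": {"notes.txt": "filecap-1"}}
to_notify = move_and_dont_forward(
    dirs, "old-dircap", "new-dircap",
    holders=["alice", "bob", "mallory"], revoked={"mallory"},
)
assert to_notify == {"alice": "new-dircap", "bob": "new-dircap"}
assert dirs["old-dircap"] == {}
```

The revoked party can still write to the old, now-empty directory, but nobody
else is looking at it any more.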

But if you're revoking access because you want to prevent them from reading
some file that they used to have access to, then it's a race between their
decision to read and your decision to revoke. They might have done a "cp -r"
the moment you first gave them the dircap, and done it again every ten
seconds, and really you've got no way to know whether they've read the files
or not. So from a security point of view, the most conservative position to
take is that read access is effectively irrevocable: if they've ever had read
access, you must assume that they've used it already, so there's no point to
taking it back now.

(Note that if the storage servers counted share-reads, you might be able to
know that nobody had read the file, and that might make revokers feel a bit
better, but there would be a lot of false-negatives. You could also ask
clients nicely to notify you whenever they read the file, and then feel
better if they hadn't done so before you revoked their access, but you'd
never really know for sure).

Revocation is a complicated topic. As Kevin said, it basically requires an
intermediary, which might either be a single proxy/gatekeeper or something
distributed (like an intermediate tahoe directory that you can later empty).
Any extra layer will hurt availability/reliability in hard-to-model ways
(what if the proxy is down? what is the probability two directories are
recoverable versus just a single directory?).
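
For a back-of-the-envelope feel, here is the simplest possible model, under
the (probably wrong) assumption that each directory or proxy on the path is
available independently of the others. In practice the shares of different
directories tend to live on the same servers, so failures are correlated and
the real number is harder to pin down.

```python
def path_recoverability(probabilities):
    """Probability that every hop on the path is recoverable at once.

    Assumes independence between hops, which is an idealization.
    """
    result = 1.0
    for p in probabilities:
        result *= p
    return result


one_dir = path_recoverability([0.99])         # a single directory
two_dirs = path_recoverability([0.99, 0.99])  # about 0.9801
assert two_dirs < one_dir
```

Even in this idealized model, each extra layer multiplies in another factor
below 1, so the path can only get less reliable.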

So we've not yet implemented any sort of revocation. The way I explain it to
folks is that Tahoe offers a very clear and understandable access-control
model. It might not do everything that you want, but it's pretty easy to
see what it does and does not offer you, and you can use that to make good
decisions about how you use the features that it does have.

A lot of people want some sort of revocation (at least they think they do),
but when you try to nail down exactly what sorts of properties they'd be
happy with, so far we always wind up with a scheme that is a lot harder to
explain: it might do what you want, but it might not, and it's hard to tell
what it does and does not offer you. So we've chosen to err on the
conservative/lazy side and defer native Tahoe revocation until a time when we
understand it better.

cheers,
 -Brian