[tahoe-dev] storage-club URLs

Brian Warner warner at lothar.com
Tue Feb 22 13:56:30 PST 2011


I've been thinking and talking for a while about moving Tahoe to
"Storage Clubs" as a grid-membership management system. The idea is to
use an invitation scheme: when you start your Tahoe node, you can either
start a new grid or accept an invitation to join an existing one. Grids
would be kept small (below Dunbar's Number[1], maybe 150 participants)
so that social pressure can help keep the grid healthy, in conjunction
with the Ostrom's Principles accounting work we're doing.

To get files from somebody else's grid, you'd need a gateway of some
sort. What I've been thinking about recently is how to reference files
through those gateways with URLs that could also be used by regular
non-Tahoe HTTP clients, and how they could be used to publish data with
various tradeoffs between security and legibility.

So imagine that each club has a shared SSL key: all members of the grid
know it, they share it with new members upon joining, and they keep it
secret from non-members (remember that clubs are invitation-only). Now
hash the associated public certificate to get a DNS prefix we'll call
$PREFIX, formed as in Tyler Close's YURL scheme, like this:

 $PREFIX=
 sha256-4oymiquy7qobjgx36tejs35zeqt24qpemsnzgtfeswmrw6csxbkq.tahoe-lafs.org

This would let improved HTTPS clients compare the received certificate
against the expected hash, which cuts the CA hierarchy out of their
reliance set.
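
Deriving $PREFIX and performing the YURL-style check on the client side
could look roughly like this. It is a minimal sketch under two
assumptions not stated above: the hash input is the DER encoding of the
club certificate, and the digest is written as lowercase, unpadded
base32 (a SHA-256 digest then yields the 52-character label in the
example). The function names are hypothetical.

```python
import base64
import hashlib

ORG_DOMAIN = "tahoe-lafs.org"


def club_prefix(cert_der: bytes, domain: str = ORG_DOMAIN) -> str:
    """Derive $PREFIX from the club certificate.

    Assumptions: the hash input is the DER-encoded certificate, and the
    digest is rendered as lowercase, unpadded base32.
    """
    digest = hashlib.sha256(cert_der).digest()
    label = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return f"sha256-{label}.{domain}"


def yurl_matches(hostname: str, received_cert_der: bytes) -> bool:
    """Client-side YURL check: does the received certificate hash to the
    sha256-... label in the name we connected to? No CA is consulted."""
    labels = hostname.lower().split(".")
    # Later URL forms may put extra labels (usernames) in front of $PREFIX.
    wanted = [label for label in labels if label.startswith("sha256-")]
    if len(wanted) != 1:
        return False
    actual = club_prefix(received_cert_der).split(".", 1)[0]
    return wanted[0] == actual
```

In a real client the certificate bytes would come from the TLS
handshake (Python's `ssl.SSLSocket.getpeercert(binary_form=True)`
returns the DER form), and this comparison would take the place of CA
path validation rather than supplement it.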

Each participant of the club can provide "external gateway" services, in
addition to the usual share-storage service. If they turn this on, their
node uses UPnP to ask the local router for a port mapping and to learn
an externally-visible IP address (which may only last a few hours, until
their DHCP lease changes). They also contact a tahoe-lafs.org
dispatch/dyndns service and tell the dispatcher about their IP address.
The dispatcher checks that they're really reachable at that address and
that they really possess the right certificate. If so, the dispatcher
adds the IP address to a round-robin DNS table for $PREFIX, so HTTP/HTTPS
clients who hit $PREFIX will be routed to one of the current gateways.
The dispatcher periodically checks in on the active gateways and removes
unresponsive ones. If enough grid members run gateways, external clients
could get 24/7 availability.
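
The dispatcher's bookkeeping can be modelled as a small in-memory table.
This is a minimal sketch, not the tahoe-lafs.org service itself: the
`Dispatcher` class and its `probe` callback (which would connect to the
candidate IP and confirm it presents the club certificate for $PREFIX)
are hypothetical, and a real deployment would publish the records
through an actual DNS server.

```python
from typing import Callable, Dict, List


class Dispatcher:
    """Toy model of the dispatch/dyndns bookkeeping (hypothetical)."""

    def __init__(self, probe: Callable[[str, str], bool]):
        # probe(prefix, ip) -> True if ip is reachable and presents the
        # club certificate for prefix (hypothetical callback).
        self.probe = probe
        self.table: Dict[str, List[str]] = {}  # $PREFIX -> live gateway IPs
        self._next: Dict[str, int] = {}         # round-robin cursor per prefix

    def register(self, prefix: str, ip: str) -> bool:
        """A gateway announces itself; accept it only if the probe passes."""
        if not self.probe(prefix, ip):
            return False
        ips = self.table.setdefault(prefix, [])
        if ip not in ips:
            ips.append(ip)
        return True

    def resolve(self, prefix: str) -> str:
        """Hand out the next live gateway for prefix, round-robin style."""
        ips = self.table.get(prefix)
        if not ips:
            raise LookupError(f"no live gateways for {prefix}")
        i = self._next.get(prefix, 0) % len(ips)
        self._next[prefix] = i + 1
        return ips[i]

    def sweep(self) -> None:
        """Periodic health check: drop gateways that no longer pass the probe."""
        for prefix, ips in self.table.items():
            ips[:] = [ip for ip in ips if self.probe(prefix, ip)]
```

Real round-robin DNS servers usually return every record and rotate
their order; handing out one address per lookup is a simplification.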

The dispatcher is also the owner of the tahoe-lafs.org SSL key (which
we'll call the "org" key), which is used to sign/delegate $PREFIX to the
club's shared key. This allows unmodified clients to get CA-based
verification of the club's key, independent of the YURL-properties of
the $PREFIX DNS name.

Your Tahoe filecaps look like CHK-$KEY, and are tied to a specific grid,
so their native form might be CHK-$GRIDID-$KEY. When you want to share a
file with someone, you build a URL that looks like this:

 https://$PREFIX/u/CHK-$KEY
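
Converting between the native cap and the shareable URL is mostly
string handling. The sketch below assumes the hypothetical native form
CHK-$GRIDID-$KEY with no hyphens inside $GRIDID or $KEY, and drops the
grid ID from the URL because $PREFIX already identifies the club's
grid. Both helper names are illustrative only.

```python
from urllib.parse import urlsplit


def share_url(native_cap: str, prefix: str) -> str:
    """CHK-$GRIDID-$KEY -> https://$PREFIX/u/CHK-$KEY (hypothetical format)."""
    kind, _grid_id, key = native_cap.split("-", 2)
    if kind != "CHK":
        raise ValueError(f"unsupported cap type: {kind}")
    # The grid ID is not needed in the URL: $PREFIX already names the club's grid.
    return f"https://{prefix}/u/{kind}-{key}"


def parse_share_url(url: str) -> tuple[str, str]:
    """https://$PREFIX/u/CHK-$KEY -> ($PREFIX, "CHK-$KEY")."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.path.startswith("/u/"):
        raise ValueError("not a /u/ share URL")
    return parts.hostname or "", parts.path[len("/u/"):]
```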

When a gateway's outward-facing HTTP/HTTPS interface receives this
request, the gateway does a regular Tahoe download from the other club
servers and delivers the plaintext to the HTTP client.
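
The gateway side can be sketched as a tiny WSGI application. The
`download(cap)` hook is hypothetical: it stands in for a regular Tahoe
download from the club's storage servers, and is assumed to return the
plaintext or raise KeyError when the file cannot be found.

```python
from typing import Callable, Iterable


def make_gateway_app(download: Callable[[str], bytes]):
    """Build a WSGI app for a gateway's outward-facing /u/ interface."""

    def app(environ, start_response) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if path.startswith("/u/CHK-"):
            cap = path[len("/u/"):]
            try:
                body = download(cap)  # hypothetical: full Tahoe download + decrypt
            except KeyError:
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"no such file\n"]
            start_response("200 OK", [
                ("Content-Type", "application/octet-stream"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown path\n"]

    return app
```

For a local experiment, `wsgiref.simple_server.make_server("", 8080,
app).serve_forever()` serves it over plain HTTP; the HTTPS side, using
the club key, would have to be handled separately (for example by a
TLS-terminating proxy).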

Then we can imagine four levels of clients, with useful tradeoffs
between how much security they get and how much extra code they are
running:

 1: Unmodified web browser: they rely upon the CA system (to tell them
    about the org key), and upon the org-key holder (who could sign any
    club key they liked), and upon the gateway they talk to (who could
    deliver any content they liked). The client effectively reveals the
    secret filecap to all these parties, and the integrity of the data
    they receive can be compromised by any such party. File availability
    depends upon DNS, the CA system (who could deny service by not
    signing the org key), the org dispatcher, at least one of the
    gateways, and a quorum of storage servers.

 2: web browser with YURL-verifying plugin: they ignore the CA system
    and look at the club key's hash. Confidentiality and integrity still
    rely upon the gateway as above, but the org dispatcher is now merely
    responsible for availability, and the CA system is unused.

 3: web browser with Tahoe plugin on a different grid: it recognizes the
    tahoe-ness of the URL, realizes that the file lives in a different
    grid, computes the storage index, and fetches the ciphertext from
    the gateway (with a URL like https://$PREFIX/c/$SI), then decrypts
    it and performs the usual Tahoe integrity check. This client does
    not depend upon anyone else for confidentiality and integrity.
    Availability relies upon DNS, the org dispatcher, one of the
    gateways, and a quorum of the servers.

 4: web browser with Tahoe plugin on the same grid: this recognizes the
    tahoe-ness of the URL, extracts the filecap, then does a normal
    Tahoe download of the file. This removes DNS, the org dispatcher,
    and the gateways from the reliance set: as with native Tahoe, you
    just depend upon being able to reach a quorum of storage servers.
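
The decision a Tahoe-aware plugin makes between levels 3 and 4 could be
sketched as follows. `plan_fetch` and `storage_index` are hypothetical
names, and the storage-index function is a stand-in: real Tahoe derives
the storage index from the read key with its own tagged hash, whereas
this uses a truncated SHA-256. The property that matters is that it is
one-way, so a level-3 client reveals the storage index to the gateway
but never the key.

```python
import base64
import hashlib
from urllib.parse import urlsplit


def storage_index(key: str) -> str:
    """Hypothetical stand-in for Tahoe's storage-index derivation."""
    digest = hashlib.sha256(b"hypothetical-si-tag:" + key.encode("ascii")).digest()[:16]
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def plan_fetch(url: str, my_prefix: str) -> tuple[str, str]:
    """Decide how a Tahoe-aware browser plugin should fetch a /u/ URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not parts.path.startswith("/u/CHK-"):
        raise ValueError("not a Tahoe share URL")
    cap = parts.path[len("/u/"):]
    if host == my_prefix.lower():
        # Level 4: same grid, so do a native download (no DNS, dispatcher, or gateway).
        return ("native", cap)
    # Level 3: different grid, so fetch only ciphertext and keep the key local.
    si = storage_index(cap[len("CHK-"):])
    return ("ciphertext", f"https://{host}/c/{si}")
```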




This scheme could also be used to publish files under insecure (but
human-readable) public names, like a normal Apache server. Within a
club, we could build a protocol with which a club member could claim a
username (or subdomain) and associate it with a directory readcap. All
gateways would learn of the mapping, and then external HTTP clients
could hit URLs like these:

 http://$PREFIX/$USERNAME/$FILENAME
 https://$PREFIX/$USERNAME/$FILENAME
 https://$USERNAME.$PREFIX/$FILENAME
 https://$SUBDOMAIN.$USERNAME.$PREFIX/$FILENAME
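
A gateway would need to map all four URL forms onto a claimed name. A
minimal sketch, assuming a subdomain claim is keyed by (username,
subdomain) and the remainder of the path is resolved inside the claimed
directory; `public_lookup` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def public_lookup(url: str, prefix: str) -> tuple[tuple[str, ...], str]:
    """Map a public URL to (claim key, path inside the claimed directory)."""
    prefix = prefix.lower()
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lstrip("/")
    if host == prefix:
        # http(s)://$PREFIX/$USERNAME/$FILENAME
        username, _, filename = path.partition("/")
        return (username,), filename
    suffix = "." + prefix
    if not host.endswith(suffix):
        raise ValueError(f"{host} is not under {prefix}")
    labels = host[: -len(suffix)].split(".")
    if len(labels) == 1:
        # https://$USERNAME.$PREFIX/$FILENAME
        return (labels[0],), path
    if len(labels) == 2:
        # https://$SUBDOMAIN.$USERNAME.$PREFIX/$FILENAME
        subdomain, username = labels
        return (username, subdomain), path
    raise ValueError("too many labels in front of $PREFIX")
```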

Rather than have a strongly-enforced distributed-consensus system, I'm
thinking that gateways should simply accept claims on a
first-come-first-served basis, flood the announcement to all other
gateways, and then give their operators controls to remove or reassign
names that are contested. Gateway operators, as the ones willing to
provide service to the outside world, are the appropriate parties to
make decisions about what gets published through their hosts.
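
The first-come-first-served policy with flooding and operator overrides
might look like this per gateway. `NameRegistry` and its methods are
hypothetical, and peers are called synchronously here, whereas real
gateways would exchange announcements over the network.

```python
from typing import Dict, List, Optional, Set


class NameRegistry:
    """One gateway's view of the club's public names (hypothetical sketch)."""

    def __init__(self) -> None:
        self.names: Dict[str, str] = {}       # name -> directory readcap
        self.pinned: Set[str] = set()          # names the local operator decided
        self.peers: List["NameRegistry"] = []  # other gateways to flood to

    def claim(self, name: str, readcap: str) -> bool:
        """A club member claims a name here: first come, first served."""
        if name in self.names or name in self.pinned:
            return False
        self._accept(name, readcap)
        return True

    def receive(self, name: str, readcap: str) -> None:
        """An announcement flooded from a peer gateway."""
        if name not in self.names and name not in self.pinned:
            self._accept(name, readcap)

    def operator_set(self, name: str, readcap: Optional[str]) -> None:
        """Operator override for a contested name, local to this gateway.
        readcap=None removes the name; later announcements for it are ignored."""
        self.pinned.add(name)
        if readcap is None:
            self.names.pop(name, None)
        else:
            self.names[name] = readcap

    def _accept(self, name: str, readcap: str) -> None:
        # Record the claim, then flood it; peers that already know the name stop.
        self.names[name] = readcap
        for peer in self.peers:
            peer.receive(name, readcap)
```

Because this model is synchronous, two gateways never accept conflicting
claims. With asynchronous flooding they can, and each gateway keeps
whichever claim reached it first until its operator resolves the
conflict with `operator_set`.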



I think this sort of scheme would make it possible for the non-Tahoe
world to benefit from data published in a Tahoe grid, with as much or as
little security as they were willing to install (but improved
availability compared to a single server, and less centralized control).
And it might make an interesting basis for distributed publishing, which
would be a useful feature in a "Freedom Box"-style device.

what do you think?
 -Brian


[1]: http://en.wikipedia.org/wiki/Dunbar%27s_number

