[tahoe-dev] signed Introducer messages, server-selection work

Thu Jun 10 20:48:25 PDT 2010

On Thu, Jun 10, 2010 at 11:39 AM, Brian Warner <warner at lothar.com> wrote:
>
> I've been thinking about server-selection again this week,

Yay!

> To enable a distributed Introducer (i.e. via gossip-based flooding of
> announcements), we need to separate these two concerns.

The part about the distributed introducer is Faruq's Google Summer of
Code project:

http://socghop.appspot.com/gsoc/org/home/google/gsoc2010/tahoe_lafs

>  * there will be a file, perhaps named $BASEDIR/private/servers.txt,
>   which contains a list of storage servers that will be used for upload
>   and download. Each server gets a single entry on this list, and the
>   entries contain at least a nodeid, and maybe some extra data.

I like this part.

>  * servers.txt may be modified or created by a "serverlist manager"
>   component. Some entry in tahoe.cfg indicates which manager to use and
>   how it should be configured.

I'm pretty interested in serverlist managers which exist entirely
outside of the Tahoe-LAFS codebase. For example, there's the one where
you edit the file with your favorite text editor and insert or delete
server ids. Another one runs a cron job once per day that wgets the new
serverlist.txt from a web site. Another one keeps the serverlist.txt
under git and the serverlist gets deployed automatically by puppet on
its schedule or in response to its triggered events.

In fact, now that I've said all that I actually think that a
serverlist manager who wants to live inside the Tahoe-LAFS codebase
has some explaining to do about why he needs to.
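
To illustrate how small such an external manager can be, here is a
rough Python sketch of the "cron job that wgets the list" idea. It is
only a sketch under assumptions: the function names, the URL and the
destination path are hypothetical, and nothing here is part of
Tahoe-LAFS. The one design point it shows is that the new list is
written to a temporary file and then renamed into place, so a client
never reads a half-written file.

```python
import os
import tempfile
import urllib.request


def fetch_serverlist(url, timeout=30):
    """Download the serverlist text from a (hypothetical) web site."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def install_serverlist(text, dest_path):
    """Atomically replace dest_path with the given serverlist text."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temp file in the same directory so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".serverlist-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Hypothetical usage from a daily cron job (URL and path are made up):
#   text = fetch_serverlist("https://example.org/serverlist.txt")
#   install_serverlist(text, "/path/to/basedir/private/servers.txt")
```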

>  * The default configuration uses a "traditional Introducer" manager,
>   which reads introducer.furl from the config and behaves just like our
>   current system.

Okay.

> Servers may also be removed: when a friendnet participant stops behaving
> in a friendly manner, clients may want to remove that server from their
> lists. Or a server in a commercial grid might be compromised, and should
> be removed from the list.

Note that you don't really *need* to update the serverlist in order to
handle a server getting shut down or becoming unreachable --
that is already handled automatically by clients. You really need to
update the serverlist when a server is still there and acting legit
from the Tahoe-LAFS storage client's perspective, but you are no
longer willing to rely on it for storing shares of your ciphertext!

> 1 "static": The simplest approach is to let client admins manage their
>            serverlist manually, and not enable a serverlist manager.
>            Another form of this is to have client admins specify an
>            explicit list of serverids, and use the Introducer to merely
>            learn their contact information (ipaddr/port).

Right, unless I'm misunderstanding, the (decentralized) introduction
is still assumed to be working as a way to discover IP addresses/ports
in all of these approaches.
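
A minimal Python sketch of that split, assuming the local list is just
a collection of server ids and that introduction yields a mapping from
server id to (host, port). The function name, the ids and the
addresses are all invented for illustration:

```python
def usable_servers(blessed_ids, announcements):
    """Keep only announced servers whose id is on the local serverlist.

    blessed_ids:   server ids the client is willing to rely on
    announcements: dict of server id -> (host, port), as learned from
                   introduction (contact information only)
    """
    blessed = set(blessed_ids)
    return {sid: addr for sid, addr in announcements.items() if sid in blessed}


# Made-up example data.
announcements = {
    "v0-aaaa": ("10.0.0.1", 3457),
    "v0-bbbb": ("10.0.0.2", 3457),
    "v0-cccc": ("10.0.0.3", 3457),
}
blessed = ["v0-aaaa", "v0-cccc", "v0-dddd"]
usable = usable_servers(blessed, announcements)
```

Here "v0-bbbb" is announced but not blessed, so it is never used, and
"v0-dddd" is blessed but has no known address, so it is simply not
available right now.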

> 2 "update-furl": in this form, clients are given a FURL that points to a
>                 server. The server provides a list of approved nodeids.
>                 As in the "static" case, the Introducer is used to
>                 distribute contact information, but not to authorize
>                 servers. The client would fetch the nodeid list before
>                 each upload/download operation. Variations involve
>                 caching the list for a certain time, to reduce network
>                 traffic at the expense of immediacy.

I can't immediately see why this is better than the cron job that runs
wget. Well, I guess it is better because of users who don't know how
to or don't want to edit their crontab, but do want this behavior and
know how to edit their Tahoe-LAFS config to give it a serverlist
manager FURL. Hm. It feels like a shame to mix otherwise separate
behavior into one codebase and operating system process just because
we don't have a good way for users to manage more than one process,
but perhaps that's the situation we're faced with.
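
The caching variation Brian mentions can be sketched in a few lines of
Python. This is an assumption-laden sketch, not a proposal for the
real API: the class name and the "fetch" callable (which stands in for
whatever actually asks the update-furl server for the list) are
hypothetical.

```python
import time


class CachedServerlist:
    """Cache an approved-nodeid list for at most max_age_seconds."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch            # callable returning an iterable of nodeids
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._cached = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._max_age)
        if stale:
            self._cached = frozenset(self._fetch())
            self._fetched_at = now
        return self._cached
```

With max_age_seconds=0 every call refetches, which is the "before each
upload/download operation" behavior; a larger value trades immediacy
of revocation for less network traffic.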

> 3 "one-key": The grid is defined by a single signing/verifying keypair.
>             All authorized servers are given the signing privkey, and
>             all clients are configured with the verifying pubkey.

This just smells wrong to me. I hope we don't do this.

>             Variations include giving the client a list of pubkeys,
>             and/or giving each server a list of privkeys.

Wait, what? Isn't that entirely different? In fact, isn't that
equivalent to #1-static? :-)

> 4 "certchain": Each client gets a pubkey, as before, to which they
>               delegate their server-selection. However Announcements
>               are a chain of signed messages, each message delegating
>               authority to the key which signs the next message.

This seems flexible and reasonable. The list of blessers could be
stored in a text file and could be
statically/manually/separately-from-Tahoe-LAFS managed, just like the
list of servers could.
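
To make the chain idea concrete, here is a Python sketch of the
checking logic only. Everything about the representation is an
assumption: the Link tuple, the function name, and the idea that each
link carries just the delegate's key and a signature over it. The
signature check itself is passed in as "verify", because no signature
scheme has been chosen in this thread.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Link(NamedTuple):
    delegate_key: str  # the key this message delegates authority to
    signature: str     # signature over delegate_key, made by the previous key


def authorized_key(
    root_key: str,
    chain: Sequence[Link],
    verify: Callable[[str, str, str], bool],
) -> Optional[str]:
    """Walk the chain starting from the client's configured root key.

    verify(key, message, signature) must return True only if signature
    is a valid signature of message under key. Returns the key at the
    end of the chain (the one allowed to sign the announcement), or
    None if any link fails. An empty chain means the root key signs
    announcements directly.
    """
    current = root_key
    for link in chain:
        if not verify(current, link.delegate_key, link.signature):
            return None
        current = link.delegate_key
    return current
```

The client would then check the announcement itself against the key
that this returns.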

In fact, there's a sense in which blessers and servers are sort of the
same thing, right? If I put a server's id (or public key) into place
to indicate that I'm willing to rely on that server to store shares of
my ciphertext, then that also means that I'm willing to store shares
on another server of that server's choice. This may seem
counter-intuitive to some people, but it's impossible to *prevent*
that server that you specify from actually storing the shares on a
different server and retrieving them from that server on demand.
That's the way the Redundant Array of Inexpensive Clouds proposal by
Kevan Carstensen would work, for example.

So, the argument goes, since you can't prevent that server from
delegating its storage responsibilities opaquely, you might as well
enable it to delegate them transparently: instead of proxying the
uploads and downloads through to the other server, it gives you a
signed message saying that it vouches for the other server and that
you should be willing to use the other server.

For one thing, it can make it more manageable to have that situation
rendered transparent so that you can see it, such as by examining your
servers.txt, perhaps, or by some UI tool that shows you which servers
are holding which shares and why you are using those servers in the
first place. That way if a server were to become flakey and corrupt or
lose your share, you could both disable that specific server (somehow)
in your servers.txt and trace which other server originally vouched
for it.
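
The tracing part might look like the following Python sketch. It
assumes, purely for illustration, that the client records at most one
voucher per server in a dict and knows which servers it listed
directly; the function name and data layout are hypothetical.

```python
def vouching_path(server_id, vouched_by, directly_trusted):
    """Trace who vouched for server_id, back to a directly listed server.

    vouched_by:       dict of server id -> id of the server that vouched for it
    directly_trusted: set of server ids the admin listed by hand
    Returns [server_id, ..., directly_trusted_server], or None if the
    trail is broken or loops.
    """
    path = [server_id]
    seen = {server_id}
    current = server_id
    while current not in directly_trusted:
        voucher = vouched_by.get(current)
        if voucher is None or voucher in seen:
            return None
        path.append(voucher)
        seen.add(voucher)
        current = voucher
    return path


# Made-up example: A was listed by hand, A vouched for B, B vouched for C.
vouched_by = {"C": "B", "B": "A"}
trail = vouching_path("C", vouched_by, {"A"})
```

In the example, the trail for "C" runs through "B" back to "A", which
is the entry the admin would reconsider if "C" misbehaves.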

Obviously this doesn't prevent a server from delegating its storage
responsibility in a way that is opaque to you. Both styles of
delegation of storage responsibility might be useful.

> After talking with Zandr about what the ops team at an AMD-like service
> would prefer,

Zandr is an excellent resource, since he understands Tahoe-LAFS and
was responsible for ops for Allmydata for quite a while. He is
probably the single best resource for the question of how an
Allmydata-like service would want to operate. BUT, unless I missed a
super happy memo, he's not currently using Tahoe-LAFS for anything, so
in addition to his wisdom we should really seek out feedback from
current users.

Any current users out there managing a Tahoe-LAFS grid? If so, please
consider the different methods of managing servers in this thread and
let us know what you think!

> I think the development trajectory to follow is #1-static,
> then #3-one-key, followed by #4b-certchain-with-renewal-URL.

Personally I'm "+1" on #1-static, "-1" on #3-one-key (it is just
icky), "+0" on #4a and "-0" on #4b (administrators may want to control
the blesser.txt with git+puppet or something rather than with
Tahoe-LAFS+foolscap).

Oh, another wrinkle to this is that the use cases that I've heard most
loudly, from the most people, are all about specifying new policies
for how to select from *among* the blessed, acceptable servers for a
particular share. Those use cases are documented in #467, #573, and
the "ServerSelection" page on the wiki.

http://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/467 # allow the user to
specify which servers a given gateway will use for uploads
http://tahoe-lafs.org/trac/tahoe-lafs/ticket/573 # Allow client to
control which storage servers receive shares

Maybe a good first step would be to let the serverlist.txt entries
have a set of arbitrary tags after each server id.
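
As a sketch of what that could mean in practice, here is a small
Python parser for a made-up format: one server per line, the server id
first, then whitespace-separated tags, with "#" starting a comment.
The format, the function names, the ids and the tags are all
assumptions for illustration; no format has been agreed on.

```python
def parse_serverlist(text):
    """Parse 'serverid tag tag ...' lines into {serverid: set_of_tags}."""
    servers = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        server_id, *tags = line.split()
        servers[server_id] = set(tags)
    return servers


def servers_with_tag(servers, tag):
    """Return the sorted ids of servers carrying the given tag."""
    return sorted(sid for sid, tags in servers.items() if tag in tags)


EXAMPLE = """\
# hypothetical serverlist.txt
v0-aaaa  rack1 europe
v0-bbbb  rack2
v0-cccc            # no tags yet
"""
servers = parse_serverlist(EXAMPLE)
```

A selection policy of the #467/#573 kind could then be written in
terms of tags (for example, "at most one share per rack") without the
blessing mechanism needing to know what the tags mean.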

> what does everybody think?

I'm pretty uncertain about #4-certchain. I'm also uncertain about
how the server-blessing mechanism (discussed here) would interact with
a server-selection-for-this-upload mechanism (per #467 et al).

One thing we all agree on is that #1-static plus a "traditional
introducer" manager (for backward-compatibility) would be a step in
the right direction.

Right?

I want to make sure that Faruq understands what impact this work would
have on his Summer of Code project of Decentralized Introduction. As
far as I can see so far, it should have no impact, except that it will
make his Decentralized Introduction more useful by preventing users
from expecting Decentralized Introduction to handle server-blessing
for them...

Regards,

Zooko

P.S. You know what? "Introduction" will become the wrong word for that
service once we no longer rely on it for access control at all. Then
it will merely become "routing" or "IP address lookup".