[tahoe-dev] signed Introducer messages, server-selection work

Brian Warner warner at lothar.com
Thu Jun 10 23:10:59 PDT 2010


Zooko: thanks for the detailed response!

> The part about the distributed introducer is Faruq's Google Summer of
> Code project:

Yeah, we should figure out where his project is going and how it's
likely to affect the Introducer codebase. The other half of my efforts
are going into updating the way the Introducer works (the
signed/*extensible* introducer announcements). The work that I did 20
months ago on #466 (and shelved because of no ecdsa in pycryptopp) adds
a new "v2" introducer RemoteInterface, which deals in terms of signed
dictionaries instead of unsigned tuples, and includes
backwards-compatibility code so that new clients can work with old
Introducers, and old clients with new Introducers.
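To make the tuple-vs-dict distinction concrete, here's a toy sketch. The field names and the hash-based "signature" are illustrative assumptions, not the actual 466-ver5.patch wire format (which would use real public-key signatures):

```python
import hashlib
import json

# old-style v1 announcement: an unsigned tuple of positional fields
v1_announcement = ("pb://key@host:port/storage", "storage", "ri_name",
                   "nickname", "app-version", "oldest-supported")

# new-style v2 announcement: a self-describing dict, plus a signature
# over its canonical serialization (a keyed hash stands in here for a
# real ECDSA signature)
ann = {
    "service-name": "storage",
    "FURL": "pb://key@host:port/storage",
    "nickname": "node-1",
    "app-versions": {"tahoe-lafs": "1.7"},
}
serialized = json.dumps(ann, sort_keys=True).encode()
signed = {"msg": serialized,
          "sig": hashlib.sha256(b"privkey" + serialized).hexdigest()}

def verify(signed, key=b"privkey"):
    """Check the toy signature; a real node would use ECDSA verify."""
    return hashlib.sha256(key + signed["msg"]).hexdigest() == signed["sig"]

assert verify(signed)
```

The dict form is what makes the format extensible: new fields can be added without breaking old parsers, which is much harder with positional tuples.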

Take a look at the "466-ver5.patch" on that ticket. I'd really like it
if the new distributed introducer code weren't completely incompatible
with that patch. Or at least the new distributed introducer should work
in terms of the same signed-dictionaries that the patch defines, rather
than in terms of the old unsigned-tuples. If the distributed-introducer
changes provide for backwards compatibility that's as good as my
466-ver5.patch, then I'd be happy to scrap that code in favor of
Faruq's.

> In fact, now that I've said all that I actually think that a
> serverlist manager who wants to live inside the Tahoe-LAFS codebase
> has some explaining to do about why he needs to.

I too am interested in a composable system, in which the serverlist can
easily be manipulated by external tools. However, I think it's important
that Tahoe users can accomplish common use cases "out of the box",
without writing their own manager programs. And single-process nodes are
much easier to manage than multiple-process nodes. So I strongly feel
that all our basic use cases need to be implementable by tahoe.cfg
controls, even if they can be explained in terms of cron and wget.

>> Servers may also be removed: when a friendnet participant stops
>> behaving in a friendly manner, clients may want to remove that server
>> from their lists. Or a server in a commercial grid might be
>> compromised, and should be removed from the list.
> 
> Note that you don't really *need* to update the serverlist in order to
> handle a server getting shut down or becoming unreachable --

Yeah, that's why I said "unfriendly manner": it's perfectly friendly to
fail in an obvious fashion, but failing in a non-obvious fashion is not
so friendly :).

>> 1 "static": The simplest approach is to let client admins manage
>>             their serverlist manually, and not enable a serverlist
>>             manager.
> 
> Right, unless I'm misunderstanding, the (decentralized) introduction
> is still assumed to be working as a way to discover IP addresses/ports
> in all of these approaches.

Correct. The goal is to reduce the Introducer's responsibility down to
maintaining a mapping from serverid to some sort of remote connection to
that server. (For now, that means a Foolscap RemoteReference, but
alternate backends could represent that connection with something else).
So it's just a routing layer.
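In toy form, the reduced Introducer is little more than a lookup table (names here are illustrative, not the real IntroducerClient API):

```python
# Toy model of the reduced Introducer role: a routing table from
# serverid to "some sort of remote connection" (a Foolscap
# RemoteReference in practice; any handle object here).
class RoutingIntroducer:
    def __init__(self):
        self._connections = {}   # serverid -> connection handle

    def announce(self, serverid, connection):
        # the introducer just records how to reach each server
        self._connections[serverid] = connection

    def get_connection(self, serverid):
        # clients decide *whether* to use a server elsewhere; the
        # introducer only answers "how do I reach it?"
        return self._connections.get(serverid)

intro = RoutingIntroducer()
intro.announce("v0-abc123", "pb://abc123@host:1234/storage")
assert intro.get_connection("v0-abc123").startswith("pb://")
```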

>> 2 "update-furl": in this form, clients are given a FURL that points
>>                  to a server.
> 
> I can't immediately see why this is better than the cron job that runs
> wget. Well, I guess it is better because of users who don't know how
> to or don't want to edit their crontab, but do want this behavior and
> know how to edit their Tahoe-LAFS config to give it a serverlist
> manager FURL.

It would also offer lower latency and lower traffic, because you can do a
proper publish-subscribe over foolscap, but not over wget. Running the
cron job once per day would be nice and low-traffic, but would induce a
day of latency before you picked up changes, and running it every minute
would be low-latency but excessive traffic.

But my main problem with #2-update-furl is that it's almost identical to
the old Introducer-has-serverlist-authority. I really only included it
for comparison.

>> 3 "one-key": The grid is defined by a single signing/verifying keypair.
> 
> This just smells wrong to me. I hope we don't do this.

Well, it's awfully simple and easy to understand, and is a 90% solution
for each of our main use cases (and if, like AMD, you never remove a
server, it would be a 100% solution). But yeah, the lack of revocation
feels like a fatal flaw to me too.

>>             Variations include giving the client a list of pubkeys,
>>             and/or giving each server a list of privkeys.
> 
> Wait, what? Isn't that entirely different? In fact, isn't that
> equivalent to #1-static? :-)

It's only the same as #1-static if you assume that each server has a
separate privkey, and therefore each client gets every server's pubkey.
I was imagining that you might create two keypairs A and B, and split
the servers into two groups: every server in group A gets the Apriv key,
and everyone in group B gets the Bpriv key. Then clients could remove
one group at a time.

And no, I can't think of a sensible reason for doing that either :-).
But the ability to do so makes it somewhat more flexible than #1-static,
so I thought it needed to be included. It also represents one end of the
spectrum of revocability (the "instarevocable" end, as opposed to the
"irrevocable" end).

>> 4 "certchain": Each client gets a pubkey
> 
> This seems flexible and reasonable. The list of blessers could be
> stored in a text file and could be
> statically/manually/separately-from-Tahoe-LAFS managed, just like the
> list of servers could.

Hmm, that's a good idea.

> In fact, there's a sense in which blessers and servers are sort of the
> same thing, right?

Yeah, I think so. In writing up some more docs, I started using terms
like "blessers" and "superblessers", so I think I'm seeing the
similarities too.

Fundamentally, the client delegates their server-selection authority, by
placing pubkeys in their tahoe.cfg or equivalent, to the holders of the
corresponding privkeys, and those holders can re-delegate that authority
to anyone they like. Having certificate chains just makes it possible
for that re-delegation to be revoked in a relatively fine-grained
fashion, which is a usability win.

The upload/download process works in terms of servers, so it's fair to
say, whoever you delegate the selection authority to, at the end of the
day they're picking a set of servers for you (from which you do the
usual per-file permuted-ring peer-selection algorithm).
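For reference, the per-file permuted-ring selection has roughly this shape: each file's storage index induces its own stable ordering of the serverlist, by hashing (storage index, serverid). The exact hash and inputs Tahoe uses differ; this is just a sketch of the idea:

```python
import hashlib

def permuted_ring(storage_index: bytes, serverids):
    # rank servers by a per-file hash, so every file walks the same
    # serverlist in its own deterministic order
    return sorted(serverids,
                  key=lambda sid: hashlib.sha256(storage_index + sid).digest())

servers = [b"server-a", b"server-b", b"server-c", b"server-d"]
ring1 = permuted_ring(b"storage-index-1", servers)
ring2 = permuted_ring(b"storage-index-2", servers)

# every file sees the same *set* of servers, but in its own order
assert sorted(ring1) == sorted(ring2) == sorted(servers)
```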

The "transparency" of which you speak is probably expressed by the
client receiving signed announcements that include a chain of keys (so
you can see the sequence of delegation steps). The less-transparent
alternative would be for servers/blessers to just share private keys
(i.e. #3-one-key), which would lose the information about who
granted authority to whom.

A UI which shows information about the certchains (in the
IntroducerClient webstatus) would be handy. We'd probably need nicknames
for each key, and maybe some historical information about them, to make
that really useful ("who else did this blesser sign? what's this
intermediary key for? in how long will this blessing expire?").

>> I think the development trajectory to follow is #1-static,
>> then #3-one-key, followed by #4b-certchain-with-renewal-URL.
> 
> Personally I'm "+1" on #1-static, "-1" on #3-one-key (it is just
> icky), "+0" on #4a and "-0" on #4b (administrators may want to control
> the blesser.txt with git+puppet or something rather than with
> Tahoe-LAFS+foolscap).

I'm not sure we understand each other about #4b, if you think that
git+puppet is a functional substitute, so let me go into a bit more
detail (my apologies if I misunderstood your understanding). The
important thing about #4b is that the server (which has a private key
"B" and wants to announce a certchain that starts with "A" and ends with
"B") needs a copy of an up-to-date A->B blessing prefix at all times.
That server is going to concatenate the prefix (which says "I, holder of
privkey A, delegate my server-selection authority to the holder of
privkey B") to its own signed announcement (which says "I, holder of
privkey B, say that you should use an RIStorageServer at FURL=xyz"), and
send the pair to the Introducer as its Announcement.
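The shape of that two-link Announcement can be sketched as follows. The toy HMAC "signatures" are a symmetric stand-in for the real public-key signatures (each party would hold an ECDSA keypair, and verification would use the public half); the field names are assumptions, not the actual patch format:

```python
import hmac, hashlib, json, time

def sign(privkey: bytes, msg: dict) -> dict:
    body = json.dumps(msg, sort_keys=True).encode()
    return {"msg": body,
            "sig": hmac.new(privkey, body, hashlib.sha256).hexdigest()}

def check(key: bytes, signed: dict) -> dict:
    expected = hmac.new(key, signed["msg"], hashlib.sha256).hexdigest()
    assert hmac.compare_digest(expected, signed["sig"])
    return json.loads(signed["msg"])

A_priv, B_priv = b"A-privkey", b"B-privkey"

# A's prefix: "I, holder of A, delegate server-selection to B",
# with an expiry so the blessing can lapse (enabling revocation)
prefix = sign(A_priv, {"delegate-to": "pubkey-B",
                       "expires": time.time() + 86400})

# B's own announcement: "I, holder of B, say use this FURL"
leaf = sign(B_priv, {"service": "RIStorageServer", "FURL": "pb://xyz"})

# the Announcement sent to the Introducer is the concatenated chain
announcement = [prefix, leaf]

# a client configured with A's key checks each link in turn
blessing = check(A_priv, announcement[0])
assert blessing["delegate-to"] == "pubkey-B"
assert blessing["expires"] > time.time()
service = check(B_priv, announcement[1])
assert service["FURL"] == "pb://xyz"
```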

Since the prefix expires after a while (to enable revocation), server B
needs to periodically acquire a new one. There must be some program that
holds privkey A (and a list of which keyids "B" are still in the club),
which generates these prefixes (either periodically or on demand), and
there needs to be some way to get the prefix file from A to B in a
timely fashion.

The simplest approach I've thought of for that prefix-file transfer step
is for B to use wget (or rather an in-node twisted.web.client
equivalent) in a cron-job-like loop, and for A to write signed prefix
files into a static web-server directory in a similar loop. With a
suitable sign-a-prefix tool, this could be quickly assembled out of
cron, apache, and wget. But I want to make it easy for Tahoe users to
make this work without writing shell scripts, so I plan to build this
into the tahoe node too. If the code to do this is already inside a
Tahoe node, then transferring the prefix-file with a FURL would be even
easier to implement.

But I intend to make the transfer mechanism behave equivalently to the
cron-plus-wget form, and include that form as an example in the docs, so
that folks who want to do it that way have some hints to follow.
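The B-side refresh logic is simple enough to sketch. The fetch function is injectable here so the keep-or-discard logic is visible on its own; a real node would use twisted.web.client on a timer, and would verify A's signature rather than the toy checksum used below (all names are illustrative):

```python
import hashlib

def refresh_prefix(fetch, current_prefix, expected_checksum):
    """Fetch a candidate blessing prefix; keep the old one if the new
    one doesn't verify.

    The transfer channel needs no confidentiality or integrity: B
    checks the prefix itself (a toy checksum stands in for verifying
    A's signature) and discards anything that fails the check.
    """
    candidate = fetch()
    if hashlib.sha256(candidate).hexdigest() == expected_checksum:
        return candidate
    return current_prefix

good = b"signed A->B blessing prefix"
checksum = hashlib.sha256(good).hexdigest()

# a tampering channel can't inject a bad prefix, only deny an update
assert refresh_prefix(lambda: good, b"old", checksum) == good
assert refresh_prefix(lambda: b"forged", b"old", checksum) == b"old"
```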

Note that when B fetches the A->B blessing prefix, the connection does
not need to be confidential, nor (if you tell B what key A to expect)
does it need integrity: B can check the prefix itself to see if it's the
right one. In contrast, a client which is updating its list of
acceptable serverids *does* need an integrity-protecting channel,
because otherwise attackers can inject bad serverids into their list.
puppet-over-ssh would be fine, but raw wget would not.

The blessing prefix is an operational issue, rather than a configuration
issue, so using puppet to distribute updated blessing prefixes isn't an
obvious solution to me. But maybe I'm just thinking about it the wrong
way around.

> Oh, another wrinkle to this is that the use cases that I've heard the
> loudest from the most people are all about specifying new policies of
> how to select from *among* blessed, acceptable servers for this
> particular share.

Yes, that is an entirely separate question. The #466 work is strictly
limited to populating the set of servers, from which any #467-like
per-file selection criteria would choose a subset. #466 is further
limited to choosing the serverlist based upon client-delegated
authority, not other interesting properties like geographic location,
load, capacity, network proximity, or potential shared failure modes.

Another way to look at it is that all clients in a given grid (for some
value of the word "in") should have the same #466-selected serverlist.

> Maybe a good first step would be to let the serverlist.txt entries
> have a set of arbitrary tags after each server id.

Interesting. Yeah, there's an overlap here with other desirable
features that I haven't figured out yet. There are about three
categories of server information that we might want to write down into
lists inside BASEDIR/ somewhere:

 * list of serverids, to indicate which servers we're willing to use
   (this presumes that all other information about each server will
   arrive via some other means, like the Introducer)
 * list of information about each server (location hints, versions, free
   space), replacing whatever the Introducer would have told us. This
   would also include statically-configured non-Foolscap storage
   servers, like the URL/credentials/bucketname for an S3-based server,
   that might not be easy to distribute through an introducer.
 * list of attributes for each server (like geographic location and
   colo/rack/slot numbers), for use in more sophisticated per-file
   server-selection code

For the purposes of #466, I kind of want to have a file that only
contains the serverids. But for the other goals (Introducerless static
configuration, non-foolscap storage servers, attribute-based selection),
I want to have a file that contains one line per server, with a
dictionary of attributes on each line.

I haven't yet worked out how to combine these things.
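One way the one-line-per-server file could look: a serverid followed by a JSON dict of attributes. This format is purely an assumption for illustration; no such file exists in Tahoe yet:

```python
import json

serverlist_txt = """\
v0-aaaa {"location": "colo-1/rack-2/slot-7", "type": "foolscap"}
v0-bbbb {"type": "s3", "bucket": "my-shares", "url": "https://s3.example.com"}
"""

def parse_serverlist(text):
    # each line: serverid, optionally followed by a JSON attribute dict
    servers = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        serverid, _, attrs = line.partition(" ")
        servers[serverid] = json.loads(attrs) if attrs.strip() else {}
    return servers

servers = parse_serverlist(serverlist_txt)
assert set(servers) == {"v0-aaaa", "v0-bbbb"}
assert servers["v0-bbbb"]["type"] == "s3"
```

A bare-serverids file (the #466-only case) then degenerates into the same format with empty attribute dicts, which might be one way to combine the two.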

> I'm pretty uncertain about the #4-cert-chain.

Yeah. I think we need it, because the lack of revocability is a fatal
flaw for #3, and the lack of scaling is a fatal flaw for #1, and the big
SPOF is a fatal flaw for #2. I wrote the code and got all the tests
working two years ago, so I don't imagine it will be too hard to freshen
it up and get it working again, but it'd still be nice to use something
simpler instead.

> I'm also uncertain about how the server-blessing mechanism (discussed
> here) would interact with a server-selection-for-this-upload mechanism
> (per #467 et al).

Yup. I *think* it'll be safe to say something like:

 * the Introducer system (distributed or not) provides a set of servers
 * the #466 blessers reduce that set by disregarding unblessed servers
 * for each upload/download, the #467/etc criteria assign shares to
   servers, effectively reducing the server-set further

and thus say that #467/etc expect to get a list of [blessed] servers,
but don't care how that list came to be.
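The three-stage narrowing above can be sketched in a few lines (all names and the blessing check are illustrative):

```python
def select_for_upload(announced, is_blessed, assign_shares):
    # 1. the Introducer system provides a set of servers
    candidates = set(announced)
    # 2. the #466 blessers reduce that set by disregarding
    #    unblessed servers
    blessed = {s for s in candidates if is_blessed(s)}
    # 3. #467-style criteria assign shares from the blessed set;
    #    they get a list of blessed servers and don't care how
    #    that list came to be
    return assign_shares(blessed)

announced = {"s1", "s2", "s3", "s4"}
blessed_set = {"s1", "s3", "s4"}
result = select_for_upload(announced,
                           lambda s: s in blessed_set,
                           lambda servers: sorted(servers)[:2])
assert result == ["s1", "s3"]
```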

> One thing we all agree on is that #1-static plus a "traditional
> introducer" manager (for backward-compatibility) would be a step in
> the right direction.
> 
> Right?

I think so, yes.

> I want to make sure that Faruq understands what impact this work would
> have on his Summer of Code project of Decentralized Introduction. As
> far as I can see so far, it should have no impact, except that it will
> make his Decentralized Introduction more useful by preventing users
> from expecting Decentralized Introduction to handle server-blessing
> for them...

Right, the original reason for #466 was to enable decentralized
introduction, i.e. let us use it without losing all control over which
servers are used by each client. The original use case for #466 was the
AllMyData commercial grid, which (about every two months) was impacted
by a power-user customer accidentally configuring their home client node
to offer storage services to the rest of the grid, which resulted in
other customers sending their shares his way, rather than to the
company's machines. We had two solutions in mind: #466, or the "split
Introducer", in which there'd be separate FURLs for publishing and
subscribing (and we'd keep the publish.furl secret, for use by AMD
servers only). We knew that #466 was more general, so we put off the
split introducer.

The biggest impact I foresee on Faruq's work is in merging it with the
signed-dictionary types from 466-ver5.patch, and making sure the
backwards-compatibility cases are handled well.

> P.S. You know what? "Introduction" will become the wrong word for that
> service once we no longer rely on it for access control at all. Then
> it will merely become "routing" or "IP address lookup".

Hm, probably. I suspect there will be a long-term use for a component
named "Introducer", but it may end up referring to the (distributed)
system by which you learn about the *reputation* of potential storage
servers, through some sort of PGP-web-of-trust -like system. For now, we
can work on fairly static forms of this feature (explicit listing of
serverids, or specific human decisions to sign/not-sign blesser
messages). But we can imagine a future in which your system is picking
servers based upon long-term reliability indices, or willingness to
exchange storage space, or cheap mojo prices, etc. (also I'm reminded of
the Invitation protocols we worked out a few years ago). These things
might then fall under the banner of "Introduction", even though by that
point it will refer to something far more sophisticated than what we've
got right now.

cheers,
 -Brian

