[tahoe-dev] version advertisement and negotiation

Brian Warner warner-tahoe at allmydata.com
Tue Oct 28 20:18:15 PDT 2008


> On Sat, 18 Oct 2008 01:17:31 -0700
> Drew Perttula <drewp at bigasterisk.com> wrote:

Excellent points, as always!

> > Reducing round trips is nice, and being able to reduce our dependence

> It seems like in your situation, the pattern is "meet a server once and
> then do many megabytes of traffic with it", so I don't see why
> optimizing for round trips is so important.

True, I'm getting ahead of myself by thinking of this as a requirement. There
are a couple of different long-term goals that feed into this desire:

 * making Tahoe into an RFC-style standard, which would be eased by using
   more familiar protocols (a variant of HTTP rather than Foolscap, etc),
   which may lead to less connection-oriented communications
 * improving scalability by using a Chord structure (a plan we called "Denver
   Airport" once upon a time), which would involve forwarding messages
   through other hosts, resulting in something less connection-ish
 * reducing round trips in the actual storage-server interaction, like
   coalescing today's half-dozen immutable-share writes into a single one,
   which would probably speed up writes of small files by a factor of 3

But none of these are a strong motivation, and you are quite right that we
can cache what we know about the remote side at the beginning of the
connection and use it for a while.


> If you were relaxed about the time and bytes it takes to do the
> negotiation, I would think you'd just talk about the features directly
> instead of making everyone map them back and forth to version numbers.

Yeah, I like that: send a dictionary that maps each feature name to a support
level, which you increment every time you think you've changed something about
that feature. Instead of saying that the "storage" feature was at version 2,
you'd talk about "storage/immutable-share-size-limit" and
"storage/accounting-support", each with its own value.
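A minimal sketch of that per-feature dictionary, combined with the idea of
caching what we learn about the remote side at the start of the connection.
The class name RemoteServer, its methods, and the rule that a missing feature
means level 0 are all assumptions made for illustration, not existing Tahoe
code:

```python
class RemoteServer:
    """Hypothetical per-connection record of what the peer advertised."""

    def __init__(self, advertised):
        # Cache the feature dictionary once, at the beginning of the
        # connection, and reuse it for every later request.
        self._levels = dict(advertised)

    def level(self, feature):
        # Assumption: a feature the peer never mentioned is at level 0.
        return self._levels.get(feature, 0)

    def supports(self, feature, minimum=1):
        # True if the peer advertises at least `minimum` for this feature.
        return self.level(feature) >= minimum


# Example advertisement using the two feature names from the text.
server = RemoteServer({"storage/immutable-share-size-limit": 1})
can_use_big_shares = server.supports("storage/immutable-share-size-limit")
has_accounting = server.supports("storage/accounting-support")
```

Each feature gets its own small number space, so nobody has to translate a
single "storage version 2" back into the list of things it implies.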

The concern we had was about what sort of changes we might make which
unknowingly affect compatibility.. the application version number is sort of
a hedge against such changes. The idea is that it should be easy to figure
out known compatibility issues from the Protocol Version numbers, and that it
should be at least possible (although probably difficult) to work out
unanticipated compatibility issues from the application version.

Of course, it all depends upon how crazy the compatibility issues get. Having
it be difficult to manage compatibility would discourage us from making
changes, which is a double-edged sword. I don't know how much energy to put
into this kind of thing.. picking a reasonable amount of engineering time is
a compatibility decision in itself.


> Next, I hope you're considering the presence of non-allmydata nodes out
> there. Your server version number shouldn't be "1.2.0r1234", since you're
> then inviting each new server to invent its own new scheme for versioning.
> At least use "allmydata-1.2.0r1234".

Yeah, that's my biggest reason for preferring the "Protocol Version" numbers
as the primary factor. We've already had problems with the "1.2.0r1234"
scheme becoming non-monotonically-increasing, since the Allmydata Windows
client product that incorporates Tahoe is labelled "3.0.0" (since it followed
the non-Tahoe-using client named "2.0"), and due to the slightly weird way
that we've been building the Windows executable, the Tahoe node itself
advertises the same 3.0.0 version. So now we have Tahoe 1.3.0 being more
advanced than something that claims to be Tahoe 3.0.0.

My feeling is that a version string (as opposed to a number) is always going
to be susceptible to pressures of marketing, etc. Zooko and I have gone back
and forth about this a bit.. I'd rather not have technical reasons why the
application version string must be in a specific form (MAJOR.MINOR.NANO.ETC),
since I figure that sooner or later someone will want to use it in a
different way. Using a separate (and small) number space for each specific
purpose (protocol compatibility) feels better, leaving the string to the
whims of the marketing folks.
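To make the 3.0.0-vs-1.3.0 problem concrete, here is a rough sketch of what
goes wrong when code tries to order nodes by application version string. The
parse_app_version helper and the per-purpose protocol numbers (1 and 2) are
invented for illustration only; the real nodes' protocol versions are not
stated here:

```python
def parse_app_version(s):
    """Naively split a MAJOR.MINOR.NANO string into a tuple of ints."""
    return tuple(int(part) for part in s.split("."))


# Application version strings as advertised in the example above.
windows_client = parse_app_version("3.0.0")
tahoe_node = parse_app_version("1.3.0")

# The naive ordering claims the Windows client is newer, which is
# backwards: the node advertising 1.3.0 is the more advanced one.
naive_says_windows_newer = windows_client > tahoe_node

# Hypothetical small per-purpose number space. These values are made up.
windows_protocol = {"storage": 1}
tahoe_protocol = {"storage": 2}
protocol_says_tahoe_newer = (
    tahoe_protocol["storage"] > windows_protocol["storage"]
)
```

The string can then be whatever marketing wants, because nothing technical
depends on its form.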


> As an RDF nut, I would suggest making the server versions always be URLs.
> Then I would go further and say that each negotiated feature should also be
> a URL. Using URLs makes the values really clear for debugging purposes; you
> can make them into working URLs that connect to the documentation about the
> feature; and we'll be able to search the net for any feature URL to find
> past discussions about it, etc.

Hm. Yeah, I see the goal, but I'm not sure if I can completely imagine how to
implement it.

We have two examples of the sort of thing we need (cited in my earlier
message): storage servers limited to 2**32-1 or to 2**64-1, and signed vs
non-signed Introducers. We'll undoubtedly come up with new axes along which
we need to make distinctions in the future, but we won't know how to label
each axis until then. For each of those future features, the old behavior
will be retroactively defined as "version 0".

So I suppose we'd define something like
"http://allmydata.org/protocols/storage/immutable-share-size/0" as 2**32-1,
vs ".../1" as 2**64-1, and the node would return a dictionary that includes a
mapping from "http://allmydata.org/protocols/storage/immutable-share-size/"
to "1", and we'd write a function that does int(versions.get(FEATURE, "0")).

And later we define some other feature, like ".../storage/accounting", where
/0 means the server doesn't understand accounting at all, and /1 means it
provides some particular level of support.
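Roughly, that could look like the sketch below. The constant and function
names are made up for illustration, and the full accounting URL is a guess at
what ".../storage/accounting" would expand to. The only parts taken from the
text are the immutable-share-size URL, the two size limits, and the
int(versions.get(FEATURE, "0")) lookup:

```python
PREFIX = "http://allmydata.org/protocols/storage/"
IMMUTABLE_SHARE_SIZE = PREFIX + "immutable-share-size/"
ACCOUNTING = PREFIX + "accounting/"  # assumed expansion of ".../storage/accounting"

# Support level -> largest immutable share the server will accept.
SHARE_SIZE_LIMITS = {
    0: 2**32 - 1,  # old behavior, retroactively "version 0"
    1: 2**64 - 1,
}


def feature_level(versions, feature):
    """Look up a feature's level; a missing entry means version 0."""
    return int(versions.get(feature, "0"))


def max_immutable_share_size(versions):
    """Translate the advertised level into a byte limit.

    Raises KeyError for a level this client has never heard of.
    """
    return SHARE_SIZE_LIMITS[feature_level(versions, IMMUTABLE_SHARE_SIZE)]


old_server = {}                            # advertises nothing
new_server = {IMMUTABLE_SHARE_SIZE: "1"}   # values are strings on the wire

old_limit = max_immutable_share_size(old_server)
new_limit = max_immutable_share_size(new_server)
old_accounting = feature_level(old_server, ACCOUNTING)
```

An old server that predates the dictionary entirely falls out naturally: every
feature it never mentioned reads as level 0.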

Hm, ok, I guess I can imagine what it would mean.


cheers,
 -Brian

