[tahoe-dev] server selection

Tue Apr 21 13:50:41 PDT 2009

On Tue, 21 Apr 2009 12:42:48 -0700
Zooko O'Whielacronx <zooko at zooko.com> wrote:

> One thing I learned from attending CodeCon is that there are a lot
> of people who have ideas about how to use Tahoe and how they want
> its server selection to be this or that specific policy.

[I added my response to the wiki page too, hence the funky markup here. Note
that in TracWiki, '''this''' is used for emphasis, instead of *this*]

Having a function or class to control server-selection is a great idea. The
current code already separates out responsibility for server-selection into a
distinct class, at least for immutable files
(source:src/allmydata/immutable/upload.py#L131 {{{Tahoe2PeerSelector}}}). It
would be pretty easy to make the uploader use different classes according to
a {{{tahoe.cfg}}} option.
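
As a rough illustration of the config-driven choice, here is a minimal sketch. Everything named in it is hypothetical: the {{{peer_selector}}} option, its placement in a {{{[client]}}} section, the registry, and the two toy selector classes. None of it is taken from the Tahoe source; it only shows the shape of mapping a config value to a class.

```python
import configparser


class SortedSelector:
    """Toy selector (hypothetical): try servers in ascending serverid order."""

    def order(self, serverids, storage_index):
        return sorted(serverids)


class ReversedSelector:
    """Toy selector (hypothetical): try servers in descending serverid order."""

    def order(self, serverids, storage_index):
        return sorted(serverids, reverse=True)


# Hypothetical registry mapping a config value to a selector class.
SELECTORS = {"sorted": SortedSelector, "reversed": ReversedSelector}


def selector_from_config(text, default="sorted"):
    """Build a selector from tahoe.cfg-style text.

    The [client] peer_selector option is an assumption made for this
    sketch, not a documented Tahoe setting.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    name = cfg.get("client", "peer_selector", fallback=default)
    try:
        return SELECTORS[name]()
    except KeyError:
        raise ValueError("unknown peer_selector: %r" % (name,))


selector = selector_from_config("[client]\npeer_selector = reversed\n")
print(selector.order([b"a", b"c", b"b"], b"storage-index"))
```

Rejecting unknown names outright, rather than silently falling back to the default, avoids uploading with a policy the operator did not ask for.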

However, there are some additional properties that need to be satisfied by the
server-selection algorithm for it to work at all. The basic Tahoe model is
that the filecap is both necessary and sufficient (given some sort of grid
membership) to recover the file. This means that the eventual
'''downloader''' needs to be able to find the same servers, or at least have
a sufficiently-high probability of finding "enough" servers within a
reasonable amount of time, using only information which is found in the
filecap.

If the downloader is allowed to ask every server in the grid for shares, then
anything will work. If you want to keep the download setup time low, and/or
if you expect to have more than a few dozen servers, then the algorithm needs
to be able to do something better. Note that this is even more of an issue
for mutable shares, where it is important that publish-new-version is able to
track down and update all of the old shares: the chance of accidental
rollback increases when it cannot reliably/cheaply find them all.
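
To make the download cost concrete, the following sketch counts do-you-have-share queries for a downloader that walks an ordered server list until it has found enough shares. The {{{find_shares}}} helper and its {{{has_share}}} callback are hypothetical names for this illustration, and the serial loop is a simplification: it says nothing about how the real downloader schedules its queries.

```python
def find_shares(ordered_servers, has_share, needed):
    """Walk servers in order and stop once `needed` share holders are found.

    Returns (holders_found, queries_sent). `has_share` stands in for one
    do-you-have-share round trip to a server.
    """
    found = []
    queries = 0
    for server in ordered_servers:
        queries += 1
        if has_share(server):
            found.append(server)
            if len(found) >= needed:
                break
    return found, queries


servers = list("abcdefgh")

# Shares sit at the front of the order the downloader computes: 3 queries.
print(find_shares(servers, {"a", "b", "c"}.__contains__, 3))

# Shares scattered through the list: the downloader ends up asking all 8.
print(find_shares(servers, {"a", "c", "h"}.__contains__, 3))
```

The second case is the "ask every server" fallback: it still works, but the query count grows with the size of the grid instead of with the number of shares needed.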

Another potential goal is for the download process to be tolerant of new
servers, removed servers, and shares which have been moved (possibly as the
result of repair or "rebalancing"). Some use cases will care about this,
while others may never change the set of active servers and won't care.

It's worth pointing out the properties we were trying to get when we came up
with the current "tahoe2" algorithm:

 * for mostly static grids, download uses minimal do-you-have-share queries
 * adding one server should only increase download search time by 1/numservers
 * repair/rebalancing/migration may move shares to new places, including
   servers which weren't present at upload time, and download should be able
   to find and use these shares, even though the filecap doesn't change
 * traffic load-balancing: all non-full servers get new shares at the same
   bytes-per-second, even if serverids are not uniformly distributed

We picked the pseudo-random permuted serverlist to get these properties. I'd
love to be able to get stronger diversity among hosts, racks, or data
centers, but I don't yet know how to get that '''and''' get the properties
listed above, while keeping the filecaps small.
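
A permuted serverlist of this kind can be sketched by sorting serverids on a hash that mixes in the file's storage index. The choice of SHA-256 and the concatenation order below are assumptions made for the sketch, not a description of what {{{Tahoe2PeerSelector}}} actually computes.

```python
import hashlib


def permuted_servers(serverids, storage_index):
    """Return serverids in a pseudo-random order specific to this file.

    Each server's sort key depends only on its own id and the storage
    index, so uploader and downloader compute the same order from the
    filecap plus the current server list.
    """
    return sorted(
        serverids,
        key=lambda sid: hashlib.sha256(sid + storage_index).digest(),
    )


servers = [b"server-%d" % i for i in range(10)]
storage_index = b"example-storage-index"

before = permuted_servers(servers, storage_index)
after = permuted_servers(servers + [b"new-server"], storage_index)

# A new server lands at one position in the permutation; the relative
# order of the existing servers is unchanged.
assert [s for s in after if s != b"new-server"] == before
```

Because each key is independent of the other servers, adding or removing one server never reshuffles the rest. A downloader walking the list sees at most one extra server ahead of the shares it is looking for.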

cheers,
 -Brian