[tahoe-dev] server selection Re: Node correlations
Brian Warner
warner at lothar.com
Tue Jan 24 23:54:42 UTC 2012
On 1/18/12 1:55 PM, Olaf TNSB wrote:
> I think the flexibility of allowing users to design their own
> arbitrary rules should be the goal. Python's a good solution.
Yeah, we've got a good platform, we just have to come up with ways to
let users express their goals in a way that the code can act upon.
I'd suggest a plug-in approach. tahoe.cfg can contain a line that names
the kind of server-selection algorithm being used (so it can default to
the current behavior), which basically points to a python class. If the
algorithm can be completely parameterized with a handful of strings or
numbers (like the current k/H/N shares.required / shares.happy /
shares.total approach), then that configuration can live entirely inside
tahoe.cfg. If not (e.g. it needs a big list of server identifiers and
local attributes on each), put that stuff in a separate file and have
tahoe.cfg point to it.
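As a rough sketch (the section name, keys, and plugin path here are
invented for illustration, not anything tahoe.cfg supports today), the
configuration might look like:

    [client]
    # dotted path to a class implementing the selection interface;
    # absent means "use the current k/H/N behavior"
    server-selection = my_plugins.rack_aware.RackAwareSelector
    # larger per-server data lives in its own file
    server-selection.config = private/rack-map.json

and the client could resolve the dotted name with stock Python:

    import importlib

    def load_selector(dotted_name):
        # "my_plugins.rack_aware.RackAwareSelector" ->
        # (module "my_plugins.rack_aware", class "RackAwareSelector")
        modname, classname = dotted_name.rsplit(".", 1)
        module = importlib.import_module(modname)
        return getattr(module, classname)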
So:
* the server-selection portion of upload should be isolated into a
single class, which is handed the configuration, gets to ask servers
whether they'll hold a share or not, and finishes with a "share map"
that meets the constraints (the current code is actually pretty
close; see the interface sketch after this list)
* if a server disconnects while the upload is running, the
server-selection object should be prepared to provide a replacement.
The rest of the uploader can then figure out how to restart the
encoding process as necessary.
* the *download* side gets parameterized similarly: its server-selection
object is responsible for producing a list of servers sorted by how
likely they are to have shares (the optimum query order). Downloads
would still work if this returned a completely unsorted list, but
might take a very long time to find all shares (especially if there
are a lot of servers)
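To make that division of labor concrete, here's one shape the plugin
interface might take. The method names are mine, and the real thing
would return Deferreds since the uploader is Twisted-based, but the
boundaries match the three bullets above:

    class ServerSelector:
        # hypothetical plugin interface; all names are invented
        def __init__(self, config, storage_broker):
            self.config = config          # parsed tahoe.cfg section
            self.broker = storage_broker  # knows all connected servers

        def select_for_upload(self, ssi, num_shares):
            # ask candidate servers whether they'll hold a share, and
            # return a sharemap {shnum: server} that satisfies the
            # plugin's constraints, or raise if it can't
            raise NotImplementedError

        def replace_server(self, sharemap, lost_server):
            # called when a server disconnects mid-upload: return a
            # substitute for the shares lost_server was holding
            raise NotImplementedError

        def sorted_servers_for_download(self, ssi):
            # return every known server, most-likely-to-hold-shares
            # first; an unsorted list is still correct, just slow
            raise NotImplementedError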
The design constraints are:
* download needs to keep working, and the downloader might not have the
same configuration as the uploader. The uploader can pass information
through the filecap to the downloader, but not very much. The
downloader can search, though: it isn't necessary for them to exactly
agree on where the shares are, as long as they agree enough to make
the search time reasonable: the worst case is an exhaustive search
per file.
* the repairer might move shares around later, and any existing filecaps
need to keep working, ideally work even better once repair/rebalance
has moved shares to better places
* maintain file diversity: as long as there are multiple acceptable
share-placement-maps, the uploader should pick randomly among them, to
minimize server-choice correlations across uploaded files. The
downloader should deal with this correctly. We've defined the
"server-selection-index" (SSI) as the value that the downloader uses
to permute the server list in the same was as the uploader. The
current code uses the storage-index (SI) for this purpose.
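For illustration, a permutation with that property can be had by
sorting the server list on a hash of the SSI concatenated with each
server's id; this is roughly what the current SI-based code does,
though the exact hash and accessor names here shouldn't be taken as
the real ones:

    import hashlib

    def permuted_servers(ssi, servers):
        # uploader and downloader both sort by hash(SSI + serverid),
        # so they derive the same order with no extra communication
        def perm_key(server):
            return hashlib.sha256(ssi + server.get_serverid()).digest()
        return sorted(servers, key=perm_key)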
Also note that different placement strategies are appropriate for
different purposes. If you want a backup service, then you probably want
to exclude your own server from the list (or at least don't count local
shares as contributing to reliability). If you want a general-purpose
distributed filesystem that performs equally well for all downloaders
(including yourself), then you want the shares to be distributed
uniformly. If you have clusters of users with better connectivity to
certain servers, you might want to put more shares close to them. A
single grid might be used for all these purposes at the same time, so
eventually we're going to need to add an API that lets the uploader
choose their strategy at runtime ("tahoe put --for-backup FILE"?).
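For example, a backup-oriented strategy could be as small as filtering
the candidate list before the usual permutation (a sketch, with
invented names; counting local shares as zero reliability would be the
gentler variant):

    def backup_candidates(all_servers, my_serverid):
        # a share stored on my own machine doesn't protect against
        # losing that machine, so exclude it from the candidates
        return [s for s in all_servers
                if s.get_serverid() != my_serverid]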
cheers,
-Brian