[tahoe-dev] server selection
Zooko O'Whielacronx
zooko at zooko.com
Tue Apr 21 12:42:48 PDT 2009
Folks:
One thing I learned from attending CodeCon is that there are a lot of
people who have ideas about how to use Tahoe, and each of them wants
its server selection to follow this or that specific policy.
Here are my notes, which I just entered on a wiki page. Please
reply to the list and then update the wiki page to reflect the new
consensus:
Different users of Tahoe want different answers to the question
"Which servers should I upload which shares to?"
* allmydata.com wants to upload to a random selection, evenly
distributed among servers which are not full. This is,
unsurprisingly, what Tahoe v1.4 currently does.
* Brian has mentioned that an allmydata.com-style deployment might
prefer to have the servers with more remaining capacity receiving
more shares, thus "filling up faster" than the servers with less
remaining capacity.
* Kevin Reid wants, at least for one of his use cases, to specify
several servers each of which is guaranteed to get at least K shares
of each file, in addition to potentially other servers also getting
shares.
* Shawn Willden wants, likewise, to specify a server (e.g. his
mom's PC) which is guaranteed to get at least K shares of certain
files (the family pictures and movies files).
* Some people -- I'm sorry I forget who -- have said they want to
upload at least K shares to the K fastest servers.
* Jake Appelbaum has said that he wants to specify a set of servers
which collectively are guaranteed to have at least K shares -- he
intends to use this to specify the ones that are running as Tor
hidden services and thus are extra attack-resistant but also extra
slow-and-expensive to reach.
* Several people -- again I'm sorry I've forgotten specific
attribution -- want to identify which servers live in which cluster
or co-lo or geographical area, and then to distribute shares evenly
across clusters/colos/geographical-areas instead of evenly across
servers.
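To make a couple of these concrete, here is a rough sketch of two of the policies above: the capacity-weighted one Brian mentioned, and the spread-across-clusters one. Everything in it is an assumption for illustration. The function names, the plain dicts describing servers, and the round-robin tie-breaking are hypothetical and are not taken from Tahoe's code.

```python
import random
from collections import defaultdict


def weighted_by_free_space(free_bytes, num_shares, rng=random):
    """Pick a server for each share with probability proportional to its
    remaining capacity, so emptier servers fill up faster.

    free_bytes: hypothetical {server name: remaining capacity in bytes}.
    Returns {share number: server name}.
    """
    names = sorted(free_bytes)
    weights = [free_bytes[name] for name in names]
    if sum(weights) <= 0:
        raise ValueError("every server is full")
    # A server with zero remaining capacity has zero weight and is never picked.
    picks = rng.choices(names, weights=weights, k=num_shares)
    return dict(enumerate(picks))


def spread_across_clusters(server_clusters, num_shares):
    """Deal shares out cluster by cluster (round-robin), then rotate through
    the servers inside each cluster.

    server_clusters: hypothetical {server name: cluster label}.
    Returns {share number: server name}.
    """
    if not server_clusters:
        raise ValueError("no servers given")
    by_cluster = defaultdict(list)
    for server, cluster in sorted(server_clusters.items()):
        by_cluster[cluster].append(server)
    clusters = sorted(by_cluster)
    used = defaultdict(int)  # how many shares each cluster has received so far
    placement = {}
    for share in range(num_shares):
        cluster = clusters[share % len(clusters)]
        members = by_cluster[cluster]
        placement[share] = members[used[cluster] % len(members)]
        used[cluster] += 1
    return placement
```

With three servers in one cluster and one server in another, spread_across_clusters gives each cluster the same number of shares, which means the lone server holds more shares than any of the three. That is the difference between "evenly across clusters" and "evenly across servers".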
As I, Zooko, have emphasized a few times, we really should not try to
write a super-clever algorithm into Tahoe which satisfies all of
these people, plus all the other crazy people that will be using
Tahoe for crazy things in the future. Instead, we need some sort of
configuration language or plugin system so that each crazy person can
customize their own crazy server selection policy. I don't know the
best way to implement this yet. A domain-specific language? Build the
seven policies listed above into Tahoe and add an option to choose
which of the seven you want for this upload?
My current favorite approach is: you give me a Python function. When
the time comes to upload a file, I'll call that function and then use
whichever servers it said to use.
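A minimal sketch of what "you give me a Python function" could look like follows. The Server class, the policy signature (servers, number of shares, share size) and the upload() helper are all assumptions made up for this example. None of them is an existing Tahoe API. The default policy is a simplified stand-in for "random selection among servers which are not full".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical description of a storage server, not Tahoe's data model.
    name: str
    free_bytes: int


def random_nonfull(servers, num_shares, share_size, rng=random):
    """Default policy: spread shares evenly over a random ordering of the
    servers that still have room for at least one share."""
    candidates = [s for s in servers if s.free_bytes >= share_size]
    if not candidates:
        raise ValueError("no server has room for a share")
    rng.shuffle(candidates)
    # Simplification: does not track space used up by earlier shares.
    return {i: candidates[i % len(candidates)] for i in range(num_shares)}


def upload(servers, num_shares, share_size, policy=random_nonfull):
    """Ask the user-supplied policy where each share goes, then check that
    it placed every share exactly once before using the answer."""
    placement = policy(servers, num_shares, share_size)
    if sorted(placement) != list(range(num_shares)):
        raise ValueError("policy must place every share exactly once")
    return placement  # a real uploader would now send the shares


def everything_to_mom(servers, num_shares, share_size):
    """Example custom policy: pin every share to one named server."""
    mom = next(s for s in servers if s.name == "moms-pc")
    return {i: mom for i in range(num_shares)}
```

Calling upload(servers, 10, share_size, policy=everything_to_mom) would then use the custom policy for that upload, while leaving the default in place for everyone who does not care.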
http://allmydata.org/trac/tahoe/wiki/ServerSelection
Regards,
Zooko