[tahoe-dev] [tahoe-lafs] #872: Adjust the probability of selecting a node according to its storage capacity
tahoe-lafs
trac at allmydata.org
Tue Dec 29 23:01:53 PST 2009
#872: Adjust the probability of selecting a node according to its storage
capacity
------------------------------------------------------------+---------------
Reporter: davidsarah | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-peerselection | Version: 1.5.0
Keywords: performance scalability preservation bandwidth | Launchpad_bug:
------------------------------------------------------------+---------------
Comment(by davidsarah):
Replying to [comment:10 davidsarah]:
> Besides, I have an idea about how to do better, but I'll have to think
about it some more.
The idea worked out.
The comment:3 algorithm is equivalent to picking the minimum hash value
out of ''e'' independent hash values for each server. We can get the same
result by taking a single hash, and transforming it so that it follows the
same distribution as the minimum of ''e'' hashes would have done.
Let X''e'' be the distribution given by the minimum of ''e'' independent
uniform distributions U1..''e'', each on [0, 1). The
[http://en.wikipedia.org/wiki/Cumulative_distribution_function cumulative
distribution function] of X''e'' is given by:
||F_X''e''(x)|| = P(X''e'' <= x) ||
|| || = P(min(U1, U2, ... U''e'') <= x) ||
|| || = 1 - P(U1 > x) P(U2 > x) ... P(U''e'' > x) ||
|| || = 1 - (1-x)^''e''^ ||
Then we can use [http://en.wikipedia.org/wiki/Inverse_transform_sampling
inverse transform sampling] to generate samples from X''e''. For that we
need the inverse of F_X''e'' which is
||(F_X''e'')^-1^(y) = 1 - (1-y)^(1/''e'')^||
So if we let y be the peer id hash for a given server scaled to the range
[0, 1), and ''e'' be its capacity estimate, then sorting according to 1 -
(1-y)^(1/''e'')^ will give the same distribution of permutations that we
would have got by the comment:3 algorithm.
Plotting (F_X''e'')^-1^ for various ''e''
[http://fooplot.com/index.php?&type0=0&type1=0&type2=0&type3=0&type4=0&y0=1-(1-x)^(1/1)&y1=1-(1-x)^(1/2)&y2=1-(1-x)^(1/3)&y3=1-(1-x)^(1/4)&y4=1-(1-x)^(1/5)&xmin=-0.35&xmax=1.35&ymin=0&ymax=1
gives results] that are intuitively reasonable in order for this to work:
increasing ''e'' biases the transformed hash toward lower values that are
more likely to be near the start of the list (but for any ''e'', there is
still some chance that the server will not be picked).
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/872#comment:11>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
More information about the tahoe-dev
mailing list