[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Thu Sep 24 18:38:21 PDT 2009
#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
--------------------------------+-------------------------------------------
Reporter: zooko | Owner: kevan
Type: defect | Status: new
Priority: critical | Milestone: 1.5.1
Component: code-peerselection | Version: 1.4.1
Keywords: reliability | Launchpad_bug:
--------------------------------+-------------------------------------------
Comment(by kevan):
The case you describe, if I understand it correctly, would be captured by
this example:
Suppose I have four servers, {{{s_1}}} through {{{s_4}}}. Suppose I have
{{{k = 3, h = 3, N=10}}}. Suppose that the shares generated from a file f
are numbered {{{f_n}}} (so {{{f_1}}} is share 1, and so on). Then the
situation you describe is a problem if shares are distributed as follows:
* {{{s_1}}}: {{{f_1}}}
* {{{s_2}}}: {{{f_1}}}
* {{{s_3}}}: {{{f_1}}}
* {{{s_4}}}: {{{f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9, f_10}}}
because the set {{{{s_1, s_2, s_3}}}} is not sufficient to recover the
file, though it is the same size as {{{k}}}.
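For concreteness, here is a small sketch (plain Python with illustrative
names, not Tahoe code) of the distinction: a set of servers can only
recover the file if it collectively holds at least {{{k}}} *distinct*
shares, which is exactly what {{{{s_1, s_2, s_3}}}} fails to do above:
{{{
# Illustrative sketch, not Tahoe-LAFS code: a server set can recover the
# file only if it collectively holds at least k *distinct* shares.
def can_recover(share_map, servers, k):
    distinct = set()
    for server in servers:
        distinct |= share_map.get(server, set())
    return len(distinct) >= k

share_map = {
    "s_1": set([1]),
    "s_2": set([1]),
    "s_3": set([1]),
    "s_4": set(range(2, 11)),
}

print(can_recover(share_map, ["s_1", "s_2", "s_3"], 3))  # False: only share 1
print(can_recover(share_map, ["s_1", "s_4"], 3))         # True: shares 1-10
}}}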
Do you think we're on the same page?
From what I understand of the placement algorithm, Tahoe-LAFS wouldn't
normally distribute one share to more than one server.
[source:src/allmydata/immutable/upload.py@4045#L225 _loop] in
{{{Tahoe2PeerSelector}}} contains the guts of the peer selection
algorithm. As I understand it, there is only ever one outstanding request
for a particular share: a share is removed from the homeless-shares list
before its request goes out on the wire, either by {{{pop()}}}ing it off
the list or by assigning the empty list to the slice of shares currently
out on the wire. If that is the case, then it shouldn't allocate one
share to more than one server.
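Schematically, the single-outstanding-request property looks like this (a
sketch of the pattern only, not the actual {{{_loop}}} code):
{{{
# Sketch of the pattern only (not the actual _loop): a share is removed
# from homeless_shares before its request goes out, so at most one request
# for any given share can be outstanding at a time.
homeless_shares = [1, 2, 3, 4]
uncontacted_peers = ["s_1", "s_2", "s_3", "s_4"]
outstanding = {}  # share number -> peer currently being asked

while homeless_shares and uncontacted_peers:
    peer = uncontacted_peers.pop(0)
    share = homeless_shares.pop(0)  # gone from the list until the peer answers
    outstanding[share] = peer

print(sorted(outstanding.items()))  # [(1, 's_1'), (2, 's_2'), (3, 's_3'), (4, 's_4')]
}}}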
A more interesting case is if, for some reason, we attempt to upload
{{{f}}} to a grid that has the following layout:
* {{{s_1}}}: {{{f_1}}}
* {{{s_2}}}: {{{f_1}}}
* {{{s_3}}}: {{{f_1}}}
* {{{s_4}}}: {{{\emptyset}}}
I'll also stipulate that {{{s_1, s_2}}} and {{{s_3}}} will not accept any
more shares, while {{{s_4}}} will accept every share it is asked to
accept. Let's walk through what happens when I try to upload {{{f}}}.
I'll assume for simplicity that {{{_loop}}} starts with
{{{
self.homeless_shares = [f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9, f_10]
self.uncontacted_peers = [s_1, s_2, s_3, s_4]
}}}
{{{_loop}}} will start by checking to see if there are any homeless
shares. There are, of course. It then checks to see if there are any
uncontacted peers. At this point in the execution, there are, so it pops
the first uncontacted peer ({{{s_1}}}) off of the list of uncontacted
peers, and pops the first homeless share ({{{f_1}}}) off of the list of
homeless shares. It then asks {{{s_1}}} to store {{{f_1}}}. Recall that
{{{s_1}}} already has {{{f_1}}}. Since the {{{StorageServer}}}
([source:src/allmydata/storage/server.py@3841#L35]) tells remote callers
about all of the shares it has for a given storage index, {{{s_1}}} tells
the {{{Tahoe2PeerSelector}}} instance that it has {{{f_1}}}, and that it
has not allocated any shares. This is handled by the {{{_got_response}}}
method of {{{Tahoe2PeerSelector}}}, which sees that {{{s_1}}} has not
allocated anything, but that it already has {{{f_1}}}: that means that
{{{f_1}}} is not added to the homeless list again, and that {{{s_1}}} is
recorded as the preexisting peer holding {{{f_1}}}.
{{{_loop}}} will now ask {{{s_2}}} if it can store {{{f_1}}} in the same
way. {{{s_2}}} will reply (in the same way as {{{s_1}}}) that it can't
(i.e., that it hasn't allocated {{{f_1}}}), but that it happens to have
{{{f_1}}}. This causes {{{_got_response}}} to set {{{s_2}}} as the
preexisting peer holding {{{f_1}}}, overwriting the previously set value
({{{s_1}}}).
The same process occurs with {{{s_3}}}: the end result of the
{{{_loop/_got_response}}} combination in that case is that {{{s_3}}} is
now recorded as the preexisting peer holding {{{f_1}}}.
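In other words, the preexisting-share bookkeeping can only ever point at
one server per share; roughly (an illustrative sketch, not the real
{{{_got_response}}}):
{{{
# Illustrative sketch (not the real _got_response): each server's reply
# records it as the holder of the shares it reports, and later replies
# overwrite earlier ones, so at most one server is recorded per share.
preexisting_shares = {}  # share number -> the single peer recorded for it

for peer in ["s_1", "s_2", "s_3"]:
    alreadygot, allocated = set([1]), set()  # each already has f_1, allocates nothing
    for sharenum in alreadygot:
        preexisting_shares[sharenum] = peer

print(preexisting_shares)  # {1: 's_3'} -- only the last responder survives
}}}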
Finally, {{{_loop}}} asks the last uncontacted peer, {{{s_4}}}, to store
{{{f_2}}}. {{{s_4}}} replies that it doesn't have any preexisting shares
for the storage index, and that it has allocated {{{f_2}}}. {{{s_4}}} is
recorded in {{{self.use_peers}}} as having done this.
There are now no more uncontacted peers, so the {{{Tahoe2PeerSelector}}}
instance will attempt to store a proportional amount of the homeless
shares (see [source:src/allmydata/immutable/upload.py@4045#L263]) on each
of the remaining servers. {{{s_1, s_2}}}, and {{{s_3}}} will refuse to
accept any of these, and will not be used in any further queries as a
result. {{{s_4}}} will accept all of the shares that
{{{Tahoe2PeerSelector}}} wants to put on it, so it will be re-appended to
the list of peers to consider. On the next pass, {{{_loop}}} will attempt
to store all currently homeless shares on {{{s_4}}}, and will succeed.
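The "proportional amount" step boils down to splitting the remaining
homeless shares across the peers that are still willing to talk to us;
once only {{{s_4}}} is left, that means all of them (a sketch of the
idea, not the exact upload.py code):
{{{
# Sketch of the proportional-distribution idea (not the exact upload.py code).
import math

homeless_shares = [2, 3, 4, 5, 6, 7, 8, 9, 10]
willing_peers = ["s_4"]  # s_1..s_3 refused and were dropped

shares_per_peer = int(math.ceil(len(homeless_shares) / float(len(willing_peers))))
shares_to_ask = homeless_shares[:shares_per_peer]
print(shares_to_ask)  # [2, 3, 4, 5, 6, 7, 8, 9, 10] -- s_4 is asked for everything
}}}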
{{{Tahoe2PeerSelector}}} now has no more shares to allocate, and executes
the comparison that Zooko mentions. To get {{{servers_with_shares}}},
{{{self._servers_with_shares}}} takes the union of the set of peers that
we're actually uploading shares to and the set of peers that we've
recorded as already having shares. The latter set is built using the
preexisting-shares structure that is updated each time a response is
received from {{{s_1}}}, {{{s_2}}}, or {{{s_3}}}: this will never have
more than one server for {{{f_1}}} (or any {{{f_n}}}). The only other
server in this list will be {{{s_4}}}, since it has all of the other
shares. {{{_servers_with_shares}}}, then, returns a list of two servers,
and {{{Tahoe2PeerSelector}}} correctly fails, because 2 is less than 3.
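So the final check is over servers, not shares; it amounts to something
like this (a simplified sketch with illustrative names):
{{{
# Simplified sketch of the final check (illustrative names, not the exact
# _servers_with_shares code): count the distinct servers that will hold
# shares, and fail the upload if there are fewer than servers_of_happiness.
use_peers = set(["s_4"])            # peers we are actually uploading to
preexisting_shares = {1: "s_3"}     # share number -> peer that already has it

servers_with_shares = use_peers | set(preexisting_shares.values())
servers_of_happiness = 3

if len(servers_with_shares) < servers_of_happiness:
    print("upload fails: %d servers < %d" % (len(servers_with_shares),
                                             servers_of_happiness))
}}}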
One example is obviously not proof that the peer selection algorithm
guards against this in all cases, though, so maybe an additional test (or
revising that one) would be a good idea. Did you have anything in
particular in mind that would be a better indicator?
This is perhaps off-topic, but if I understand the execution of
{{{_got_response}}}, we don't say that a query has made progress (or that
it is a good query) unless the remote server has either allocated shares
for us, or reports that it already has shares we still consider homeless.
This means that if I ask a server to store one share and it happens to
have that share (but no others), its response isn't considered
successful: the share I asked it to store is no longer in the homeless
list (we popped it off / destructively modified the list when we sent the
request), and nothing was allocated. Is this intentional?
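The condition I am reading is roughly this (a sketch of my reading, not a
quote of the code):
{{{
# Sketch of the "did this query make progress?" condition as I read it
# (illustrative only): progress means the server allocated something for
# us, or it already holds shares that are still on the homeless list.
def made_progress(allocated, alreadygot, homeless_shares):
    return bool(allocated) or bool(set(alreadygot) & set(homeless_shares))

# The case above: we asked for share 1 (so it was already popped off the
# homeless list), the server allocated nothing, but it already holds share 1.
print(made_progress(set(), set([1]), [2, 3, 4, 5]))  # False -- intended?
}}}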
(whew, that was a lot longer than I was planning on it being :-/)
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:52>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid