[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Wed Jan 20 20:03:37 PST 2010
#778: "shares of happiness" is the wrong measure; "servers of happiness" is better
--------------------------------------+--------------------------------------
 Reporter:  zooko                     |          Owner:  warner
     Type:  defect                    |         Status:  new
 Priority:  critical                  |      Milestone:  1.6.0
Component:  code-peerselection        |        Version:  1.4.1
 Keywords:  reliability review-needed |  Launchpad_bug:
--------------------------------------+--------------------------------------
Comment(by kevan):
(I'm working on these as I have time -- I usually have a lot to do during
the week)
Replying to [comment:137 zooko]:
> I realized as I was driving home just now that I don't know what the
> code will do, after Kevan's behavior.txt patch is applied, when "servers
> of happiness" can be achieved only by uploading redundant shares. For
> example, tests.txt adds a test in "test_upload.py" named
> {{{test_problem_layout_comment_52}}} which creates a server layout like
> this:
>
> {{{
> # server 0: shares 1 - 9
> # server 1: share 0
> # server 2: share 0
> # server 3: share 0
> }}}
>
> Where server 0 is read-write and servers 1, 2 and 3 are read-only. (And
> by the way Kevan, please make the comments state that servers 1, 2 and 3
> are read-only.)
>
> In this scenario (with {{{K == 3}}}) the uploader can't achieve "servers
> of happiness" == 4 even though it can immediately see that all {{{M ==
> 10}}} of the shares are hosted on the grid.
>
> But what about the case where servers 1, 2 and 3 were still able to
> accept new shares? Then our uploader could either abort and say "servers
> of happiness couldn't be satisfied", because it can't achieve "servers of
> happiness" without uploading redundant copies of shares that are already
> on the grid, or it could succeed by uploading a new copy of shares 2 and
> 3.
>
> We should have a test for this case. If our uploader gives up in this
> case then we should assert that the uploader gives up with a reasonable
> error message and without wasting bandwidth by uploading shares. If it
> proceeds in this case then we should assert that it succeeds and that it
> doesn't upload more shares than it has to (which is two in this case).
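(A quick aside before getting to the test: the layout above can be sanity-
checked by hand if "servers of happiness" is scored as the size of a maximum
matching between servers and the distinct shares they hold, which is the
direction this ticket has been heading. With servers 1, 2 and 3 read-only and
all holding the same share 0, that matching has size 2, so happiness == 4
really is unreachable there without pushing redundant copies elsewhere. The
snippet below is purely illustrative -- it is not the code in behavior.txt:)
{{{
# Illustrative only: score "servers of happiness" for the layout above as
# the size of a maximum matching between servers and the shares they hold.
def servers_of_happiness(shares_by_server):
    matched = {}  # share number -> server currently claiming it

    def augment(server, seen):
        for share in shares_by_server[server]:
            if share in seen:
                continue
            seen.add(share)
            holder = matched.get(share)
            if holder is None or augment(holder, seen):
                matched[share] = server
                return True
        return False

    return sum(1 for server in shares_by_server if augment(server, set()))

layout = {0: set(range(1, 10)),  # server 0: shares 1 - 9
          1: set([0]),           # server 1: share 0 (read-only)
          2: set([0]),           # server 2: share 0 (read-only)
          3: set([0])}           # server 3: share 0 (read-only)
assert servers_of_happiness(layout) == 2  # happiness == 4 is out of reach
}}}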
There is a test for this (or something very like this) in
test_problem_layouts_comment_53:
{{{
# Try the same thing, but with empty servers after the first one
# We want to make sure that Tahoe2PeerSelector will redistribute
# shares as necessary, not simply discover an existing layout.
# The layout is:
# server 2: shares 0 - 9
# server 3: empty
# server 1: empty
# server 4: empty
d.addCallback(_change_basedir)
d.addCallback(lambda ign:
    self._setup_and_upload())
d.addCallback(lambda ign:
    self._add_server(server_number=2))
d.addCallback(lambda ign:
    self._add_server(server_number=3))
d.addCallback(lambda ign:
    self._add_server(server_number=1))
d.addCallback(_copy_shares)
d.addCallback(lambda ign:
    self.g.remove_server(self.g.servers_by_number[0].my_nodeid))
d.addCallback(lambda ign:
    self._add_server(server_number=4))
d.addCallback(_reset_encoding_parameters)
d.addCallback(lambda client:
    client.upload(upload.Data("data" * 10000, convergence="")))
return d
}}}
Note that this is slightly different from your case, in that the other
servers have no shares at all, so the correct number of shares for the
encoder to push is 3, not 2. I didn't have that assertion in the test,
though, so I'll attach a patch that adds it. This also uncovered a bug in
{{{should_add_server}}}: it would not approve adding previously unknown
shares to the {{{existing_shares}}} dict if they were on a server that was
already in {{{existing_shares}}}. I've fixed this and added a test for it.
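An assertion along the lines of the sketch below would capture that check;
{{{_count_shares_on}}} is a hypothetical helper standing in for however the
test actually counts the shares on each server, so read this as the shape of
the check rather than the patch itself:
{{{
def _check_redistribution(upload_results):
    # Added just before the final "return d". After the upload, server 2
    # should still hold its original ten shares, and the three previously
    # empty servers (1, 3 and 4) should have received exactly three new
    # shares between them. (_count_shares_on is a made-up helper for this
    # sketch, not the real test API.)
    counts = dict((n, self._count_shares_on(server_number=n))
                  for n in (1, 2, 3, 4))
    self.failUnlessEqual(counts[2], 10)
    self.failUnlessEqual(counts[1] + counts[3] + counts[4], 3)
d.addCallback(_check_redistribution)
}}}
As for the {{{should_add_server}}} fix, it amounts to keying the check on the
share rather than on the server that holds it -- roughly the following, with
the signature and dict shape assumed for illustration:
{{{
def should_add_server(existing_shares, server, bucket):
    # Approve recording this (server, share) pair whenever the share itself
    # is not yet tracked, even if this server already appears in
    # existing_shares as the holder of some other share. (Sketch only.)
    return bucket not in existing_shares
}}}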
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:144>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid