[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Wed Jan 20 20:03:37 PST 2010
#778: "shares of happiness" is the wrong measure; "servers of happiness" is better
--------------------------------------+--------------------------------------
 Reporter:  zooko                     |          Owner:  warner
     Type:  defect                    |         Status:  new
 Priority:  critical                  |      Milestone:  1.6.0
Component:  code-peerselection        |        Version:  1.4.1
 Keywords:  reliability review-needed |  Launchpad_bug:
--------------------------------------+--------------------------------------
Comment(by kevan):
(I'm working on these as I have time -- I usually have a lot to do during
the week)
Replying to [comment:137 zooko]:
> I realized as I was driving home just now that I don't know what the
> code will do, after Kevan's behavior.txt patch is applied, when "servers
> of happiness" can be achieved only by uploading redundant shares. For
> example, tests.txt adds a test in "test_upload.py" named
> {{{test_problem_layout_comment_52}}} which creates a server layout like
> this:
>
> {{{
> # server 0: shares 1 - 9
> # server 1: share 0
> # server 2: share 0
> # server 3: share 0
> }}}
>
> Where server 0 is read-write and servers 1, 2 and 3 are read-only. (And
> by the way Kevan, please make the comments state that servers 1, 2 and 3
> are read-only.)
>
> In this scenario (with {{{K == 3}}}) the uploader can't achieve "servers
> of happiness" == 4 even though it can immediately see that all {{{M ==
> 10}}} of the shares are hosted on the grid.
>
> But what about the case where servers 1, 2 and 3 were still able to
> accept new shares? Then our uploader could either abort and say "servers
> of happiness couldn't be satisfied", because it can't achieve "servers of
> happiness" without uploading redundant copies of shares that are already
> on the grid, or it could succeed by uploading a new copy of shares 2 and
> 3.
>
> We should have a test for this case. If our uploader gives up in this
> case then we should assert that the uploader gives up with a reasonable
> error message and without wasting bandwidth by uploading shares. If it
> proceeds in this case then we should assert that it succeeds and that it
> doesn't upload more shares than it has to (which is two in this case).
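(A quick aside before getting to the test: the layout above can be sanity-
checked by hand if "servers of happiness" is scored as the size of a maximum
matching between servers and the distinct shares they hold, which is the
direction this ticket has been heading. With servers 1, 2 and 3 read-only and
all holding the same share 0, that matching has size 2, so happiness == 4
really is unreachable there without pushing redundant copies elsewhere. The
snippet below is purely illustrative -- it is not the code in behavior.txt:)
{{{
# Illustrative only: score "servers of happiness" for the layout above as
# the size of a maximum matching between servers and the shares they hold.
def servers_of_happiness(shares_by_server):
    matched = {}  # share number -> server currently claiming it

    def augment(server, seen):
        for share in shares_by_server[server]:
            if share in seen:
                continue
            seen.add(share)
            holder = matched.get(share)
            if holder is None or augment(holder, seen):
                matched[share] = server
                return True
        return False

    return sum(1 for server in shares_by_server if augment(server, set()))

layout = {0: set(range(1, 10)),  # server 0: shares 1 - 9
          1: set([0]),           # server 1: share 0 (read-only)
          2: set([0]),           # server 2: share 0 (read-only)
          3: set([0])}           # server 3: share 0 (read-only)
assert servers_of_happiness(layout) == 2  # happiness == 4 is out of reach
}}}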
There is a test for this (or something very like this) in
test_problem_layouts_comment_53:
{{{
# Try the same thing, but with empty servers after the first one
# We want to make sure that Tahoe2PeerSelector will redistribute
# shares as necessary, not simply discover an existing layout.
# The layout is:
# server 2: shares 0 - 9
# server 3: empty
# server 1: empty
# server 4: empty
d.addCallback(_change_basedir)
d.addCallback(lambda ign:
    self._setup_and_upload())
d.addCallback(lambda ign:
    self._add_server(server_number=2))
d.addCallback(lambda ign:
    self._add_server(server_number=3))
d.addCallback(lambda ign:
    self._add_server(server_number=1))
d.addCallback(_copy_shares)
d.addCallback(lambda ign:
    self.g.remove_server(self.g.servers_by_number[0].my_nodeid))
d.addCallback(lambda ign:
    self._add_server(server_number=4))
d.addCallback(_reset_encoding_parameters)
d.addCallback(lambda client:
    client.upload(upload.Data("data" * 10000, convergence="")))
return d
}}}
Note that this is slightly different from your case, in that the other
servers have no shares at all, so the correct number of shares for the
encoder to push is 3, not 2. I didn't have that assertion in the test,
though, so I'll attach a patch that adds it. This also uncovered a bug in
{{{should_add_server}}}: it would not approve adding previously unknown
shares to the {{{existing_shares}}} dict if they were on a server that was
already in {{{existing_shares}}}. I've fixed this and added a test for it.
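An assertion along the lines of the sketch below would capture that check;
{{{_count_shares_on}}} is a hypothetical helper standing in for however the
test actually counts the shares on each server, so read this as the shape of
the check rather than the patch itself:
{{{
def _check_redistribution(upload_results):
    # Added just before the final "return d". After the upload, server 2
    # should still hold its original ten shares, and the three previously
    # empty servers (1, 3 and 4) should have received exactly three new
    # shares between them. (_count_shares_on is a made-up helper for this
    # sketch, not the real test API.)
    counts = dict((n, self._count_shares_on(server_number=n))
                  for n in (1, 2, 3, 4))
    self.failUnlessEqual(counts[2], 10)
    self.failUnlessEqual(counts[1] + counts[3] + counts[4], 3)
d.addCallback(_check_redistribution)
}}}
As for the {{{should_add_server}}} fix, it amounts to keying the check on the
share rather than on the server that holds it -- roughly the following, with
the signature and dict shape assumed for illustration:
{{{
def should_add_server(existing_shares, server, bucket):
    # Approve recording this (server, share) pair whenever the share itself
    # is not yet tracked, even if this server already appears in
    # existing_shares as the holder of some other share. (Sketch only.)
    return bucket not in existing_shares
}}}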
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:144>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid