[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

Wed Jan 20 08:22:56 PST 2010

#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
---------------------------------------+------------------------------------
 Reporter:  zooko                      |           Owner:  warner
     Type:  defect                     |          Status:  new   
 Priority:  critical                   |       Milestone:  1.6.0 
Component:  code-peerselection         |         Version:  1.4.1 
 Keywords:  reliability review-needed  |   Launchpad_bug:        
---------------------------------------+------------------------------------

Comment(by zooko):

 Kevan:

 I've been struggling and struggling to understand the
 {{{servers_of_happiness()}}} function.  The documentation -- that it
 attempts to find a 1-to-1 (a.k.a. "injective") function from servers to
 shares -- sounds great!  But, despite many attempts, I have yet to
 understand whether the code is actually doing the right thing.  (Note:
 this may well be in part my fault for being thick-headed.  Especially
 these days, when I am very sleep-deprived and stressed and busy.  But if
 we can make a function that even I can understand then we'll be golden.)

 So, one thing that occurs to me as I look at this function today is that
 it might help if {{{existing_shares}}} and {{{used_peers}}} had more
 consistent data types and names.  If I understand correctly what they do
 (which is a big 'if' at this point), they could each be a map from
 {{{shareid}}} to {{{serverid}}}, or possibly a map from {{{shareid}}} to a
 set of {{{serverid}}}s, and their names could be {{{existing_shares}}}
 and {{{planned_shares}}}, and the doc could explain that
 {{{existing_shares}}} describes shares that are already alleged to be
 hosted by servers, and {{{planned_shares}}} describes shares that we are
 currently planning to upload to servers.
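
 To make the proposal concrete, here is a minimal sketch of the two inputs
 sharing one data type: a map from {{{shareid}}} to a set of
 {{{serverid}}}s.  The {{{ShareMap}}} alias, the {{{merge_share_maps()}}}
 helper, and the sample server names are hypothetical illustrations, not
 names taken from the patch under review.

```python
from typing import Dict, Set

# Hypothetical alias: shareid -> set of serverids.
ShareMap = Dict[int, Set[str]]

# Shares that servers already claim to hold.
existing_shares: ShareMap = {0: {"server_a"}, 1: {"server_a", "server_b"}}
# Shares that we are currently planning to upload.
planned_shares: ShareMap = {1: {"server_c"}, 2: {"server_c"}}


def merge_share_maps(existing: ShareMap, planned: ShareMap) -> ShareMap:
    """Return the union of the two maps without mutating either input."""
    merged: ShareMap = {}
    for share_map in (existing, planned):
        for shareid, serverids in share_map.items():
            # Copy into a fresh set so the inputs stay untouched.
            merged.setdefault(shareid, set()).update(serverids)
    return merged


combined = merge_share_maps(existing_shares, planned_shares)
```

 With both inputs in the same shape, taking their union is a few lines,
 and either map can be inspected on its own when the distinction matters.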

 Would that be correct?  It raises the question in my mind as to why
 {{{servers_of_happiness()}}} distinguishes between those two inputs
 instead of just generating its injective function from the union of those
 two inputs.  I suspect that this is because we want to prefer existing
 shares instead of new shares when the two collide (i.e. when uploading a
 new share would be redundant) in the interests of upload efficiency.  Is
 that true?  Perhaps a note to that effect could be added to the
 {{{servers_of_happiness()}}} doc.
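
 For discussion, here is a rough sketch of one way an injective function
 from servers to shares could be generated from both inputs.  It assumes
 the shareid-to-set-of-serverids shape suggested above and uses a plain
 augmenting-path bipartite matching; that is my assumption for
 illustration, not a description of what the current patch does.  The
 function name {{{injective_server_to_share()}}} is hypothetical.

```python
from typing import Dict, List, Set

ShareMap = Dict[int, Set[str]]  # hypothetical alias: shareid -> serverids


def injective_server_to_share(existing: ShareMap,
                              planned: ShareMap) -> Dict[str, int]:
    """Map as many servers as possible to distinct shares."""
    # Build serverid -> candidate shareids.  Existing placements are
    # visited first, so servers that already hold shares are matched
    # first and each server tries its existing shares before planned ones.
    candidates: Dict[str, List[int]] = {}
    for share_map in (existing, planned):
        for shareid in sorted(share_map):
            for serverid in sorted(share_map[shareid]):
                shares = candidates.setdefault(serverid, [])
                if shareid not in shares:
                    shares.append(shareid)

    share_owner: Dict[int, str] = {}  # shareid -> serverid matched to it

    def try_assign(serverid: str, seen: Set[int]) -> bool:
        """Find a share for serverid, re-homing other servers if needed."""
        for shareid in candidates[serverid]:
            if shareid in seen:
                continue
            seen.add(shareid)
            owner = share_owner.get(shareid)
            if owner is None or try_assign(owner, seen):
                share_owner[shareid] = serverid
                return True
        return False

    # Dicts preserve insertion order, so existing servers come first.
    for serverid in candidates:
        try_assign(serverid, set())

    return {serverid: shareid for shareid, serverid in share_owner.items()}


mapping = injective_server_to_share(
    {0: {"server_a"}, 1: {"server_a", "server_b"}},
    {1: {"server_c"}, 2: {"server_c"}},
)
happiness = len(mapping)  # number of servers holding a distinct share
```

 In this sketch the size of the mapping does not depend on the order in
 which servers are tried; the ordering only influences which placement
 wins when an existing share and a planned share collide.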

 I realize that I have asked so many times for further explanation of
 {{{servers_of_happiness()}}} that it has become comment-heavy.  Oh well!
 If we see ways to make the comments more concise and just as explanatory
 that would be cool, but better too many comments than too few, for this
 particular twisty little core function.  :-)

 Thanks!

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:143>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid