[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better

Tue Jan 26 17:35:19 PST 2010

#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
---------------------------------------+------------------------------------
 Reporter:  zooko                      |           Owner:  zooko     
     Type:  defect                     |          Status:  assigned  
 Priority:  critical                   |       Milestone:  eventually
Component:  code-peerselection         |         Version:  1.4.1     
 Keywords:  reliability review-needed  |   Launchpad_bug:            
---------------------------------------+------------------------------------

Comment(by davidsarah):

 I was sure that computing servers-of-happiness must correspond to some
 well-understood mathematical problem, but I've now discovered which one --
 it's called "maximum bipartite matching".

 Since servers are distinct from shares, the relation between servers and
 shares corresponds to a [http://en.wikipedia.org/wiki/Bipartite_graph
 bipartite graph].

 Any bijective function from servers to shares that is included in this
 relation is called a ''matching'' of the bipartite graph.

 The servers-of-happiness number is the maximum size of any such matching.
 Intuitively, that's because a matching gives a set of H servers that have
 at least H distinct shares between them.

 So, apply a maximum bipartite matching algorithm to the sharemap (to find
 ''some'' maximum matching), take its size, and you're done.

 (Actually, the servers-of-happiness value as we originally defined it is
 max(k-1, max_bipartite_matching_size). But the max_bipartite_matching_size
 works as a measure of the quality of the server->share mapping even when
 it is less than k-1. In that case there won't be enough shares to recover
 the file, of course.)

 I doubt that the algorithm outlined by Zooko ''always'' computes
 max_bipartite_matching_size correctly. That's because it is a greedy
 algorithm -- it works by removing edges from the bipartite graph that are
 ''not'' in some maximum matching, and assumes that it's possible to make a
 correct local decision of which edge to remove. There are definitely
 greedy algorithms that result in a near-maximum matching, but it won't
 always be maximum.

 Algorithms to find maximum bipartite matchings are really simple, just a
 few lines of code, and have reasonable algorithmic complexity. I will try
 to find one and translate it to Python.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:162>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid