[tahoe-dev] [tahoe-lafs] #778: "shares of happiness" is the wrong measure; "servers of happiness" is better
tahoe-lafs
trac at allmydata.org
Tue Jan 26 17:35:19 PST 2010
#778: "shares of happiness" is the wrong measure; "servers of happiness" is
better
---------------------------------------+------------------------------------
Reporter: zooko | Owner: zooko
Type: defect | Status: assigned
Priority: critical | Milestone: eventually
Component: code-peerselection | Version: 1.4.1
Keywords: reliability review-needed | Launchpad_bug:
---------------------------------------+------------------------------------
Comment(by davidsarah):
I was sure that computing servers-of-happiness must correspond to some
well-understood mathematical problem, but I've now discovered which one --
it's called "maximum bipartite matching".
Since servers are distinct from shares, the relation between servers and
shares corresponds to a [http://en.wikipedia.org/wiki/Bipartite_graph
bipartite graph].
Any bijective function from servers to shares that is included in this
relation is called a ''matching'' of the bipartite graph.
The servers-of-happiness number is the maximum size of any such matching.
Intuitively, that's because a matching gives a set of H servers that have
at least H distinct shares between them.
So, apply a maximum bipartite matching algorithm to the sharemap (to find
''some'' maximum matching), take its size, and you're done.
(Actually, the servers-of-happiness value as we originally defined it is
max(k-1, max_bipartite_matching_size). But the max_bipartite_matching_size
works as a measure of the quality of the server->share mapping even when
it is less than k-1. In that case there won't be enough shares to recover
the file, of course.)
I doubt that the algorithm outlined by Zooko ''always'' computes
max_bipartite_matching_size correctly. That's because it is a greedy
algorithm -- it works by removing edges from the bipartite graph that are
''not'' in some maximum matching, and assumes that it's possible to make a
correct local decision of which edge to remove. There are definitely
greedy algorithms that result in a near-maximum matching, but it won't
always be maximum.
Algorithms to find maximum bipartite matchings are really simple, just a
few lines of code, and have reasonable algorithmic complexity. I will try
to find one and translate it to Python.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/778#comment:162>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid
More information about the tahoe-dev
mailing list