[tahoe-dev] Starvation amidst plenty

Shawn Willden shawn at willden.org
Mon Sep 20 16:14:09 UTC 2010


The current situation in the volunteer grid makes me think that it is
probably wise for a grid to establish some parameters for membership,
because fairness alone may not be enough to avoid a lack of room to
store new files, even though there is plenty of grid capacity
available.

Maybe this is immediately obvious to everyone else, but I hadn't
thought about it before:  Given 10 nodes that each provide 10 GB of
storage, and five nodes that each provide 1 TB of storage, and even
with all users playing fair, you will quickly get into a situation
where every small node is full and no one has sufficient diversity to
complete an upload that achieves shares-of-happiness.
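
A rough sketch of that scenario, in case it helps to see it play out. Everything here is an assumption made for illustration: 1 TB is taken as 1000 GB, every share is 1 GB, each upload places at most one share per server on up to 10 servers, happiness is simply the number of servers that received a share (threshold 7, Tahoe's default), and a seeded random shuffle stands in for Tahoe's real per-file server permutation. `simulate` is a hypothetical helper, not part of Tahoe.

```python
import random

def simulate(capacities_gb, share_gb=1.0, total_shares=10, happy=7, seed=0):
    """Upload until fewer than `happy` servers can accept a share.

    Returns (uploads_completed, gb_stored, free_gb_per_server).
    """
    rng = random.Random(seed)
    free = list(capacities_gb)
    uploads = 0
    while True:
        # Servers that still have room for one more share.
        candidates = [i for i, f in enumerate(free) if f >= share_gb]
        if len(candidates) < happy:
            break  # not enough distinct servers left: the upload is unhappy
        # Assumption: a random order stands in for the real server permutation.
        rng.shuffle(candidates)
        for i in candidates[:total_shares]:
            free[i] -= share_gb
        uploads += 1
    stored = sum(capacities_gb) - sum(free)
    return uploads, stored, free

# Ten 10 GB nodes followed by five 1 TB nodes (1 TB taken as 1000 GB).
capacities = [10.0] * 10 + [1000.0] * 5
uploads, stored, free = simulate(capacities)
full_small = sum(1 for f in free[:10] if f < 1.0)

print(f"uploads completed: {uploads}")
print(f"small nodes full: {full_small} of 10")
print(f"grid utilization at first failure: {stored / sum(capacities):.1%}")
```

Under these assumptions the first failed upload arrives while the five large nodes are still almost empty, because five servers can never satisfy a happiness threshold of seven.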

With a very large grid (many nodes), a Gaussian distribution of node
capacities would be fine.  The small nodes would all fill up quickly,
but there would be plenty of larger nodes to continue accepting
shares.  But Tahoe isn't really optimized for large grids.  I'm not
sure how big the grid has to get before the overhead of all of the
additional queries to place/find shares begins to cause significant
slowdowns, but based on Zooko's reluctance to invite a lot more people
into the volunteer grid (at least, I've perceived such a reluctance),
I suspect that he doesn't want too many more than the couple of dozen
nodes we have now.

Assuming grids need to be kept relatively small in terms of node
count, perhaps it would be wise for grids to establish parameters for
storage provision/consumption as a prerequisite for membership, to
ensure that all nodes have capacities (and consumption) that are
within about an order of magnitude of one another.  Even that is
probably too much variation.  Perhaps the rule should be something
like "no node is allowed in the grid that provides less than 1% or
more than 10% of the total grid storage capacity".
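
To make the proposed rule concrete, here is a minimal sketch of how a grid could check it. The function name `out_of_bounds` and the capacity lists are hypothetical; the 1% and 10% bounds are the placeholder numbers from the paragraph above, and 1 TB is taken as 1000 GB.

```python
def out_of_bounds(capacities_gb, low=0.01, high=0.10):
    """Return (node_index, share_of_grid) for every node that provides
    less than `low` or more than `high` of the total grid capacity."""
    total = sum(capacities_gb)
    return [
        (i, c / total)
        for i, c in enumerate(capacities_gb)
        if not low <= c / total <= high
    ]

# The 10 x 10 GB plus 5 x 1 TB example from earlier in this message.
lopsided = [10.0] * 10 + [1000.0] * 5
# A hypothetical balanced grid: twenty nodes of 100 GB each (5% apiece).
balanced = [100.0] * 20

print(len(out_of_bounds(lopsided)), "of", len(lopsided), "nodes break the rule")
print(len(out_of_bounds(balanced)), "of", len(balanced), "nodes break the rule")
```

In the lopsided example every node fails the check: each small node provides about 0.2% of the grid and each large node about 19.6%. Note also that the bounds by themselves limit the node count: if no node may exceed 10% there must be at least 10 nodes, and if no node may fall below 1% there can be at most 100.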

Hmm.  It should be possible to work out some well-founded values to
replace those pulled-out-of-my-butt numbers, based on assumptions
about shares-of-happiness and node count.  Clearly, a key goal of a
grid has to be that at least SOH servers are non-full until the grid
as a whole is nearly full, so any grid with a group of less-than-SOH
servers that provides the bulk of the grid's storage is headed for
trouble.
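
One way to start putting numbers on this is an idealized estimate, sketched below. It assumes that share placement is perfectly uniform, so every non-full server holds the same amount of data at any moment, and that uploads start failing as soon as fewer than SOH servers have free space. Under that assumption the grid starves when the fill level reaches the capacity of the SOH-th largest server. The function name `usable_before_starvation` is hypothetical, the default of 7 is Tahoe's default happiness setting, and 1 TB is taken as 1000 GB.

```python
def usable_before_starvation(capacities_gb, happy=7):
    """Estimate the data a grid holds when uploads start failing.

    Assumes every non-full server fills at the same rate, so at least
    `happy` servers stay non-full only while the common fill level is
    below the capacity of the `happy`-th largest server.
    Returns (gb_stored, fraction_of_total_capacity).
    """
    if len(capacities_gb) < happy:
        return 0.0, 0.0  # never enough servers for a happy upload
    level = sorted(capacities_gb, reverse=True)[happy - 1]
    # Smaller servers are full at that point; larger ones hold `level` each.
    stored = sum(min(c, level) for c in capacities_gb)
    return stored, stored / sum(capacities_gb)

# The 10 x 10 GB plus 5 x 1 TB example from earlier in this message.
stored, fraction = usable_before_starvation([10.0] * 10 + [1000.0] * 5)
print(f"{stored:.0f} GB stored, {fraction:.1%} of grid capacity")

# A hypothetical grid of twenty equal 100 GB nodes, for comparison.
even_stored, even_fraction = usable_before_starvation([100.0] * 20)
print(f"{even_stored:.0f} GB stored, {even_fraction:.1%} of grid capacity")
```

For the example grid this gives 150 GB out of 5100 GB, or about 2.9% of capacity, while the grid of equal nodes reaches 100%. A membership rule could be tuned by asking how far this fraction is allowed to fall below 1 for a given SOH and node count.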

Comments?

-- 
Shawn

