[volunteergrid2-l] Why storage capacity variation should be small
Shawn Willden
shawn at willden.org
Sat Jan 15 19:25:51 UTC 2011
At first glance, it doesn't appear that there's any problem with allowing
both large and small nodes in the same grid, as long as the node owners
"play fair", meaning each owner consumes no more capacity than he
provides (taking encoding expansion into account).
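To make the expansion point concrete: with K-of-N erasure coding, every
uploaded byte occupies N/K bytes of raw storage. A trivial sketch (my own
illustration, nothing Tahoe-specific):

    # Fair upload allowance under the simple rule, assuming k-of-n
    # erasure coding: each uploaded byte occupies n/k bytes of raw
    # grid storage.
    def fair_upload_limit(provided_gb, k=3, n=10):
        return provided_gb * k / n

    print(fair_upload_limit(100))   # 100 GB provided covers 30 GB of uploads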
The problem arises because the large contributor also expects to be able
to consume a large amount of storage -- it's only fair, right? But what
happens is that all of the small nodes quickly fill up, leaving only the
large nodes with free space, and if there aren't enough of those to
achieve good dispersion, the grid is effectively full.
Consider an extreme example: 10 nodes, nine of which provide 10 GB and one
of which provides 1 TB. The total storage in the grid is 1090 GB, but as
soon as 100 GB has been uploaded -- spread evenly, that's 10 GB landing on
each server -- all of the small servers are full. The 1 TB server still
has 990 GB available, but it's unusable by anyone who actually wants the
reliability benefits of distributing their data. So the true capacity of
this grid is only 100 GB, and the additional 990 GB on the large node
offers _no value_ to the grid -- but the owner of that node may well have
been the one who filled the grid, believing it was fair for him to do so.
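If you want to watch that arithmetic play out, here's a toy simulation
(my own; it assumes post-expansion bytes are spread evenly across every
server that still has free space):

    # Toy model of the example: nine 10 GB servers plus one 1 TB server.
    free = [10] * 9 + [1000]       # GB free on each server
    uploaded = 0
    while all(f > 0 for f in free):    # full dispersion needs all 10
        for i in range(len(free)):
            free[i] -= 1           # a 10 GB upload lands as 1 GB per server
        uploaded += 10
    print(uploaded, free)   # 100 GB in; only the big node has space left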
In general, if H is the "servers-of-happiness" setting, the grid becomes
effectively full as soon as all but the largest H-1 servers are full. So
to determine the actual capacity of the grid, take the H-1 largest servers,
assume each of them provides only as much storage as the Hth-largest
server, and then sum over all of the servers. Obviously, the "fullness" of
a grid with wide capacity variation will depend on what you choose for H.
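That rule is only a few lines of Python (my sketch; capping every server
at the size of the Hth-largest gives the same sum as the procedure above):

    def effective_capacity(capacities, h):
        # Sort descending so sizes[h-1] is the h-th largest server.
        # Storage beyond that size on the h-1 biggest servers is
        # unusable, so cap every server there and sum.
        sizes = sorted(capacities, reverse=True)
        return sum(min(s, sizes[h - 1]) for s in sizes)

    # The extreme example above, demanding shares on all 10 servers:
    print(effective_capacity([1000] + [10] * 9, h=10))   # -> 100 (GB)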
For a while, volunteergrid #1 was in a state where it was "full" for
anyone with H>5, even though there were terabytes of free storage and
nearly 20 servers in the grid. It has now been fixed, but I don't have
much confidence it will stay fixed, because some of the nodes that became
non-full don't actually have very much storage available.
A related point is that there is a lot of value in setting K, H and N to
significantly larger values than the defaults. I'll say more about that in
my post about why we should institute an uptime commitment, but it's
relevant here because ideally you want H to be nearly S (the number of
nodes in the grid). Wide variation in capacity forces you to reduce H to
less-than-optimal values, and since K can't exceed H, it eventually forces
you to reduce K as well.
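As a quick illustration of why larger parameters help, here's a toy
availability model (my own, and deliberately unrealistic: one share per
server, independent failures, and a pessimistic 50% uptime chosen only to
make the difference visible):

    from math import comb

    def availability(k, n, p):
        # P(file recoverable) with k-of-n erasure coding, one share per
        # server, each server independently up with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    print(availability(3, 10, 0.5))   # the default 3-of-10: ~0.945
    print(availability(6, 20, 0.5))   # same 3.3x expansion: ~0.979

Same expansion factor, noticeably better odds, purely from wider
dispersion.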
I can think of two ways to avoid this problem.
1. Allow nodes of any capacity, but institute a limit on how much any node
operator can upload to the grid, in addition to the "fairness" rule.
Specifically, compute the total grid capacity as defined above (picking a
generous H -- I suggest around 3/4 of S), divide that by the number of
nodes, and specify that no one is allowed to consume more than that amount
of storage, no matter how much he provides. (I've sketched the computation
below, after option 2.) This will ensure the grid never fills up before
everyone has reached their fair share, but it's kind of complicated, and it
means that a few very small nodes will impose an artificially-low limit on
maximum usage.
2. Keep all nodes pretty close to the same capacity -- maybe limited to no
more than 3x between largest and smallest.
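Here's what option 1's limit works out to, reusing the capacity function
from above (the grid makeup in the example is hypothetical):

    from math import ceil

    def effective_capacity(capacities, h):
        sizes = sorted(capacities, reverse=True)
        return sum(min(s, sizes[h - 1]) for s in sizes)

    def per_operator_cap(capacities):
        # Option 1: effective capacity at h = 3/4 of the server count
        # (rounded up), divided equally among the node operators.
        s = len(capacities)
        return effective_capacity(capacities, ceil(0.75 * s)) / s

    # A hypothetical grid: eight 100 GB nodes plus two larger ones.
    print(per_operator_cap([100] * 8 + [500, 1000]))   # -> 100.0 GB each

Note how little the two big nodes raise the cap: with h = 8, everything
past the 8th-largest server's size is wasted, so the limit is set almost
entirely by the ordinary nodes.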
Hmm. I started out typing this thinking I wanted to recommend option 2.
Now I'm thinking that maybe option 1 is better. It's not that complicated
to compute and it means we don't have to place as many artificial
restrictions on contributed nodes.
Obviously, both options will require some discussion/negotiation on the list
regarding minimum node capacities. If we end up with a max-usage value of
10 GB, for example, then this grid isn't useful to me (though I'll happily
contribute 10 GB anyway). Ideally, I want a grid with a max-usage value of
at least 500 GB, though that may not be achievable.
Thoughts?
--
Shawn