[volunteergrid2-l] Why storage capacity variation should be small

Jody Harris jharris at harrisdev.com
Sat Jan 15 20:28:06 UTC 2011


Shawn,

I'm looking for a grid that can host at least 500 GB per owner. I'm
currently contributing only 1 TB, so my max allowable based on that is ~300
GB, but I can easily add more nodes or storage.
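[The ~300 GB figure is consistent with Tahoe-LAFS's default 3-of-10 erasure coding, under which each stored byte expands to N/K bytes of shares. That default is an assumption here; jody's actual encoding parameters aren't stated in the thread.]

```python
# Fair-share usable storage given contributed capacity and erasure coding.
# Assumes the Tahoe-LAFS defaults K=3, N=10; actual parameters may differ.
K, N = 3, 10
contributed_gb = 1000            # 1 TB contributed

expansion = N / K                # each stored byte becomes N/K bytes of shares
fair_share_gb = contributed_gb / expansion
print(round(fair_share_gb))      # 300
```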

jody
----
- Think carefully.


On Sat, Jan 15, 2011 at 12:25 PM, Shawn Willden <shawn at willden.org> wrote:

> At first glance, it doesn't appear that there's any problem with allowing
> both large and small nodes in the same grid, as long as the owners of the
> nodes "play fair", meaning each small node owner consumes no more capacity
> than he provides (including the effect of encoding expansion).
>
> The problem arises because the large contributor also expects to be able to
> consume a large amount of storage -- it's only fair, right?  But what
> happens is that all of the small nodes quickly get filled up, leaving only
> the large nodes with any capacity, and if there aren't enough of them to
> achieve good dispersion, the grid is effectively full.
>
> Consider an extreme example:  10 nodes, nine of which provide 10 GB and one
> which provides 1 TB.  The total storage in the grid is 1090 GB, but as soon
> as 100 GB has been uploaded, all of the small servers are full.  The 1 TB
> server still has 990 GB available, but it's unusable by anyone who actually
> wants the reliability benefits of distributing their data.  So the true
> capacity of this grid is only 100 GB, and the additional 990 GB on the large
> node offers _no value_ to the grid -- but the owner of that node may well
> have been the one who filled the grid, believing it was fair for him to
> do so.
>
> In general, if H is the "servers-of-happiness" setting, the grid becomes
> effectively full as soon as all but the largest H-1 servers are full.  So to
> determine the actual capacity of the grid, take the H-1 largest servers and
> assume they provide the same amount of storage as the Hth-largest server,
> then sum.  Obviously, the "fullness" of a grid with wide capacity variation
> will depend on what you choose for H.
>
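[Shawn's rule can be sketched in a few lines. This is an illustration only; `effective_capacity` is a hypothetical helper name, not Tahoe-LAFS code, and H=7 below is Tahoe's default happiness setting, used here just as an example.]

```python
# Effective grid capacity under a servers-of-happiness setting h:
# any server larger than the hth-largest can only be used up to the
# hth-largest server's size before good dispersion becomes impossible.
def effective_capacity(capacities_gb, h):
    ranked = sorted(capacities_gb, reverse=True)
    hth_largest = ranked[h - 1]
    return sum(min(c, hth_largest) for c in ranked)

# Shawn's extreme example: nine 10 GB nodes plus one 1 TB node.
caps = [1000] + [10] * 9
print(effective_capacity(caps, h=7))   # 100 -- not 1090
```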
> For a while, volunteergrid #1 was effectively "full" for anyone with H>5,
> even though there were terabytes of free storage and nearly 20 servers in
> the grid.  It has now been fixed, but I don't have much confidence it will
> stay fixed, because some of the nodes that became non-full don't actually
> have much storage available.
>
> A related point is that there is a lot of value in setting K, H and N to
> values significantly larger than the defaults.  I'll say more about that
> in my post on why we should institute an uptime commitment, but it's
> relevant here because wide variation in capacity forces you to reduce H to
> less-than-optimal values, which in turn forces you to reduce K.  Ideally
> you want H to be nearly S (the number of nodes in the grid), so having to
> reduce it is bad.
>
> I can think of two ways to avoid this problem.
>
> 1.  Allow nodes of any capacity, but institute a limit on how much any node
> operator can upload to the grid, in addition to the "fairness" rule.
>  Specifically, compute total grid capacity as defined above (picking a
> generous H, and I suggest H should be around 3/4 of S), divide that by the
> number of nodes and specify that no one is allowed to consume more than that
> amount of storage, no matter how much they provide.  This will ensure the
> grid never fills up before anyone has reached their fair share, but it's
> kind of complicated, and it means that a few very small nodes will impose an
> artificially-low limit on maximum usage.
>
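[Option 1's per-operator cap can be computed the same way. Again a sketch under stated assumptions: `usage_cap_gb` is a hypothetical name, and H is taken as roughly 3/4 of S per Shawn's suggestion.]

```python
# Per-operator usage cap under option 1: effective grid capacity with a
# generous happiness setting (about 3/4 of S), divided evenly among the
# S node operators.  Illustration only, not Tahoe-LAFS code.
def usage_cap_gb(capacities_gb):
    s = len(capacities_gb)
    h = max(1, round(0.75 * s))
    ranked = sorted(capacities_gb, reverse=True)
    hth_largest = ranked[h - 1]
    effective = sum(min(c, hth_largest) for c in ranked)
    return effective / s

# With the extreme example grid, each operator's cap is only 10 GB:
# a few very small nodes drag the limit down for everyone.
caps = [1000] + [10] * 9
print(usage_cap_gb(caps))   # 10.0
```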
> 2.  Keep all nodes pretty close to the same capacity -- maybe limited to no
> more than 3x between largest and smallest.
>
> Hmm.  I started out typing this thinking I wanted to recommend option 2.
>  Now I'm thinking that maybe option 1 is better.  It's not that complicated
> to compute and it means we don't have to place as many artificial
> restrictions on contributed nodes.
>
> Obviously, both options will require some discussion/negotiation on the
> list regarding minimum node capacities.  If we end up with a max-usage value
> of 10 GB, for example, then this grid isn't useful to me (though I'll
> happily contribute 10 GB anyway).  Ideally, I want a grid with a max-usage
> value of at least 500 GB, though that may not be achievable.
>
> Thoughts?
>
> --
> Shawn
>
> _______________________________________________
> volunteergrid2-l mailing list
> volunteergrid2-l at tahoe-lafs.org
> http://tahoe-lafs.org/cgi-bin/mailman/listinfo/volunteergrid2-l
>
>