<p><br>

On Mar 28, 2012 1:22 PM, "Eugen Leitl" &lt;<a href="mailto:eugen@leitl.org" target="_blank">eugen@leitl.org</a>&gt; wrote:<br>

><br>

> On Wed, Mar 28, 2012 at 09:00:56AM -0400, Shawn Willden wrote:<br>

><br>

> > The arguments make are the basis for the approach I (successfully) pushed<br>

> > when we started the VG2 grid:  We demand high uptime from individual<br>

><br>

> Can you please tell more about the VG2 grid? I clean missed it.</p>

<p>(Sorry I'm slow to respond -- I'm vacationing with my family and often have better things to do than read email :-) ).</p>

<p>Volunteer Grid 2 is a Tahoe grid composed of volunteers all over the world. Learning from some problems that the first volunteer grid had, I suggested to early members that VG2 establish some clear and somewhat restrictive policies in order to ensure that the grid was useful for system backups. </p>


<p>Two specific backup-driven requirements we had were high reliability/availability and relatively high capacity.  To that end, we established a 95% nominal uptime requirement and a 500 GB minimum node capacity requirement.  We also avoid co-located nodes and disallow usage of more than min(storage_provided, 1 TB).  The limit on usage is to avoid having one user deploy, say, 10 TB and then try to consume that much from the grid, swamping the rest of the servers.<br>


</p>
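<p>The usage rule above can be sketched as a short Python check. This is only an illustration of the policy as described here, not anything the grid actually runs; the function and constant names are hypothetical, and sizes are assumed to be in decimal GB (1 TB = 1000 GB).</p>

```python
# Hypothetical names; values taken from the policy described above.
MIN_NODE_CAPACITY_GB = 500   # minimum capacity each node must provide
USAGE_CAP_GB = 1000          # 1 TB ceiling on usage (assumes decimal units)


def meets_capacity_requirement(storage_provided_gb):
    """True if a node provides at least the minimum capacity."""
    return storage_provided_gb >= MIN_NODE_CAPACITY_GB


def allowed_usage_gb(storage_provided_gb):
    """Usage limit for a member: min(storage_provided, 1 TB)."""
    return min(storage_provided_gb, USAGE_CAP_GB)
```

<p>Under these assumptions, a member who deploys 10 TB is still limited to 1000 GB of usage, while a member who provides 600 GB may use 600 GB.</p>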

<p>> > servers because the math of erasure coding works against you when the<br>

> > individual nodes are unreliable, and we ban co-located servers and prefer<br>

> > to minimize the number of servers owned and administered by a single person<br>

> > in order to ensure greater independence.<br>

> ><br>

> > How has that worked out?  Well, it's definitely constrained the growth rate<br>

> > of the grid.  We're two years in and still haven't reached 20 nodes.  And<br>

><br>

> It doesn't surprise me at all, since I've never heard a single squeak<br>

> about it in the usual channels. (And I'm moderately well-informed<br>

> in such matters).</p><p>I'm surprised.   It was definitely announced here when it was created, and discussed occasionally since.</p><p>

> > although our nodes have relatively high reliability, I'm not sure we've<br>

> > actually reached the 95% uptime target -- my node, for example, was down<br>

> > for over a month while I moved, and we recently had a couple of outages<br>

> > caused by security breaches.<br>

> ><br>

> > However, we do now have 15 solid, high-capacity, relatively available (90%,<br>

> > at least) nodes that are widely dispersed geographically (one in Russia,<br>

> > six in four countries in Europe, seven in six states in the US; not sure<br>

> > about the other).  So it's pretty good -- though we do need more nodes.<br>

><br>

> How large is the total storage capacity? What about introducer nodes, is<br>

> there just one?</p><p>Total storage capacity, as reported by the stats gatherer, is around 14 TB.  That's disk used (~6 TB) plus disk available (~8 TB).  As near as I can tell from eyeballing the graph and summing my estimates, consumption grows by about 40 GB per day.  We have a helper, but it's lightly used.  There is only one introducer.  The node on the slowest network connection has about 1 Mbps of bandwidth, two or three nodes are on gigabit links, and most are 6-50 Mbps, IIRC.  Hardware is similarly varied, with the low end being a small NAS box, the high end being some fairly powerful servers in data centers, and everything in between, including some virtual servers.</p>
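<p>As a rough back-of-the-envelope check on those figures, the sketch below estimates how long the remaining space would last. It assumes decimal units (1 TB = 1000 GB), strictly linear growth, and no new nodes joining; the inputs are the eyeballed estimates above, so the result is only an order-of-magnitude guide. The function name is hypothetical.</p>

```python
def days_until_full(available_gb, growth_gb_per_day):
    """Days of headroom left, assuming linear growth and fixed capacity."""
    return available_gb / growth_gb_per_day


AVAILABLE_GB = 8000        # ~8 TB available, in decimal GB (assumption)
GROWTH_GB_PER_DAY = 40     # ~40 GB/day, eyeballed from the graph

print(days_until_full(AVAILABLE_GB, GROWTH_GB_PER_DAY))  # 200.0
```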

<p>Upload performance, as measured from my machine (which has a 50 Mbps up, 100 Mbps down connection), averages about 300 KBps before accounting for erasure coding overhead, so with my settings I get around 100 KBps net upload rate.  I haven't done any download tests recently, but in the past they've been approximately the same as upload speeds, though without the erasure coding penalty.</p>
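<p>The relationship between the two upload figures can be sketched as follows. Tahoe encodes each file into N shares, any K of which are enough to recover it, so an upload sends roughly N/K times the file size. The numbers above imply an expansion factor of about 3; the specific K and N used below are hypothetical values chosen only to give that ratio, not necessarily my actual settings.</p>

```python
def net_upload_rate(gross_rate_kbps, shares_needed, shares_total):
    """Net (pre-expansion) upload rate given the gross rate on the wire.

    Erasure coding expands the data by shares_total / shares_needed,
    so the useful rate is the gross rate divided by that factor.
    """
    expansion = shares_total / shares_needed
    return gross_rate_kbps / expansion


def implied_expansion(gross_rate_kbps, net_rate_kbps):
    """Expansion factor implied by a measured gross and net rate."""
    return gross_rate_kbps / net_rate_kbps


# Hypothetical encoding parameters with N/K = 3 (an assumption).
K, N = 3, 9
print(net_upload_rate(300, K, N))   # 100.0
print(implied_expansion(300, 100))  # 3.0
```

<p>Downloads only need to fetch K shares, which is why they avoid the expansion penalty.</p>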

<p>-- </p><p>Shawn</p>