[tahoe-dev] erasure coding makes files more fragile, not less

Fri Mar 30 21:33:58 UTC 2012

On Mar 28, 2012 1:22 PM, "Eugen Leitl" <eugen at leitl.org> wrote:
>
> On Wed, Mar 28, 2012 at 09:00:56AM -0400, Shawn Willden wrote:
>
> > The arguments make are the basis for the approach I (successfully) pushed
> > when we started the VG2 grid:  We demand high uptime from individual
>
> Can you please tell more about the VG2 grid? I clean missed it.

(Sorry I'm slow to respond -- I'm vacationing with my family and often have
better things to do than read email :-) ).

Volunteer Grid 2 is a Tahoe grid composed of volunteers all over the world.
Learning from some problems that the first volunteer grid had, I suggested
to early members that VG2 establish some clear and somewhat restrictive
policies in order to ensure that the grid was useful for system backups.

Two specific backup-driven requirements we had were high
reliability/availability and relatively high capacity.  To that end, we
established a 95% nominal up time requirement and a 500 GB minimum node
capacity requirement.  We also avoid co-located nodes and disallow usage of
more than min(storage_provided, 1 TB).  The limit on usage is to avoid
having one user deploy, say, 10 TB and then try to consume that much from
the grid, swamping the rest of the servers.
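
To put rough numbers on why the uptime requirement matters, here is a small
sketch.  It assumes each file is split into n shares on n distinct servers,
any k of which are enough to recover it, and that servers are up
independently with probability p.  The function name and the parameter
choices are mine, for illustration only; 3-of-10 is Tahoe's default
encoding, and the 30-of-100 case is a hypothetical one with the same
expansion.

```python
from math import comb


def file_availability(k, n, p):
    """Probability that at least k of n shares are reachable.

    Assumes one share per server and independent server uptime p,
    which is an idealization (real outages are often correlated).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Compare a small and a large share count at the same k/n ratio.
    for p in (0.95, 0.90, 0.50, 0.20):
        small = file_availability(3, 10, p)
        large = file_availability(30, 100, p)
        print(f"p={p:.2f}  3-of-10: {small:.6f}  30-of-100: {large:.6f}")
```

When p is comfortably above k/n, spreading a file over more servers helps.
When p falls below k/n, more servers make things worse, which is the sense
in which the math can work against you.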

> > servers because the math of erasure coding works against you when the
> > individual nodes are unreliable, and we ban co-located servers and prefer
> > to minimize the number of servers owned and administered by a single person
> > in order to ensure greater independence.
> >
> > How has that worked out?  Well, it's definitely constrained the growth rate
> > of the grid.  We're two years in and still haven't reached 20 nodes.  And
>
> It doesn't surprise me at all, since I've never heard a single squeak
> about it in the usual channels. (And I'm moderately well-informed
> in such matters).

I'm surprised.   It was definitely announced here when it was created, and
discussed occasionally since.

> > although our nodes have relatively high reliability, I'm not sure we've
> > actually reached the 95% uptime target -- my node, for example, was down
> > for over a month while I moved, and we recently had a couple of outages
> > caused by security breaches.
> >
> > However, we do now have 15 solid, high-capacity, relatively available (90%,
> > at least) nodes that are widely dispersed geographically (one in Russia,
> > six in four countries in Europe, seven in six states in the US; not sure
> > about the other).  So it's pretty good -- though we do need more nodes.
>
> How large is the total storage capacity? What about introducer nodes, is
> there just one?

Total storage capacity, as reported by the stats gatherer, is around 14 TB.
That's disk used (~6 TB) plus disk available (~8 TB).  As near as I can
tell from eyeballing the graph and summing my estimates, consumption
grows by about 40 GB per day.  We have a helper but it's lightly used.
Only one introducer.  The node on the slowest network connection has about
1 Mbps of bandwidth, two or three nodes are on gigabit links, most are 6-50
Mbps, IIRC.  Hardware is similarly varied, with the low end being a small
NAS box, the high end being some fairly powerful servers in data centers,
and everything in between including some virtual servers.
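
As a back-of-the-envelope check on the capacity figures above, the sketch
below estimates how long the free space would last.  It assumes growth stays
constant at the eyeballed 40 GB per day, that no nodes join or leave, and
that 1 TB = 1000 GB; the function name is just for illustration.

```python
def days_until_full(available_gb, growth_gb_per_day):
    """Days of headroom left if consumption keeps growing at a fixed rate."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return available_gb / growth_gb_per_day


if __name__ == "__main__":
    used_tb = 6        # approximate, from the stats gatherer
    available_tb = 8   # approximate, from the stats gatherer
    growth_gb = 40     # rough estimate read off the graph

    print("total TB:", used_tb + available_tb)
    print("days of headroom:", days_until_full(available_tb * 1000, growth_gb))
```

With these rough inputs the free space lasts on the order of 200 days, so
the real answer depends heavily on how good the 40 GB estimate is.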

Upload performance, as measured from my machine (which has a 50 Mbps up,
100 Mbps down connection), averages about 300 KBps before erasure
coding, so with my settings I get around 100 KBps net upload rate.  I
haven't done any download tests recently, but in the past download speeds
have been approximately the same as upload speeds, without the erasure
coding penalty.
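
The relationship between those two upload figures can be sketched as
follows.  The idea is that uploading k-of-n encoded shares sends n/k bytes
for every byte of file data, so the net rate is the raw rate scaled by k/n.
The 5-of-15 setting below is hypothetical, picked only because it gives the
3x expansion the figures imply; 3-of-10 is Tahoe's default.

```python
def net_upload_rate(raw_kbps, k, n):
    """File-data rate when every k bytes of data become n bytes of shares."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return raw_kbps * k / n


if __name__ == "__main__":
    raw = 300  # KBps actually sent, per the measurement above
    # (k, n) pairs: a hypothetical 3x setting, then Tahoe's default
    for k, n in ((5, 15), (3, 10)):
        print(f"{k}-of-{n}: {net_upload_rate(raw, k, n):.0f} KBps net")
```

Downloads only need to fetch k shares, which is why they avoid the same
penalty.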

-- 

Shawn