[tahoe-dev] erasure coding makes files more fragile, not less

Eugen Leitl eugen at leitl.org
Wed Mar 28 17:22:52 UTC 2012


On Wed, Mar 28, 2012 at 09:00:56AM -0400, Shawn Willden wrote:

> The arguments made are the basis for the approach I (successfully) pushed
> when we started the VG2 grid:  We demand high uptime from individual

Can you please tell us more about the VG2 grid? I completely missed it.

> servers because the math of erasure coding works against you when the
> individual nodes are unreliable, and we ban co-located servers and prefer
> to minimize the number of servers owned and administered by a single person
> in order to ensure greater independence.
> 
> How has that worked out?  Well, it's definitely constrained the growth rate
> of the grid.  We're two years in and still haven't reached 20 nodes.  And

It doesn't surprise me at all, since I've never heard a single squeak
about it in the usual channels. (And I'm moderately well-informed
in such matters).
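
Regarding the point above about the math working against you with unreliable
servers, here is a minimal sketch of the k-of-n survival calculation,
assuming independent failures and a uniform per-node availability p (both
simplifications, and the helper name is mine):

from math import comb

def file_availability(k, n, p):
    """P(at least k of n independently-available shares are reachable)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With Tahoe's default 3-of-10 encoding:
print(file_availability(3, 10, 0.95))  # ~0.999999998 with solid servers
print(file_availability(3, 10, 0.50))  # ~0.945 -- worse than one 95% server

With solid servers the redundancy buys a lot; with flaky ones the same
3-of-10 file ends up less available than a single 95% server, which is
exactly the "more fragile, not less" effect in the subject line.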

> although our nodes have relatively high reliability, I'm not sure we've
> actually reached the 95% uptime target -- my node, for example, was down
> for over a month while I moved, and we recently had a couple of outages
> caused by security breaches.
> 
> However, we do now have 15 solid, high-capacity, relatively available (90%,
> at least) nodes that are widely dispersed geographically (one in Russia,
> six in four countries in Europe, seven in six states in the US; not sure
> about the other).  So it's pretty good -- though we do need more nodes.

How large is the total storage capacity? And what about introducer nodes?
Is there just one?
 
> I can see two things that would make it an order of magnitude better:
>  monitoring and dynamic adjustment of erasure-coding parameters.
> 
> Monitoring is needed both to identify cases where file repairs need to be
> done before they become problematic and to provide the node reliability
> data required to dynamically determine erasure coding parameters.
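
On the dynamic-adjustment idea, one possible reading, purely as a sketch
(Tahoe does not do this today, and the function names are mine): feed the
measured per-node availability from monitoring into the same binomial
formula and pick the smallest total share count, for a fixed k, that still
meets an availability target.

from math import comb

def file_availability(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def choose_total_shares(k, p, target, max_n=256):
    """Smallest n >= k such that a k-of-n file meets the availability target."""
    for n in range(k, max_n + 1):
        if file_availability(k, n, p) >= target:
            return n
    raise ValueError("target not reachable with nodes this unreliable")

print(choose_total_shares(3, 0.95, 0.999999))  # -> 8
print(choose_total_shares(3, 0.70, 0.999999))  # -> 18: flaky nodes need far more shares

The cost is visible right away: the flakier the nodes, the larger the
expansion factor and repair traffic needed to hold the same target, which
is the other face of the same math.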

