[volunteergrid2-l] Recommended settings

Peter Secor secorp at gmail.com
Thu Jun 30 11:21:49 PDT 2011


Shawn - good overall summary, just a couple clarifications on the 
"Allmydata problem" from my perspective.

The main issue was that there came a point after the funding stopped 
where we could no longer afford to replace broken hardware. 
Unfortunately, because the directory nodes were stored with the same 
parameters (N=10, K=3), most directory structures quickly became 
difficult to traverse, and the grid soon ceased to be useful for data 
recovery. An interesting follow-on issue is that even 
after we recovered the servers from the colo, we realized that we'd have 
to bring them nearly all up at the same time to provide easy-to-traverse 
access to the grid, and finding time/space/power to do this took a while. 
The old production grid is mostly back up now in an undisclosed location 
(temporarily, in my garage), but still has a few hardware problems that 
we're working to repair so that everybody who needs to can recover their 
data, and then we can shut down the grid and sell the hardware.

I hope this helps inform future decisions, and feel free to ask me any 
follow-on questions.
Peter

On 6/30/11 9:25 AM, Shawn Willden wrote:
> On Wed, Jun 29, 2011 at 11:36 PM, Brad Rupp <bradrupp at gmail.com
> <mailto:bradrupp at gmail.com>> wrote:
>
>     On 6/29/2011 5:12 PM, Shawn Willden wrote:
>
>         which would have prevented the worst of the allmydata problem.
>
>
>     What was the allmydata problem?  The reason I ask is that I don't
>     want to have a "problem" with the data in VG2.
>
>
> Allmydata.com was a commercial venture selling cloud storage using
> Tahoe-LAFS, and was the company that funded all of the initial Tahoe
> development.  There was (is, actually, though it needs maintenance) a
> fairly nice Windows client that provided Dropbox-like functionality. On
> the backend, allmydata had a large number of servers hosted in a couple
> of data centers.  They had hundreds of nodes in their grid by the end.
>   They were using the default encoding parameters, N=10, K=3.  I'm not
> sure if H existed at the time, but it wasn't really relevant, because
> there were always more than 10 nodes available.
>
> As they scaled up and added more nodes, they began to suffer from more
> hardware failures, as is inevitable.  If you have hundreds of machines,
> a few are going to be broken at any given time.  In many cases they
> still hold data, but if they're down it's unavailable.  So whenever
> eight or more machines were down, any file that happened to have eight or
> more of its shares on the unavailable nodes became unavailable.  Given
> that allmydata was
> hosting billions (trillions?) of files for thousands (tens of
> thousands?) of people with shares spread across all those machines, any
> random set of 8 nodes probably all held shares for at least one file.  I
> think this may have been compounded by some repairer bugs (my memory is
> hazy, and I don't think allmydata ever provided a complete post-mortem
> anyway).
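>
> Purely as a back-of-the-envelope sketch (all the grid sizes, outage
> counts, and file counts below are made up for illustration): with N=10,
> K=3, a file becomes unavailable once 8 or more of the 10 servers
> holding its shares are down, and if placement is treated as uniform
> that per-file probability is hypergeometric:
>
>     from math import comb
>
>     def p_file_unavailable(T, d, N=10, K=3):
>         # P(at least N-K+1 of a file's N share servers are among the d down)
>         lost = N - K + 1            # losing 8 of 10 leaves fewer than K=3
>         return sum(comb(N, j) * comb(T - N, d - j)
>                    for j in range(lost, min(N, d) + 1)) / comb(T, d)
>
>     # Hypothetical scenarios: files spread over a large grid, versus old
>     # files whose shares all sit on a small pool of aging servers.
>     for T, d, n_files in [(200, 10, 10**9), (40, 10, 10**7)]:
>         p = p_file_unavailable(T, d)
>         print(f"T={T:3d}, d={d}: per-file loss {p:.2e}, "
>               f"expected losses among {n_files:.0e} files: {p * n_files:.1f}")
>
> The per-file probability looks tiny, but multiplied across enough
> files, and concentrated on the older servers that held the earliest
> shares, it adds up quickly.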
>
> What made this really problematic was that some of those unavailable
> files were dirnodes.  There are a lot of ways to handle your Tahoe
> storage, but the most common is to have a single directory tree per
> user.  If the dirnode storing a user's root directory is unavailable,
> then _all_ of that user's files are unavailable.  By design, without the
> cap for a file, it's not even possible to find the file data, and you
> couldn't decrypt it if you did.  Without the dirnode which stores the
> caps, the files in the directory are essentially _gone_, even if all of
> the bits are present.
>
> Most directory trees also end up being fairly deep, and losing any
> dirnode in the chain means that every directory and every file below
> that dirnode is gone.
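>
> To put a rough number on that chaining (assuming, purely for
> illustration, that every node is independently retrievable with
> probability r): a file behind a chain of k dirnodes is reachable only
> if the file and all k dirnodes are retrievable, so roughly r**(k + 1).
>
>     r = 0.99999
>     for k in (1, 4, 8):
>         print(f"{k} dirnodes deep: reachable ~{r**(k + 1):.6f}")
>
> Deeper trees multiply the exposure, and every file in a subtree shares
> the fate of the weakest dirnode above it.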
>
> The net result was that allmydata's users increasingly couldn't retrieve
> their data.  I suspect that the number of actual cases was quite small,
> but if you put yourself in such a user's shoes you can imagine how angry
> you'd be.  Not only is your precious data gone (even if allmydata says
> they'll have it back Real Soon Now), but this is a commercial service
> that advertised extreme reliability, for which you've paid good money.
>   Not a lot of money, but enough that you really think you deserve what
> you paid for.  What do you do?  Complain on every forum you can reach,
> of course.
>
> I think allmydata's business was running very close to the edge
> financially anyway -- that's pure speculation, but strongly supported by
> the way they were letting their technical staff go even as the staff was
> fighting increasing technical problems.  I'm sure that funding issues
> played a big role in their inability to keep machines up and running,
> and I suspect that the problems may have been exacerbated by the fact
> that it was likely the older machines that were failing first, the
> machines that were in the grid when it was smaller and therefore
> received more shares of the first files users created.  The first file a
> user creates is their root dirnode.
>
> All of this created a death spiral.  Worsening finances exacerbated
> technical difficulties.  Technical difficulties worsened client
> relations.  Failing client relations further lowered revenues.
>   Eventually the company failed.
>
> The part of the story that's relevant to us is the nature of the
> technical difficulties.  If N is much smaller than T (the total number
> of servers in the grid), then it becomes not just possible but _likely_
> that a small fraction of the nodes being down makes some files
> unavailable.
>
>  From a statistical perspective, if you have T >> N, then the standard
> model accurately calculates the reliability of any given file, but if
> you have per-file reliability of r and n files, the probability that
> _all_ of your files are available (which I call "total reliability")
> approaches r**n.  Even if r is something like 99.999%, as n gets big
> r**n can get unacceptably small.  T has to be much larger than n before
> the total reliability really approaches r**n.  The closer N is to T the
> closer total reliability stays to r.
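>
> A quick feel for that decay, assuming independence (the regime where T
> is much larger than n):
>
>     r = 0.99999
>     for n in (1_000, 100_000, 10_000_000):
>         print(f"n = {n:>10,}: total reliability ~ {r**n:.3g}")
>
> With r = 99.999%, a thousand files is still fine (about 0.99), a
> hundred thousand drops to about 0.37, and ten million is roughly
> e**(-100), which is effectively zero.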
>
> If you think a little about that analysis, you'll see there are two
> solutions.  One solution to this problem is to make sure that however
> many nodes there are, the shares for each file are on nearly all of
> them.  Ideally, if all your files have shares on every node on the grid
> then all of your files live or die together -- and with probability r,
> directly derived from server reliabilities, K and N.  And, note that
> with this solution dirnodes don't need higher reliability because
> they're exactly as likely to go away as individual files.
>
> Another solution is to increase r to apparently-insane levels.  Like 1 -
> 1E-15, or even higher.  This ensures that r**n stays acceptably high as
> n grows.  How do you increase r?  By increasing N.  Simply increasing N
> without adjusting K increases N/K -- the expansion factor.  But it turns
> out that if you increase N and K together, you can get higher
> reliability with lower expansion factor.  The erasure coding becomes
> more effective at maximizing reliability while minimizing expansion as
> you increase N.  At some point network overheads become problematic, but
> if you ignore that, bigger N is always better.
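>
> A rough sketch of that trade-off, using the same independent-server
> binomial model and a made-up per-server availability of 90%:
>
>     from math import comb
>
>     def p_file_lost(N, K, p):
>         # P(fewer than K of the N shares land on available servers)
>         return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(K))
>
>     for N, K in [(10, 3), (20, 6), (16, 6)]:
>         print(f"{K}-of-{N}: expansion {N/K:.2f}, "
>               f"P(file lost) ~ {p_file_lost(N, K, 0.9):.1e}")
>
> With those (illustrative) numbers, 6-of-20 keeps the same 3.3x expansion
> as 3-of-10 but cuts the loss probability by several orders of magnitude,
> and 6-of-16 beats 3-of-10 on both expansion and reliability.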
>
> If you allow T >> N and try to ramp up r, it also makes sense to
> increase r for dirnodes even more, because of the "link in a chain"
> effect: break a dirnode and everything reachable from it is lost.
>
> So, from a mathematical perspective, allmydata's problem was that they
> didn't model that exponentially-decreasing total reliability as the
> number of files grew, and they didn't consider the impact on reliability
> of the linking effect of dirnodes.
>
>
> --
> Shawn
>
>
>
> _______________________________________________
> volunteergrid2-l mailing list
> volunteergrid2-l at tahoe-lafs.org
> http://tahoe-lafs.org/cgi-bin/mailman/listinfo/volunteergrid2-l
> http://bigpig.org/twiki/bin/view/Main/WebHome

