[volunteergrid2-l] Recommended settings
Peter Secor
secorp at gmail.com
Thu Jun 30 11:21:49 PDT 2011
Shawn - good overall summary, just a couple clarifications on the
"Allmydata problem" from my perspective.
The main issue was that there came a point after the funding stopped
where we could no longer afford to replace broken hardware.
Unfortunately, because the directory nodes were stored with the same
parameters (N=10, K=3), most directory structures quickly became
difficult to traverse, and the grid soon ceased to be useful for data
recovery. An interesting follow-on issue is that even after we
recovered the servers from the colo, we realized that we'd have to
bring nearly all of them up at the same time to provide easy-to-traverse
access to the grid, and finding the time/space/power to do this took a while.
The old production grid is mostly back up now in an undisclosed location
(temporarily, in my garage), but still has a few hardware problems that
we're working to repair so that everybody who needs to can recover their
data, and then we can shut down the grid and sell the hardware.
I hope this helps inform future decisions, and feel free to ask me any
follow-on questions.
Peter
On 6/30/11 9:25 AM, Shawn Willden wrote:
> On Wed, Jun 29, 2011 at 11:36 PM, Brad Rupp <bradrupp at gmail.com
> <mailto:bradrupp at gmail.com>> wrote:
>
> On 6/29/2011 5:12 PM, Shawn Willden wrote:
>
> which would have prevented the worst of the allmydata problem.
>
>
> What was the allmydata problem? The reason I ask is that I don't
> want to have a "problem" with the data in VG2.
>
>
> Allmydata.com was a commercial venture selling cloud storage using
> Tahoe-LAFS, and was the company that funded all of the initial Tahoe
> development. There was (is, actually, though it needs maintenance) a
> fairly nice Windows client that provided Dropbox-like functionality. On
> the backend, allmydata had a large number of servers hosted in a couple
> of data centers. They had hundreds of nodes in their grid by the end.
> They were using the default encoding parameters, N=10, K=3. I'm not
> sure if H existed at the time, but it wasn't really relevant, because
> there were always more than 10 nodes available.
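>
> (For concreteness: K=3-of-N=10 means each file is split into 10 shares
> and any 3 of them suffice to reconstruct it. If you assume independent
> server failures and a per-server availability p -- both simplifying
> assumptions, and p=0.95 below is a made-up number, not a measurement --
> the per-file availability is just a binomial tail sum. Nothing here is
> Tahoe code, just back-of-the-envelope Python:
>
> import math
>
> def file_availability(p, N=10, K=3):
>     # Probability that at least K of the N shares sit on servers that
>     # are currently up, assuming each server is independently up with
>     # probability p (a simplifying assumption).
>     return sum(math.comb(N, j) * p**j * (1 - p)**(N - j)
>                for j in range(K, N + 1))
>
> print(file_availability(0.95))   # hypothetical 95% per-server availability
>
> That number is the "r" I talk about below.)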
>
> As they scaled up and added more nodes, they began to suffer from more
> hardware failures, as is inevitable. If you have hundreds of machines,
> a few are going to be broken at any given time. In many cases they
> still hold data, but if they're down it's unavailable. With K=3, a
> file needs at least three of its ten shares reachable, so whenever
> eight or more of the machines holding a given file's shares were down,
> that file became unavailable. Given that allmydata was hosting
> billions (trillions?) of files for thousands (tens of thousands?) of
> people, with shares spread across all those machines, any random set
> of 8 down nodes probably held 8 of the shares of at least one file. I
> think this may have been compounded by some repairer bugs (my memory is
> hazy, and I don't think allmydata ever provided a complete post-mortem
> anyway).
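>
> (To get a feel for the scale: if d of T servers are down and shares
> were placed more or less uniformly at random, the chance that a given
> file has 8 or more of its 10 shares on the down servers is a
> hypergeometric tail, and multiplying by the file count gives the
> expected number of casualties. Very rough, and all of the numbers
> below are made up rather than allmydata's real figures:
>
> import math
>
> def p_file_dead(T, d, N=10, K=3):
>     # Probability that a given file has fewer than K shares on live
>     # servers, i.e. at least N-K+1 of its N shares sit on the d down
>     # servers. Assumes shares land on N distinct random servers.
>     return sum(math.comb(d, j) * math.comb(T - d, N - j)
>                for j in range(N - K + 1, N + 1)) / math.comb(T, N)
>
> T, d, files = 200, 20, 10**9           # hypothetical grid and outage
> print(p_file_dead(T, d) * files)       # expected number of unreadable files
>
> Even a per-file probability that looks negligible turns into a crowd of
> angry users once you multiply it by a billion files.)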
>
> What made this really problematic was that some of those unavailable
> files were dirnodes. There are a lot of ways to handle your Tahoe
> storage, but the most common is to have a single directory tree per
> user. If the dirnode storing a user's root directory is unavailable,
> then _all_ of that user's files are unavailable. By design, without the
> cap for a file, it's not even possible to find the file data, and you
> couldn't decrypt it if you did. Without the dirnode which stores the
> caps, the files in the directory are essentially _gone_, even if all of
> the bits are present.
>
> Most directory trees also end up being fairly deep, and losing any
> dirnode in the chain means that every directory and every file below
> that dirnode is gone.
>
> The net result was that allmydata's users increasingly couldn't retrieve
> their data. I suspect that the number of actual cases was quite small,
> but if you put yourself in such a user's shoes you can imagine how angry
> you'd be. Not only is your precious data gone (even if allmydata says
> they'll have it back Real Soon Now), but this is a commercial service
> that advertised extreme reliability, for which you've paid good money.
> Not a lot of money, but enough that you really think you deserve what
> you paid for. What do you do? Complain on every forum you can reach,
> of course.
>
> I think allmydata's business was running very close to the edge
> financially anyway -- that's pure speculation, but strongly supported by
> the way they were letting their technical staff go even as the staff was
> fighting increasing technical problems. I'm sure that funding issues
> played a big role in their inability to keep machines up and running,
> and I suspect that the problems may have been exacerbated by the fact
> that it was likely the older machines that were failing first, the
> machines that were in the grid when it was smaller and therefore
> received more shares of the first files users created. The first file a
> user creates is their root dirnode.
>
> All of this created a death spiral. Worsening finances exacerbated
> technical difficulties. Technical difficulties worsened client
> relations. Failing client relations further lowered revenues.
> Eventually the company failed.
>
> The part of the story that's relevant to us is the nature of the
> technical difficulties. If N is much smaller than T, then it becomes
> not just possible but _likely_ that a small fraction of the nodes being
> down makes some files unavailable.
>
> From a statistical perspective, if you have T >> N, then the standard
> model accurately calculates the reliability of any given file, but if
> you have per-file reliability of r and n files, the probability that
> _all_ of your files are available (which I call "total reliability")
> approaches r**n. Even if r is something like 99.999%, as n gets big
> r**n can get unacceptably small. T has to be much larger than n before
> the total reliability really approaches r**n. The closer N is to T the
> closer total reliability stays to r.
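>
> (To put illustrative numbers on it -- r here is made up, not measured:
>
> r = 0.99999                      # hypothetical per-file reliability
> for n in (10**3, 10**5, 10**6, 10**7):
>     print(n, r**n)               # "total reliability" if files fail independently
>
> At a million files you're already down to a tiny fraction of a percent.)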
>
> If you think a little about that analysis, you'll see there are two
> solutions. One solution to this problem is to make sure that however
> many nodes there are, the shares for each file are on nearly all of
> them. Ideally, if all your files have shares on every node on the grid
> then all of your files live or die together -- and with probability r,
> directly derived from server reliabilities, K and N. And, note that
> with this solution dirnodes don't need higher reliability because
> they're exactly as likely to go away as individual files.
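>
> (A toy Monte Carlo makes the "live or die together" point, comparing
> shares spread thinly over a grid against shares on every node, both at
> the same 2x expansion. All the numbers -- grid size, file count,
> server availability -- are invented for illustration:
>
> import random
>
> def all_files_ok(T, n, N, K, p):
>     # One trial: with each of T servers independently up with
>     # probability p, are ALL n files still readable?
>     up = [random.random() < p for _ in range(T)]
>     for _ in range(n):
>         nodes = random.sample(range(T), N)     # this file's placement
>         if sum(up[i] for i in nodes) < K:
>             return False
>     return True
>
> def total_reliability(T, n, N, K, p, trials=2000):
>     return sum(all_files_ok(T, n, N, K, p) for _ in range(trials)) / trials
>
> # Hypothetical 40-node grid, 200 files, 80% per-server availability.
> print(total_reliability(T=40, n=200, N=10, K=5, p=0.8))   # spread thin
> print(total_reliability(T=40, n=200, N=40, K=20, p=0.8))  # on every node
>
> With shares on every node all files live or die together, so the second
> number stays close to the per-file reliability; with shares spread thin
> it sags as the file count grows.)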
>
> Another solution is to increase r to apparently-insane levels. Like 1 -
> 1E-15, or even higher. This keeps r**n acceptably close to 1 as
> n grows. How do you increase r? By increasing N. Simply increasing N
> without adjusting K increases N/K -- the expansion factor. But it turns
> out that if you increase N and K together, you can get higher
> reliability with lower expansion factor. The erasure coding becomes
> more effective at maximizing reliability while minimizing expansion as
> you increase N. At some point network overheads become problematic, but
> if you ignore that, bigger N is always better.
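>
> (A quick sanity check of that claim, holding the expansion factor at
> 10/3 and assuming independent servers that are each up 95% of the time
> -- made-up numbers once more:
>
> import math
>
> def unavailability(N, K, p):
>     # P(fewer than K of the N shares are on live servers), assuming
>     # independent servers with availability p.
>     return sum(math.comb(N, j) * p**j * (1 - p)**(N - j) for j in range(K))
>
> for N, K in ((10, 3), (30, 9), (100, 30)):    # all ~3.3x expansion
>     print(N, K, unavailability(N, K, p=0.95))
>
> The per-file unavailability drops by many orders of magnitude each time
> N and K are scaled up together.)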
>
> If you allow T >> N and try to ramp up r, it also makes sense to
> increase r for dirnodes even more, because of the "link in a chain"
> effect: if you break a dirnode, everything reachable from it is lost.
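>
> (The chain effect is easy to quantify: to reach a file that is d
> directories deep you need every dirnode on the path to survive, so the
> path survives with probability roughly r_dir**d -- again assuming
> independence, and r_dir below is a made-up figure:
>
> r_dir = 0.99999                   # hypothetical dirnode reliability
> for depth in (1, 5, 10):
>     print(depth, r_dir**depth)    # chance every dirnode on the path survives
>
> Deep trees multiply the exposure, which is why dirnodes deserve the
> extra margin.)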
>
> So, from a mathematical perspective, allmydata's problem was that they
> didn't model that exponentially-decreasing total reliability as the
> number of files grew, and they didn't consider the impact on reliability
> of the linking effect of dirnodes.
>
>
> --
> Shawn
>
>
>
> _______________________________________________
> volunteergrid2-l mailing list
> volunteergrid2-l at tahoe-lafs.org
> http://tahoe-lafs.org/cgi-bin/mailman/listinfo/volunteergrid2-l
> http://bigpig.org/twiki/bin/view/Main/WebHome