[tahoe-dev] [tahoe-lafs] #1252: use different encoding parameters for dirnodes than for files
tahoe-lafs
trac at tahoe-lafs.org
Thu Jan 6 22:09:53 UTC 2011
#1252: use different encoding parameters for dirnodes than for files
-------------------------------+--------------------------------------------
Reporter: davidsarah | Owner: davidsarah
Type: defect | Status: assigned
Priority: major | Milestone: 1.9.0
Component: code-frontend | Version: 1.8.0
Resolution: | Keywords: preservation availability dirnodes anti-censorship
Launchpad Bug: |
-------------------------------+--------------------------------------------
Comment (by swillden):
Replying to [comment:5 warner]:
> I'll also throw in a +0 for Zooko's deeper message, which perhaps he
> didn't state explicitly this particular time, which is that our
> P(recover_node) probability is already above the
> it-makes-sense-to-think-about-it-further threshold: the notion that
> unmodeled real-world failures are way more likely than the
> nice-clean-(artificial) modeled
> all-servers-randomly-independently-fail-simultaneously failures. Once
> your P(failure) drops below 10^-5^ or something, any further modeling
> is just an act of self-indulgent mathematics.
I have to disagree with this, both with Zooko's more generic message and
your formulation of it.
Tahoe-LAFS files do NOT have reliabilities above the
it-makes-sense-to-think-about-it level. In fact, for some deployment
models, Tahoe-LAFS's default encoding parameters provide insufficient
reliability for practical real-world needs, even ignoring extra-model
events.
This fact was amply demonstrated by the problems observed at
Allmydata.com. Individual file reliabilities may appear astronomical, but
it isn't individual file reliabilities that matter. We're going to be
unhappy if ANY files are lost.
When the number of shares N is much smaller than the number of servers
in the grid (as was the case at allmydata.com), the failure of a
relatively tiny number of servers will destroy every file that has
shares on all of those servers. Given a large enough server set, and
enough files, it becomes reasonable to treat each file's survivability
as independent and multiply them all to compute the probability of
acceptable file system performance -- which means that the probability
of the user perceiving no failure isn't just p^d^, it's (roughly) p^t^,
where t is the total number of files the user has stored. An exponent
of 10 is one thing, but allmydata.com was facing exponents more like
1,000 or 10,000 on a per-user basis, and many millions (billions?) for
the whole system.
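To make that exponent concrete, here's a minimal sketch (plain Python,
with an illustrative per-file survival probability, not a measured one)
of how fast the aggregate "no file lost" probability decays as the file
count grows:
{{{
# Sketch: aggregate survival probability when each of t files survives
# independently with probability p (illustrative value only).
p_file = 1 - 1e-5          # assumed per-file survival probability

for t in (10, 1000, 10000, 100000000):
    p_all_survive = p_file ** t      # roughly p^t for independent files
    print("t = %9d   P(no file lost) = %.6f" % (t, p_all_survive))
}}}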
Given a grid of 10 servers, what is the probability that 8 of them will be
down at one time? What about a grid of 200 servers? This is the factor
that kicked allmydata.com's butt, and it wasn't any sort of black swan.
I'm not arguing that black swans don't happen; I'm arguing that the
model says grids like allmydata.com's have inadequate reliability using
3-of-10 encoding. Then you can toss black swans on top of that.
In fact, I think for large grids you can calculate the probability of
any file being lost with, say, eight servers out of action as the
number of ways to choose the eight dead boxes divided by the number of
ways to choose the 10 storage servers holding a file's shares. Assuming
200 total servers, that calculation says that with 8 of them down, one
out of every 400 files would be unavailable. That ignores the extra
unreachability caused when some of those unavailable files are dircaps,
AND it assumes uniform share distribution, whereas in practice I'd
expect older servers to hold more shares, and also to be more likely to
fail.
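For reference, here is that back-of-the-envelope ratio written out as a
sketch (the 200-server, 8-down, 3-of-10 figures are the assumptions
from the paragraph above, with uniform share placement):
{{{
from math import factorial

def comb(n, k):
    # binomial coefficient "n choose k"
    return factorial(n) // (factorial(k) * factorial(n - k))

servers = 200   # total storage servers in the grid (assumed)
down = 8        # servers out of action at once (assumed)
shares = 10     # N shares per file with 3-of-10 encoding

# The ratio described above: ways to choose the 8 dead boxes over the
# ways to choose the 10 servers holding a file's shares.
ratio = comb(servers, down) / float(comb(servers, shares))
print("roughly 1 file in %.0f affected" % (1.0 / ratio))
}}}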
To achieve acceptable reliability in large grids, N must be increased
significantly.
The simplest way to think about and model it is to set N equal to the
number of storage servers. In that scenario, assuming uniform share
distribution and the same K for all files, the entire contents of the
grid lives or dies together and the simple single-file reliability
calculation works just fine. If you can get that up to 1-10^-5^ (with
realistic assumptions), there's really no need to bother further, and
there's certainly no need to provide different encoding parameters for
dirnodes. There's little point in making sure the directories survive
if all the files are gone.
If you don't want to set N that large for large grids, the other option
is to accept that you have an exponent in the millions, and choose
encoding parameters such that you still have acceptable predicted
reliability. If you want to store 100M files and have an aggregate
survival probability of 1-10^-5^, you ''need'' an individual survival
probability on the order of 1-10^-13^, minimum. Even for a thousand
files you need an individual p in the neighborhood of 1-10^-8^.
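The arithmetic behind those figures is just a union bound: with t
independent files and an aggregate failure budget of 10^-5^, each
file's failure probability has to stay below roughly 10^-5^/t. A
minimal sketch (the file counts are the ones used above):
{{{
# Sketch: per-file failure probability needed to meet an aggregate
# target, using P(any file lost) <= t * p_fail (union bound).
aggregate_failure_budget = 1e-5    # i.e. aggregate survival of 1-10^-5

for t in (1000, 100000000):
    per_file_failure = aggregate_failure_budget / t
    print("t = %9d files -> per-file failure below about %.0e"
          % (t, per_file_failure))
}}}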
Oh, and when calculating those probabilities it's very important not to
overestimate storage server reliability. The point of erasure coding is
to reduce the server reliability requirements, which means we tend to
choose less-reliable hardware configurations for storage servers -- old
boxes, cheap blades, etc. Assuming 99.9% availability on such hardware is
foolish. I think 95% is realistic, and I'd choose 90% to be conservative.
Luckily, in a large grid it is not necessary to increase redundancy in
order to get better survival probabilities. Scaling up both K and N in
equal proportions increases reliability fairly rapidly. 9-of-30
encoding produces a per-file reliability of 1-10^-16^ (at 90% server
availability), for example.
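Here's a minimal sketch of that per-file calculation (assuming servers
fail independently, each share lives on a distinct server, and the 90%
availability figure from above):
{{{
from math import factorial

def comb(n, k):
    # binomial coefficient "n choose k"
    return factorial(n) // (factorial(k) * factorial(n - k))

def p_file_lost(k, n, p_server):
    # A file is unrecoverable if fewer than k of its n shares sit on
    # servers that are up (binomial model, independent servers).
    return sum(comb(n, i) * p_server**i * (1 - p_server)**(n - i)
               for i in range(k))

for k, n in ((3, 10), (9, 30)):
    print("%d-of-%d at 90%% availability: P(file lost) ~ %.1e"
          % (k, n, p_file_lost(k, n, 0.90)))
}}}
Under those assumptions the loss probability comes out at a few times
10^-7^ for 3-of-10 and a few times 10^-16^ for 9-of-30, consistent with
the figure above.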
Bringing this line of thought to bear on the question at hand: I don't
think it makes much sense to change the encoding parameters for dirnodes.
Assuming we choose encoding parameters such that p^t^ is acceptable, an
additional factor of p^d^ won't make much difference, since t >> d.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1252#comment:6>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage