[tahoe-dev] Estimating reliability

Brian Warner warner-tahoe at allmydata.com
Mon Jan 12 19:17:33 PST 2009


On Mon, 12 Jan 2009 02:09:28 -0700
Shawn Willden <shawn-tahoe at willden.org> wrote:

> including one important bit of complexity that you didn't mention: Not all
> servers have the same reliability.

Yeah, I chose not to analyze it in that depth. I'm glad to see it might be
more tractable than I'd assumed.

> If R < N, then some repair cycles will make no repairs, even though there
> are up to N-R failed shares. A Markov chain is then needed to account for
> the changed probabilities at each cycle, and the computations may become
> intractable. Not sure.

> However, I propose that we should handle this issue a different way, and
> eliminate R from the model altogether. Instead, we should replace it by t,
> which represents the acceptable failure tolerance for the next cycle.


Well, what I think we'll wind up with is a table (parameterized on P_i and
A) that says "if there are 10 shares left, there's a 99.99x% chance of the
file surviving for another A seconds", with similar numbers for 9 shares, 8
shares, etc. Your "t" threshold then tells us what R must be. The
checker/repairer will be configured with an R, and we can compute the overall
survivability from that. If P_i falls, or if A increases, then we may have to
increase R to achieve the desired survivability, if it's even possible.
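
A quick sketch of what that table might look like (p here is a stand-in for
the per-share survival probability over one period A, derived from P_i, and
it assumes shares fail independently; k and N are the usual encoding
parameters):

    from math import comb

    def survival_table(p, k, N):
        # For each current share count s (k..N): the chance that at least k
        # of those s shares are still alive after one more period A, assuming
        # each share independently survives the period with probability p.
        def survives(s):
            return sum(comb(s, j) * p**j * (1 - p)**(s - j)
                       for j in range(k, s + 1))
        return {s: survives(s) for s in range(k, N + 1)}

    # e.g. 3-of-10 encoding, 99% per-share survival over one period A:
    print(survival_table(0.99, 3, 10))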

The worst-case survivability will happen when the file is at R shares at one
sample, then falls to below k shares by the next sample. Since we know we're
doing repair, we know things will be better than this (i.e. most files will
have greater than R shares at any given sample point), and if we really
wanted to, we could construct that Markov chain, but I agree it would be
messy. Hmmm..

 [begin diversion]

 A and P_i would give you a matrix of decay probabilities Md(A,P_i): the
 chance that a file which starts the period with X shares will finish the
 period with Y shares. One side of the diagonal is full of zeros: we can only
 ever lose shares.
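
 A sketch of Md under the simplifying assumption that every share
 independently survives one period A with the same probability p; with
 per-server reliabilities the row entries would come from a sum over server
 subsets rather than a single binomial term:

     import numpy as np
     from math import comb

     def decay_matrix(N, p):
         # Md[x][y] = chance that a file starting the period with x shares
         # finishes it with y shares.  Entries with y > x are zero: shares
         # can only be lost, never gained, during a decay period.
         Md = np.zeros((N + 1, N + 1))
         for x in range(N + 1):
             for y in range(x + 1):
                 Md[x, y] = comb(x, y) * p**y * (1 - p)**(x - y)
         return Md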

 We'll do repair whenever it is desired and possible (i.e. k<=i<R), so we
 construct a second "post-repair" matrix Mpr, in which e.g. Mpr(start=10,
 finish=10) = sum([Md(start=10,finish=i) for i in range(k,R)+[N]]) . This one
 does *not* have an all-zeros side of the diagonal, because repair will
 always result in a full N shares, even if we started out with fewer than
 that. Then we make a starting vector S[t=0] in which all files are at full
 strength ( S[t=0] = [1.0] + [0.0]*N , one entry per share count from N down
 to 0 ), and for each A period we multiply: S[t=x] = S[t=x-1] * Mpr .
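
 Continuing that sketch (reusing decay_matrix and numpy from above), Mpr and
 the period-by-period multiplication might look like this; note the vector
 here is indexed by share count 0..N, so "full strength" means all of the
 probability mass at index N:

     def post_repair_matrix(Md, k, R, N):
         # Apply the repair policy to the decay matrix: a file that ends a
         # period with i shares, k <= i < R, is repaired back up to N, so
         # that probability mass moves into the finish=N column.  Files
         # below k are lost; files at R or above are left alone until the
         # next check.
         Mpr = np.zeros_like(Md)
         for x in range(N + 1):
             for y in range(N + 1):
                 if k <= y < R:
                     Mpr[x, N] += Md[x, y]
                 else:
                     Mpr[x, y] += Md[x, y]
         return Mpr

     N, k, R = 10, 3, 7                 # illustrative parameters
     Md = decay_matrix(N, p=0.99)
     Mpr = post_repair_matrix(Md, k, R, N)
     S = np.zeros(N + 1)
     S[N] = 1.0                         # every file starts at full strength
     for _ in range(100):               # 100 repair periods of length A
         S = S @ Mpr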

 The probability of a file being lost is then the sum of the bad end of
 S[t=x], from 0 shares to k-1. Call this 1-Psurvive(t=x). As time goes by,
 files will irrevocably fall into this end, so lim(Psurvive(t=x), x->inf) is
 0.0 (nothing is immortal), but we're interested in how quickly it gets there.
 It's possible that at some point, the distribution of numbers of shares sort
 of flattens out, and lim(Psurvive(t=x)/Psurvive(t=x-1), x->inf) is
 well-defined (and, I suspect, would approach that limit from above). If so,
 that ratio tells you your long-term probability of surviving over any
 arbitrary period A, which would make it easier to evaluate a long-term goal
 like "I want any given file to have a less than 1 in a million chance of
 being lost over a 5-year period".
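
 And the loss probability plus the per-period ratio described above,
 continuing the same sketch (the ratio is only meaningful once the
 share-count distribution has flattened out into a stable shape):

     def survival_curve(Mpr, k, N, periods):
         # Psurvive(t=x) = 1 - (mass sitting in the bad end, share counts
         # 0..k-1) after x periods, plus the ratio Psurvive(x)/Psurvive(x-1).
         S = np.zeros(N + 1)
         S[N] = 1.0
         psurvive = []
         for _ in range(periods):
             S = S @ Mpr
             psurvive.append(1.0 - S[:k].sum())
         ratios = [b / a for a, b in zip(psurvive, psurvive[1:])]
         return psurvive, ratios

 If that ratio converges to some r, then surviving M further periods is
 roughly r**M, which is the number to hold up against a goal like "less than
 1 in a million over 5 years".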

 The amount of reconstruction work then depends upon how often we wind up in
 the region of the Mpr table where repair took place.

 [end diversion]

So yeah, avoiding all that math and sticking with the worst-case scenario
instead, we assume that we've got R shares at the start of each period and
care about the probability that we wind up with fewer than k at the end. In
that model, we only care about one row of the transition matrix (the start=R
row), and really only about one number: the chance that we'll land in the
bad end of that row (finish<k). This is then the same from one period to the
next, and the long-term probability is easier to calculate (and the actual
survivability would be better than this model predicts). I'm not sure that
this simplified model makes it at all easy to figure out the amount of
repair work, however.
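
To make the simplified model concrete: it reduces to a single tail sum over
the start=R row (again assuming an independent per-share survival
probability p over one period; R, k, and p are illustrative):

    from math import comb

    def worst_case_loss_per_period(R, k, p):
        # One row of the transition matrix (start=R): the chance of landing
        # in the bad end (finish < k) by the end of a single period.
        return sum(comb(R, j) * p**j * (1 - p)**(R - j) for j in range(k))

    # Under the pessimistic "start every period at exactly R shares"
    # assumption, surviving M periods is at least
    # (1 - worst_case_loss_per_period(R, k, p)) ** M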

> Based on t and the file share set PMF, I can also calculate, on average,
> how much reconstruction work will be needed, and therefore the bandwidth
> required. Want to alter A? No problem, just recalculate t and the bandwidth
> requirements.

Not sure I see that yet.


Cool stuff; I'm tempted to spend a couple of days playing with this again
(but really I should be working on code instead..)

thanks,
 -Brian
