[tahoe-dev] Low-effort repairs
Shawn Willden
shawn-tahoe at willden.org
Wed Jan 14 23:33:09 PST 2009
Perhaps this has already been considered and discarded for reasons not obvious
to me, but I thought it worth throwing out.
Zooko's repairer does "proper" repairs. By "proper", I mean it regenerates
the N shares and redistributes them, bringing file availability up to full.
Doing that requires downloading k shares to reconstruct the file,
re-encoding it with the erasure code, and then uploading d shares
(where d is the number of shares lost).
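To put a rough number on that cost, a proper repair moves about (k + d)
shares' worth of data through the repairer itself. Here's a back-of-the-envelope
sketch in Python (my own illustration, not Tahoe code; it assumes each share is
roughly filesize/k bytes and ignores hashes and other overhead):

    def proper_repair_bytes(filesize, k, d):
        # download k shares to reconstruct, then upload the d regenerated ones
        share_size = filesize / k
        return (k + d) * share_size

    # e.g. a 100 MB file with k=3 and d=4 lost shares costs the repairer
    # roughly 233 MB of combined download and upload traffic:
    print(proper_repair_bytes(100e6, 3, 4))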
However, it occurs to me that there may be situations in which a quick-and-dirty
repair job is adequate and much cheaper. Rather than regenerating the shares and
re-creating the actual lost ones, the repairer can simply make additional copies
of the shares that remain. If the repairer
doesn't actually have to touch them, but can instead simply direct peers that
have shares to send them to peers that don't, then it becomes a VERY low-cost
operation, as it relies on distributed bandwidth.
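Here's a minimal sketch of what that could look like (the function name and
data structures are mine, purely to illustrate the idea; this is not Tahoe's
actual repairer or storage-server interface):

    def plan_quick_repair(n, holdings):
        # holdings maps peer id -> set of share numbers that peer holds;
        # returns (source_peer, share_number, destination_peer) copy orders
        extant = sorted(sh for shares in holdings.values() for sh in shares)
        lost = n - len(extant)
        empty_peers = [p for p, shares in holdings.items() if not shares]
        plan = []
        for i in range(min(lost, len(empty_peers))):
            shnum = extant[i % len(extant)]   # round-robin over extant shares
            src = next(p for p, shares in holdings.items() if shnum in shares)
            plan.append((src, shnum, empty_peers[i]))
        return plan

    # e.g. an N=10, k=3 file where only shares 0-5 survive, one per peer:
    holdings = dict(("peer%d" % j, set([j]) if j < 6 else set())
                    for j in range(10))
    for src, shnum, dst in plan_quick_repair(10, holdings):
        print("%s: copy share %d to %s" % (src, shnum, dst))

The repairer's only job here is to hand out those copy orders; the share data
itself never passes through it.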
Obviously, a quick repair isn't as good as the real thing, since it leaves you
in a position where losing fewer than N-k shares could still be fatal (if the
losses happen to take out every copy of enough distinct shares), but in many
cases it's probably good enough.
For example, the probability of losing a file with N=10, k=3, a per-share
survival probability of p=.95, and four lost shares that have been replaced by
duplicates of still-extant shares is 9.9e-8, as compared to 1.6e-9 for a proper
repair job. Not that much worse.
If there's storage to spare, the repairer could even direct all six peers to
duplicate their shares, achieving a file loss probability of 5.8e-10, which
is *better* than the nominal fully-repaired case, albeit at the expense of
consuming 12 shares of distributed storage rather than 10.
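For anyone who wants to check those figures, here's the arithmetic in Python
(my own quick script; it assumes each stored share survives independently with
probability p, and that the file is lost when fewer than k distinct shares
survive):

    def p_file_loss(k, p, copies):
        # copies[i] is the number of replicas of distinct share i; a
        # distinct share survives if at least one of its replicas survives
        dist = {0: 1.0}   # distribution of the number of surviving distinct shares
        for c in copies:
            p_surv = 1.0 - (1.0 - p) ** c
            new = {}
            for surv, prob in dist.items():
                new[surv + 1] = new.get(surv + 1, 0.0) + prob * p_surv
                new[surv] = new.get(surv, 0.0) + prob * (1.0 - p_surv)
            dist = new
        return sum(prob for surv, prob in dist.items() if surv < k)

    p = 0.95
    print(p_file_loss(3, p, [1] * 10))           # proper repair:  ~1.6e-9
    print(p_file_loss(3, p, [2] * 4 + [1] * 2))  # quick repair:   ~9.9e-8
    print(p_file_loss(3, p, [2] * 6))            # all 6 doubled:  ~5.8e-10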
Zooko, something to think about after the repairer is stabilized and the 1.3
release is out.
Shawn.