[tahoe-dev] Low-effort repairs

Shawn Willden shawn-tahoe at willden.org
Wed Jan 14 23:33:09 PST 2009


Perhaps this has already been considered and discarded for reasons not obvious 
to me, but I thought it worth throwing out.

Zooko's repairer does "proper" repairs.  By "proper", I mean it regenerates 
the N shares and redistributes them, bringing file availability up to full.  
Doing that requires downloading k shares to reconstruct the file, re-encoding it 
with the erasure code, and then uploading d shares (where d is the number lost).
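
To make the cost concrete, here's a rough Python sketch of that flow.  The 
helper functions (download_share, upload_share, decode, encode) are just 
hypothetical stand-ins, not actual Tahoe interfaces:

    # Hypothetical sketch of a "proper" repair: download k shares, re-encode,
    # and upload the d missing ones.  None of these helpers are real Tahoe APIs.
    def proper_repair(storage_index, k, n, present,
                      download_share, upload_share, decode, encode):
        """present: the set of share numbers that still exist on the grid."""
        missing = [i for i in range(n) if i not in present]
        sources = sorted(present)[:k]
        # Cost: k full-share downloads through the repairer...
        blocks = [download_share(storage_index, i) for i in sources]
        plaintext = decode(blocks, sources, k)
        # ...one re-encoding pass...
        all_shares = encode(plaintext, k, n)
        # ...and d = len(missing) share uploads.
        for i in missing:
            upload_share(storage_index, i, all_shares[i])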

However, it occurs to me that there may be situations in which a quick-and-dirty 
repair job is adequate, and much cheaper.  Rather than regenerating the shares 
and replacing the ones that were actually lost, the repairer can simply make 
additional copies of the shares still remaining.  If the repairer doesn't have 
to touch the shares itself, but can instead direct peers that have shares to 
send them to peers that don't, then it becomes a VERY low-cost operation, since 
it relies on distributed bandwidth.
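
Here's the same kind of sketch for the copy-only variant; tell_peer_to_copy is 
an assumed "please send your share directly to that peer" message, not an 
existing Tahoe operation.  The point is that the repairer itself moves no share 
data at all:

    # Hypothetical sketch of a low-effort repair: no download, no re-encoding.
    def quick_repair(storage_index, holders, empty_peers, tell_peer_to_copy):
        """holders: dict mapping share number -> a peer that still has it."""
        instructions = 0
        for (sharenum, source), target in zip(holders.items(), empty_peers):
            # The source peer ships its existing share straight to the target
            # peer; the repairer's own bandwidth cost is just this message.
            tell_peer_to_copy(source, target, storage_index, sharenum)
            instructions += 1
        return instructions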

Obviously, a quick repair isn't as good as the real thing, since it leaves you 
in a position where losing fewer than N-k shares could still be fatal: an 
unlucky pattern of losses can wipe out every copy of enough distinct shares to 
leave fewer than k, even though fewer copies have been lost than a proper 
repair could tolerate.  In many cases, though, it's probably good enough.

For example, the probability of losing a file with N=10, k=3, p=.95 and four 
lost shares which have been replaced by duplicates of still-extant shares is 
9.9e-8, as compared to 1.6e-9 for a proper repair job.  Not that much worse.
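
For anyone who wants to poke at those numbers, here's a rough Python sketch of 
the calculation, assuming each stored copy survives independently with 
probability p and a distinct share counts as alive if any copy of it survives:

    from itertools import product

    def loss_probability(copies_per_share, k, p):
        """Probability that fewer than k distinct shares survive, where
        copies_per_share[i] is how many stored copies share i has and each
        copy survives independently with probability p."""
        # P(a share with c copies is completely lost) = (1-p)**c
        lost = [(1.0 - p) ** c for c in copies_per_share]
        total = 0.0
        # Enumerate which distinct shares survive (fine for small N).
        for outcome in product([True, False], repeat=len(copies_per_share)):
            if sum(outcome) >= k:
                continue
            prob = 1.0
            for alive, q in zip(outcome, lost):
                prob *= (1.0 - q) if alive else q
            total += prob
        return total

    # Proper repair: ten distinct shares, one copy each.
    print(loss_probability([1] * 10, k=3, p=0.95))            # ~1.6e-9
    # Quick repair: six distinct shares, four of them duplicated.
    print(loss_probability([2, 2, 2, 2, 1, 1], k=3, p=0.95))  # ~9.9e-8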

If there's storage to spare, the repairer could even direct all six remaining 
peers to duplicate their shares, achieving a file loss probability of 5.8e-10, 
which is *better* than the nominal case, albeit at the expense of consuming 12 
shares of distributed storage rather than 10.
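
Using the same loss_probability sketch from above, the all-duplicated layout 
comes out in the same ballpark:

    # All six remaining shares duplicated: twelve stored copies in total.
    print(loss_probability([2] * 6, k=3, p=0.95))  # around 6e-10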

Zooko, something to think about after the repairer is stabilized and the 1.3 
release is out.

	Shawn.

