[tahoe-dev] Selecting optimal FEC parameters (also, idea for new peer selection policy)

Wed Aug 12 08:01:26 PDT 2009

A key question for anyone setting up a small grid is what FEC parameters to 
use.  In order to conserve space and bandwidth, you want to minimize FEC 
expansion but increasing reliability requires increasing FEC expansion.  
There are also download performance implications to consider.

What I'd eventually like to do is to write some code that computes optimal K 
and M for a given grid size and reliability requirement.  To do that, though, 
I have to first figure out what "optimal" means, and what an optimal 
selection algorith looks like.  Note that my discussion here is aimed at 
small grids.  Large grids have entirely different dynamics.

Some observations:

1.  Though I've previously indicated that it's a bad idea to keep a share 
locally when uploading for backup, I've reconsidered this notion.  A local 
share is useless for ONE purpose of backups -- disaster recovery -- but it 
improves performance of retrievals for many other purposes of backups.

2.  Retrieval performance is maximized when shares are retrived from as many 
servers at once (assuming all are roughly equally responsive).  This means 
that K should be set to the size of the grid, and M adjusted based on 
reliability requirements.  This is the reverse of what I have been thinking.

3.  M larger than the grid means that each server will receive multiple 
shares.  A reasonable first assumption is that all shares on a given server 
will survive or fail together, so the effective reliability of a file is a 
function not of how many shares must be lost for it to disappear, but how 
many servers.  Setting M to K**2 (with K = number of servers) ensures that 
any one of the servers has all the information needed to restore any file, at 
the cost of an FEC expansion factor of K.

4.  Choosing M such that K does not divide M means that some servers get a 
larger number of shares.  I need to do the math, but I suspect that if K / M 
(read K divides M), then reliability is not improved by increasing M by less 
than K.

5.  Not all servers provide the same amount of storage space, so some may get 
full and begin refusing to accept shares.  I think the simplest way to 
address this is to simply exclude that server from the selection process and 
recompute K and M.

6.  If a server is not available at a given time, the same idea can be 
applied:  Just recompute K and M based on the available grid.  It may be that 
not enough servers are available to provide the required reliability.  If so, 
I'd like the upload to FAIL.

What this is shaping up to be is, perhaps, a peer selection policy that could 
be implemented in Tahoe which has as it's input parameter a required 
reliabity.  The per-server reliability estimates could be supplied in a 
variety of ways.  Ideally, long-term, I'd like Tahoe to estimate those 
reliabilities based on server availability history.

So I'm envisioning a peer selection policy that does something like:

1.  Query all servers in the grid to find out if they'd acccept a share.
2.  Take the list that respond affirmatively, and set K to the size of that 
list.
3.  Based on the configured reliability parameter r (could be provided by the 
client or set in tahoe.cfg, or both -- client would override tahoe.cfg), 
compute a value for M that satisfies r, given K and the per-server 
reliability estimates.  I suspect that M will always be a multiple of K.
3a. If no such M exists, FAIL.
3b. If there is such an M, do the FEC coding with the computed M and K and 
push the shares.

This approach actually does away entirely with the notion of "shares of 
happiness" or "servers of happiness" and replaces it with a "failure 
probability of happiness" measure.  I suppose there would also need to be a 
pair of reliability thresholds -- desired and minimum, where 'desired' would 
be used for initial distributions and 'minimum' would be the repair 
threshold.

For large grids, we could incorporate a configurable ceiling for K, and modify 
step 1 above to only query servers (according to the permuted list) until 
ceiling_K servers respond in the affirmative.

	Shawn