[tahoe-dev] Selecting optimal FEC parameters (also, idea for new peer selection policy)
Shawn Willden
shawn-tahoe at willden.org
Wed Aug 12 08:01:26 PDT 2009
A key question for anyone setting up a small grid is what FEC parameters to
use. In order to conserve space and bandwidth, you want to minimize FEC
expansion but increasing reliability requires increasing FEC expansion.
There are also download performance implications to consider.
What I'd eventually like to do is to write some code that computes optimal K
and M for a given grid size and reliability requirement. To do that, though,
I have to first figure out what "optimal" means, and what an optimal
selection algorith looks like. Note that my discussion here is aimed at
small grids. Large grids have entirely different dynamics.
Some observations:
1. Though I've previously indicated that it's a bad idea to keep a share
locally when uploading for backup, I've reconsidered this notion. A local
share is useless for ONE purpose of backups -- disaster recovery -- but it
improves performance of retrievals for many other purposes of backups.
2. Retrieval performance is maximized when shares are retrived from as many
servers at once (assuming all are roughly equally responsive). This means
that K should be set to the size of the grid, and M adjusted based on
reliability requirements. This is the reverse of what I have been thinking.
3. M larger than the grid means that each server will receive multiple
shares. A reasonable first assumption is that all shares on a given server
will survive or fail together, so the effective reliability of a file is a
function not of how many shares must be lost for it to disappear, but how
many servers. Setting M to K**2 (with K = number of servers) ensures that
any one of the servers has all the information needed to restore any file, at
the cost of an FEC expansion factor of K.
4. Choosing M such that K does not divide M means that some servers get a
larger number of shares. I need to do the math, but I suspect that if K / M
(read K divides M), then reliability is not improved by increasing M by less
than K.
5. Not all servers provide the same amount of storage space, so some may get
full and begin refusing to accept shares. I think the simplest way to
address this is to simply exclude that server from the selection process and
recompute K and M.
6. If a server is not available at a given time, the same idea can be
applied: Just recompute K and M based on the available grid. It may be that
not enough servers are available to provide the required reliability. If so,
I'd like the upload to FAIL.
What this is shaping up to be is, perhaps, a peer selection policy that could
be implemented in Tahoe which has as it's input parameter a required
reliabity. The per-server reliability estimates could be supplied in a
variety of ways. Ideally, long-term, I'd like Tahoe to estimate those
reliabilities based on server availability history.
So I'm envisioning a peer selection policy that does something like:
1. Query all servers in the grid to find out if they'd acccept a share.
2. Take the list that respond affirmatively, and set K to the size of that
list.
3. Based on the configured reliability parameter r (could be provided by the
client or set in tahoe.cfg, or both -- client would override tahoe.cfg),
compute a value for M that satisfies r, given K and the per-server
reliability estimates. I suspect that M will always be a multiple of K.
3a. If no such M exists, FAIL.
3b. If there is such an M, do the FEC coding with the computed M and K and
push the shares.
This approach actually does away entirely with the notion of "shares of
happiness" or "servers of happiness" and replaces it with a "failure
probability of happiness" measure. I suppose there would also need to be a
pair of reliability thresholds -- desired and minimum, where 'desired' would
be used for initial distributions and 'minimum' would be the repair
threshold.
For large grids, we could incorporate a configurable ceiling for K, and modify
step 1 above to only query servers (according to the permuted list) until
ceiling_K servers respond in the affirmative.
Shawn
More information about the tahoe-dev
mailing list