==== Brian says: ====

Having a function or class to control server-selection is a great idea. The
current code already separates out responsibility for server-selection into a
distinct class, at least for immutable files
(source:src/allmydata/immutable/upload.py#L131 {{{Tahoe2PeerSelector}}}). It
would be pretty easy to make the uploader use different classes according to
a {{{tahoe.cfg}}} option.
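A minimal sketch of what that dispatch might look like. Everything here is illustrative: {{{peer_selector}}} is not a real {{{tahoe.cfg}}} key, and these classes are stand-ins, not the actual Tahoe-LAFS implementations.

```python
# Hypothetical sketch: pick a server-selection class based on a
# tahoe.cfg option. Class and option names are made up for illustration.

class Tahoe2PeerSelector:
    """Stand-in for the real selector in immutable/upload.py."""
    def select(self, servers, shares_needed):
        # The real class runs the permuted-list algorithm; this stub
        # just takes the first N servers.
        return servers[:shares_needed]

class RoundRobinSelector:
    """An alternative policy, purely illustrative."""
    def select(self, servers, shares_needed):
        return sorted(servers)[:shares_needed]

SELECTORS = {
    "tahoe2": Tahoe2PeerSelector,
    "round-robin": RoundRobinSelector,
}

def get_selector(config_value):
    # Unknown values fall back to the default algorithm.
    return SELECTORS.get(config_value, Tahoe2PeerSelector)()

selector = get_selector("tahoe2")
print(type(selector).__name__)  # Tahoe2PeerSelector
```

The point is only that the uploader would instantiate whichever class the config names, and each class would expose the same selection interface.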

However, there are some additional properties that need to be satisfied by the
server-selection algorithm for it to work at all. The basic Tahoe model is
that the filecap is both necessary and sufficient (given some sort of grid
membership) to recover the file. This means that the eventual
'''downloader''' needs to be able to find the same servers, or at least have
a sufficiently-high probability of finding "enough" servers within a
reasonable amount of time, using only information which is found in the
filecap.

If the downloader is allowed to ask every server in the grid for shares, then
anything will work. If you want to keep the download setup time low, and/or
if you expect to have more than a few dozen servers, then the algorithm needs
to be able to do something better. Note that this is even more of an issue
for mutable shares, where it is important that publish-new-version is able to
track down and update all of the old shares: the chance of accidental
rollback increases when it cannot reliably/cheaply find them all.

Another potential goal is for the download process to be tolerant of new
servers, removed servers, and shares which have been moved (possibly as the
result of repair or "rebalancing"). Some use cases will care about this,
while others may never change the set of active servers and won't care.

It's worth pointing out the properties we were trying to get when we came up
with the current "tahoe2" algorithm:

 * for mostly static grids, download uses minimal do-you-have-share queries
 * adding one server should only increase download search time by 1/numservers
 * repair/rebalancing/migration may move shares to new places, including
   servers which weren't present at upload time, and download should be able
   to find and use these shares, even though the filecap doesn't change
 * traffic load-balancing: all non-full servers get new shares at the same
   bytes-per-second, even if serverids are not uniformly distributed

We picked the pseudo-random permuted serverlist to get these properties. I'd
love to be able to get stronger diversity among hosts, racks, or data
centers, but I don't yet know how to get that '''and''' get the properties
listed above, while keeping the filecaps small.
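The permuted-serverlist idea can be sketched roughly as follows. This is an illustration of the general technique (sorting servers by a hash of the storage index and serverid), not the exact Tahoe-LAFS code: the hash function and byte layout here are assumptions.

```python
from hashlib import sha256

def permuted_servers(storage_index, serverids):
    # Each (storage_index, serverid) pair hashes to a stable value, so
    # any client holding the filecap (and hence the storage index)
    # computes the same server ordering with no extra state.
    return sorted(serverids,
                  key=lambda sid: sha256(storage_index + sid).digest())

servers = [b"server-%d" % i for i in range(5)]

# Deterministic: the downloader recovers the uploader's ordering.
order_a = permuted_servers(b"storage-index-1", servers)
order_b = permuted_servers(b"storage-index-1", servers)
assert order_a == order_b

# Adding one server leaves the relative order of the others unchanged,
# so the download search is only perturbed by roughly 1/numservers.
order_c = permuted_servers(b"storage-index-1", servers + [b"server-new"])
remaining = [s for s in order_c if s != b"server-new"]
assert remaining == order_a
```

Because each storage index yields a different permutation, new shares spread evenly across non-full servers, which is where the load-balancing property comes from.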