Copied from http://tahoebs1.allmydata.com:8123/file/URI%3ACHK%3Ayhw67uwzcdszqtzim6zxmiqzte%3A323pu5fz27sbntnxgffa57eqisabggvdnmq3zqoqyckgkimgmfma%3A3%3A10%3A3448/@@named=/peer-selection-tahoe2.txt

= THIS PAGE DESCRIBES HISTORICAL DESIGN CHOICES. SEE docs/architecture.txt FOR CURRENT DOCUMENTATION =

When a file is uploaded, the encoded shares are sent to other peers. But to
which ones? The PeerSelection algorithm is used to make this choice.

Early in 2007, we were planning to use the following "Tahoe Two" algorithm.
By the time we released 0.2.0, we switched to "tahoe3", but when we released
v0.6, we switched back (ticket #132).

As in Tahoe Three, the verifierid is used to consistently permute the set of
all peers (by sorting the peers by HASH(verifierid+peerid)). Each file gets a
different permutation, which (on average) will evenly distribute shares
across the grid and avoid hotspots.
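
To make that concrete, here is a minimal Python sketch of the permutation,
assuming the verifierid and peerids are byte strings and picking SHA-256 as
the HASH (the real code's hash and data types may differ):

  import hashlib

  def permute_peers(verifierid, peerids):
      # Every uploader/downloader that knows the verifierid computes the same
      # ordering for this file, but different files get different orderings.
      return sorted(peerids,
                    key=lambda peerid: hashlib.sha256(verifierid + peerid).digest())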

With our basket of (usually 10) shares to distribute in hand, we start at the
beginning of the list and ask each peer in turn if they are willing to hold
on to one of our shares (the "lease request"). If they say yes, we remove
that share from the basket and remember who agreed to host it. Then we go to
the next peer in the list and ask them the same question about another share.
If a peer says no, we remove them from the list. If a peer says that they
already have one or more shares for this file, we remove those shares from
the basket. If we reach the end of the list, we start again at the beginning.
If we run out of peers before we run out of shares, we fail unless we've
managed to place at least some number of the shares: the likely threshold is
to attempt to place 10 shares (out of which we'll need 3 to recover the
file), and be content if we can find homes for at least 7 of them.
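
A rough Python sketch of that placement loop follows. The 'request_lease'
callable is a hypothetical stand-in for the real lease-request protocol, and
'happy=7' matches the 7-out-of-10 threshold described above:

  def place_shares(permuted_peers, request_lease, num_shares=10, happy=7):
      # 'request_lease(peer, sharenum)' returns (accepted, already_held),
      # where 'accepted' says whether the peer will hold 'sharenum' and
      # 'already_held' is the set of share numbers the peer reports already
      # having for this file.
      basket = set(range(num_shares))   # shares still needing a home
      peers = list(permuted_peers)      # peers we are still willing to ask
      placements = {}                   # sharenum -> peer that agreed to hold it

      while basket and peers:           # wrap around the list until done
          for peer in list(peers):
              if not basket:
                  break
              sharenum = min(basket)
              accepted, already_held = request_lease(peer, sharenum)
              basket -= set(already_held)      # those shares are already housed
              if accepted:
                  placements[sharenum] = peer
                  basket.discard(sharenum)
              else:
                  peers.remove(peer)           # a "no" removes them from the list

      placed = num_shares - len(basket)
      if placed < happy:
          raise RuntimeError("only placed %d of %d shares" % (placed, num_shares))
      return placements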

In small networks, this approach will loop around several times and place
several shares with each node (e.g. in a 5-host network with plenty of space,
each node will get 2 shares). In large networks with plenty of space, the
shares will be placed with the first 10 peers in the permuted list. In large
networks that are somewhat full, we'll need to traverse more of the list
before we find homes for the shares. The average number of peers we'll need
to talk to is roughly 10 / (1 - utilization), with additional terms that
depend on the distribution of free space on the peers and the size of the
shares being offered. Small files with small shares will fit anywhere; large
files with large shares will only fit on certain peers, so the mesh may have
free space but no holes large enough for a very large file, which might
indicate that we should try again with a larger number of (smaller) shares.
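
As a back-of-the-envelope illustration of that estimate (ignoring the
free-space and share-size terms), one reading of the hand-wave is that each
peer we ask accepts with probability about (1 - utilization):

  def expected_peers_to_ask(shares=10, utilization=0.5):
      # Hand-wave only: if each peer we ask can take a share with probability
      # about (1 - utilization), placing 'shares' shares costs roughly this
      # many queries.
      return shares / (1.0 - utilization)

  print(expected_peers_to_ask(10, 0.5))   # about 20 peers at 50% full
  print(expected_peers_to_ask(10, 0.9))   # about 100 peers at 90% full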

When it comes time to download, we compute a similar list of permuted
peerids, and start asking for shares from the beginning of the list. Each
peer gives us a list of the shareids that they are holding. Eventually
(depending upon how much churn the peerlist has experienced), we'll find
holders for at least 3 shares, or we'll run out of peers. If the mesh is very
large and we want to fail faster, we can establish an upper bound on how many
peers we should talk to (perhaps by recording the permuted peerid of the last
node to which we sent a share, or a count of the total number of peers we
talked to during upload).
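
A sketch of that download loop, using a hypothetical 'list_shares' query and
an optional cap on how many peers to ask:

  def locate_shares(permuted_peers, list_shares, k=3, max_peers=None):
      # 'list_shares(peer)' is a stand-in for asking a peer which shareids it
      # is holding for this file. Stop once we have holders for at least k
      # shares, we exhaust the peer list, or we hit the 'max_peers' bound.
      found = {}                          # shareid -> peer holding it
      for asked, peer in enumerate(permuted_peers):
          if max_peers is not None and asked >= max_peers:
              break
          for shareid in list_shares(peer):
              found.setdefault(shareid, peer)
          if len(found) >= k:
              return found                # enough shares to recover the file
      raise RuntimeError("only found %d of the %d shares we need" % (len(found), k))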

I suspect that this approach handles churn more efficiently than tahoe3, but
I haven't gotten my head around the math that could be used to show it. On
the other hand, it takes a lot more round trips to find homes in small meshes
(one per share, whereas tahoe3 can do just one per node).
| 58 | |