Changes between Version 3 and Version 4 of TahoeThree


Ignore:
Timestamp:
2009-10-19T18:19:21Z (15 years ago)
Author:
davidsarah
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • TahoeThree

    v3 v4  
    1 See http://tahoebs1.allmydata.com:8123/file/URI%3ACHK%3Ah2laisecpigc7uqg6uyctzmeq4%3Aaoef77jjwsryjbz6cav5lhid3lisqhxm5ym7qdrn5vhm24oa4oyq%3A3%3A10%3A4173/@@named=/peer-selection-tahoe3.txt
     1Copied from http://tahoebs1.allmydata.com:8123/file/URI%3ACHK%3Ah2laisecpigc7uqg6uyctzmeq4%3Aaoef77jjwsryjbz6cav5lhid3lisqhxm5ym7qdrn5vhm24oa4oyq%3A3%3A10%3A4173/@@named=/peer-selection-tahoe3.txt
     2
     3= THIS PAGE DESCRIBES HISTORICAL ARCHITECTURE CHOICES: THE CURRENT CODE DOES NOT WORK AS DESCRIBED HERE. =
     4
     5When a file is uploaded, the encoded shares are sent to other peers. But to
     6which ones? The PeerSelection algorithm is used to make this choice.
     7
     8In the old (May 2007) version, the verifierid is used to consistently-permute
     9the set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each
     10file gets a different permutation, which (on average) will evenly distribute
     11shares among the grid and avoid hotspots.
     12
     13This permutation places the peers around a 2^256^-sized ring, like the rim of
     14a big clock. The 100-or-so shares are then placed around the same ring (at 0,
     151/100*2^256^, 2/100*2^256^, ... 99/100*2^256^). Imagine that we start at 0 with
     16an empty basket in hand and proceed clockwise. When we come to a share, we
     17pick it up and put it in the basket. When we come to a peer, we ask that peer
     18if they will give us a lease for every share in our basket.
     19
     20The peer will grant us leases for some of those shares and reject others (if
     21they are full or almost full). If they reject all our requests, we remove
     22them from the ring, because they are full and thus unhelpful. Each share they
     23accept is removed from the basket. The remainder stay in the basket as we
     24continue walking clockwise.
     25
     26We keep walking, accumulating shares and distributing them to peers, until
     27either we find a home for all shares, or there are no peers left in the ring
     28(because they are all full). If we run out of peers before we run out of
     29shares, the upload may be considered a failure, depending upon how many
     30shares we were able to place. The current parameters try to place 100 shares,
     31of which 25 must be retrievable to recover the file, and the peer selection
     32algorithm is happy if it was able to place at least 75 shares. These numbers
     33are adjustable: 25-out-of-100 means an expansion factor of 4x (every file in
     34the grid consumes four times as much space when totalled across all
     35StorageServers), but is highly reliable (the actual reliability is a binomial
     36distribution function of the expected availability of the individual peers,
     37but in general it goes up very quickly with the expansion factor).
     38
     39If the file has been uploaded before (or if two uploads are happening at the
     40same time), a peer might already have shares for the same file we are
     41proposing to send to them. In this case, those shares are removed from the
     42list and assumed to be available (or will be soon). This reduces the number
     43of uploads that must be performed.
     44
     45When downloading a file, the current release just asks all known peers for
     46any shares they might have, chooses the minimal necessary subset, then starts
     47downloading and processing those shares. A later release will use the full
     48algorithm to reduce the number of queries that must be sent out. This
     49algorithm uses the same consistent-hashing permutation as on upload, but
     50instead of one walker with one basket, we have 100 walkers (one per share).
     51They each proceed clockwise in parallel until they find a peer, and put that
     52one on the "A" list: out of all peers, this one is the most likely to be the
     53same one to which the share was originally uploaded. The next peer that each
     54walker encounters is put on the "B" list, etc.
     55
     56All the "A" list peers are asked for any shares they might have. If enough of
     57them can provide a share, the download phase begins and those shares are
     58retrieved and decoded. If not, the "B" list peers are contacted, etc. This
     59routine will eventually find all the peers that have shares, and will find
     60them quickly if there is significant overlap between the set of peers that
     61were present when the file was uploaded and the set of peers that are present
     62as it is downloaded (i.e. if the "peerlist stability" is high). Some limits
     63may be imposed in large grids to avoid querying a million peers; this
     64provides a tradeoff between the work spent to discover that a file is
     65unrecoverable and the probability that a retrieval will fail when it could
     66have succeeded if we had just tried a little bit harder. The appropriate
     67value of this tradeoff will depend upon the size of the grid, and will change
     68over time.