[tahoe-dev] issue #610: upload should take better advantage of existing shares

Mon Feb 9 16:56:59 PST 2009

Brian opened this ticket, which explains most of dreid's performance  
problems:

"""
Our current upload process (which is nearly the oldest code in the  
entire tahoe tree) could be smarter in the presence of existing  
shares. If a file is uploaded in January, then a few dozen servers  
are added in February, then in March it is (for whatever reason)  
uploaded again, here's what currently happens:
  * peer selection comes up with a permuted list of servers, with the  
same partial ordering as the original list but with the new servers  
inserted in various pseudo-random places
  * each server in the list is asked, in turn, if they would be  
willing to hold on to the next sequentially numbered share
  * each server might say yes or no. In addition, each server will  
return a list of shares that they might already have
  * the client never asks a server to accept a share that it already  
had a home for, but it also never unasks a server to hold a share  
that it later learns is housed somewhere else

So, if the client queries a server which already has a share, that  
server will probably end up with two shares. In addition, many shares  
will probably end up being sent to a new server even though some  
other server (later in the permuted list) already has a copy.

To fix this, the upload process needs to do more work:

  * it needs to cancel share-upload requests if it later learns that  
some other server already has that particular share
    * perhaps it should perform some sort of validation on the  
claimed already-uploaded share
  * if it detects any evidence of pre-existing shares, it should put  
more energy into finding additional ones
  * it needs to ask more servers than it strictly needs (for upload  
purposes) to increase the chance that it can detect this evidence

We're planning an overhaul of immutable upload/download code, both to  
improve parallelism and to replace the DeferredList with a state  
machine (to make it easier to bypass stalled servers, for example).  
These goals should be included in that work.

This process will work best when the shares are closer to the  
beginning of the permuted list. A "share rebalancing" mechanism  
should be created to gradually move shares in this direction over  
time. This is another facet of repair: not only should there be enough  
shares in existence, but they should be located in the best place for  
a downloader to find them quickly.

"""
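
To make the behaviour in the ticket concrete, here is a rough sketch (not
taken from the Tahoe source). All function names are hypothetical, the
SHA-256 ordering is only a stand-in for the real permutation, every server
is assumed to say yes, and cancel_redundant() assumes the client has heard
back from every server listed in `existing`.

```python
import hashlib


def permuted_servers(server_ids, storage_index):
    """Order servers by a hash of (storage_index + server id).

    Hypothetical stand-in for peer selection. Each server's sort key
    depends only on itself, so adding servers never reorders old ones.
    """
    return sorted(
        server_ids,
        key=lambda s: hashlib.sha256(storage_index + s.encode()).digest(),
    )


def naive_upload(order, existing, total_shares):
    """Mimic the upload pass described in the ticket.

    order:    server ids in permuted order
    existing: dict mapping server id -> set of share numbers it holds
    Returns a dict mapping server id -> set of shares sent to it.
    Assumes every server accepts the share it is asked to hold.
    """
    homed = set()  # shares the client believes already have a home
    sent = {}
    for server in order:
        remaining = [n for n in range(total_shares) if n not in homed]
        if not remaining:
            break  # nothing left to place, so later servers are never asked
        share = remaining[0]
        already = existing.get(server, set())
        if share not in already:
            sent.setdefault(server, set()).add(share)
        homed.add(share)
        # The reply reveals existing shares; the client stops asking for
        # those, but never cancels requests it has already made.
        homed |= already
    return sent


def cancel_redundant(sent, existing):
    """Drop planned uploads of shares that some server already holds.

    Assumes the client has queried every server in `existing`, i.e. it
    asked more servers than it strictly needed for the upload itself.
    """
    found = set().union(*existing.values())
    trimmed = {srv: shares - found for srv, shares in sent.items()}
    return {srv: shares for srv, shares in trimmed.items() if shares}


# January: servers A, B, C each hold one of three shares.
# February: "new1" is added and lands first in the permuted list.
order = ["new1", "A", "B", "C"]
existing = {"A": {0}, "B": {1}, "C": {2}}

sent = naive_upload(order, existing, total_shares=3)
# sent == {"new1": {0}, "A": {1}, "B": {2}}
# A now holds shares 0 and 1, B holds 1 and 2, and share 0 was sent to
# new1 even though A (later in the list) already had it.

trimmed = cancel_redundant(sent, existing)
# trimmed == {}  (all three shares already exist, nothing to upload)
```

In the naive pass the client stops asking once every share number has a
home, so it never hears from C at all. Querying further down the list is
what lets the second function notice that nothing needs to be uploaded.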

tickets mentioned in this message:
http://allmydata.org/trac/tahoe/ticket/610 # upload should take  
better advantage of existing shares