[tahoe-dev] [tahoe-lafs] #287: download: tolerate lost or missing servers

tahoe-lafs trac at tahoe-lafs.org
Tue Aug 10 05:07:29 UTC 2010


#287: download: tolerate lost or missing servers
-------------------------------+--------------------------------------------
     Reporter:  warner         |       Owner:                                             
         Type:  defect         |      Status:  new                                        
     Priority:  major          |   Milestone:  1.8.0                                      
    Component:  code-encoding  |     Version:  0.7.0                                      
   Resolution:                 |    Keywords:  download availability performance test hang
Launchpad Bug:                 |  
-------------------------------+--------------------------------------------

Comment (by warner):

 The #798 new downloader (at least in the form that will probably appear in
 tahoe-1.8.0) addresses some, but not all, of this ticket:

  * servers which disconnect during download: these ought to be handled
    perfectly: new servers will be located and spun up, necessary hashes
    will be retrieved, and the download should continue without a hitch
  * servers which are in a stuck state (e.g. a silent disconnect) before
    the download begins will be tolerated: DYHB requests to them will
    stall, but other servers will be queried, and the download proper will
    begin as soon as enough shares are located. There is a hard-coded
    10 second timeout, and DYHB queries which are not answered within
    this time will be replaced with a new query. The downloader will
    allow 10 non-overdue queries to be outstanding at any given time.
  * servers which enter a stuck state after the DYHB query has been
    answered are '''not''' yet handled well. There is code to react to an
    "OVERDUE" state (by switching to new shares), but there is not yet any
    code to actually declare an OVERDUE state (I couldn't settle on a
    reasonable heuristic to distinguish between a stuck server and one
    that is merely slow).
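
 The DYHB query window described above (a 10 second timeout, and at most 10
 non-overdue queries outstanding) can be sketched roughly as follows. This
 is not the downloader's actual code: the class, its method names, and the
 explicit "now" argument are all hypothetical, chosen only to keep the
 example self-contained.

```python
from dataclasses import dataclass, field

OVERDUE_TIMEOUT = 10.0   # seconds; the hard-coded timeout mentioned above
MAX_NON_OVERDUE = 10     # non-overdue queries allowed in flight at once


@dataclass
class QueryWindow:
    """Hypothetical bookkeeping for outstanding DYHB queries."""
    servers: list                                 # not yet queried, in order
    sent_at: dict = field(default_factory=dict)   # server -> time query was sent
    answered: set = field(default_factory=set)

    def non_overdue(self, now):
        # Unanswered queries that are still within the timeout.
        return [s for s, t in self.sent_at.items()
                if s not in self.answered and now - t < OVERDUE_TIMEOUT]

    def overdue(self, now):
        # Unanswered queries that have exceeded the timeout.
        return [s for s, t in self.sent_at.items()
                if s not in self.answered and now - t >= OVERDUE_TIMEOUT]

    def fill(self, now):
        """Start new queries until MAX_NON_OVERDUE are pending or servers run out."""
        started = []
        while self.servers and len(self.non_overdue(now)) < MAX_NON_OVERDUE:
            server = self.servers.pop(0)
            self.sent_at[server] = now
            started.append(server)
        return started

    def got_answer(self, server):
        self.answered.add(server)
```

 Overdue queries are not cancelled in this sketch; they simply stop counting
 against the limit, so each call to fill() replaces them with queries to
 servers that have not been asked yet.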

 The goals described in this ticket's description are still desirable:

  * have a list of peers, sorted by "goodness" (probably speed)
  * when a server hasn't responded in a while, move it to the bottom of the
    list
  * keep a couple of extra shares in reserve, to quickly fill in for a
    server that gets stuck
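
 One possible shape for such a list is sketched below. It assumes that
 "goodness" means a smoothed response time (lower is better), which the
 ticket only suggests as probable; the class name, the smoothing factor,
 and the reserve size are all hypothetical.

```python
class ServerRanking:
    """Hypothetical list of servers kept sorted by observed response time."""

    def __init__(self, servers):
        self.order = list(servers)   # best server first
        self.latency = {}            # server -> smoothed response time (seconds)

    def _resort(self):
        # Stable sort: servers with no measurement tie at infinity and keep
        # their current relative order.
        self.order.sort(key=lambda s: self.latency.get(s, float("inf")))

    def record_response(self, server, seconds, alpha=0.5):
        # Exponentially smooth the measurement so one slow reply does not
        # dominate the ranking.
        old = self.latency.get(server)
        if old is None or old == float("inf"):
            self.latency[server] = seconds
        else:
            self.latency[server] = alpha * seconds + (1 - alpha) * old
        self._resort()

    def mark_unresponsive(self, server):
        # A server that has not responded in a while goes to the bottom.
        self.order.remove(server)
        self.order.append(server)
        self.latency[server] = float("inf")
        self._resort()

    def pick(self, k, reserve=2):
        # Return the k best servers plus a few spares held in reserve.
        return self.order[:k], self.order[k:k + reserve]
```

 With servers a through e, recording fast replies from c and a and then
 marking c unresponsive leaves c at the bottom of the list, and pick(3,
 reserve=1) returns a, b, d as active with e held in reserve.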

 So we should at least keep this ticket open until the new downloader is
 capable of declaring an OVERDUE state and thus becomes tolerant of servers
 that get stuck after the DYHB queries. And probably the criterion for
 closing it should be the implementation of the scheme where we have a list
 of shares sorted by responsiveness.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/287#comment:29>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

