[tahoe-dev] [tahoe-lafs] #287: download: tolerate lost or missing servers
tahoe-lafs
trac at tahoe-lafs.org
Tue Aug 10 05:07:29 UTC 2010
#287: download: tolerate lost or missing servers
-------------------------------+--------------------------------------------
Reporter:       warner         |  Owner:
Type:           defect         |  Status:     new
Priority:       major          |  Milestone:  1.8.0
Component:      code-encoding  |  Version:    0.7.0
Resolution:                    |  Keywords:   download availability performance test hang
Launchpad Bug:                 |
-------------------------------+--------------------------------------------
Comment (by warner):
The #798 new downloader (at least in the form that will probably appear in
tahoe-1.8.0) addresses some but not all of this ticket.
* servers which disconnect during download: these ought to be handled
  perfectly: new servers will be located and spun up, necessary hashes will
  be retrieved, and the download should continue without a hitch
* servers which are in a stuck state (e.g. a silent disconnect) before the
  download begins will be tolerated: DYHB requests to them will stall, but
  other servers will be queried, and the download proper will begin as soon
  as enough shares are located. There is a hard-coded 10 second timeout, and
  DYHB queries which are not answered within this time will be replaced with
  a new query. The downloader will allow 10 non-overdue queries to be
  outstanding at any given time.
* servers which enter a stuck state after the DYHB query has been answered
  are '''not''' yet handled well. There is code to react to an "OVERDUE"
  state (by switching to new shares), but there is not yet any code to
  actually declare an OVERDUE state (I couldn't settle on a reasonable
  heuristic to distinguish between a stuck server and one that is merely
  slow).
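To make the DYHB bookkeeping in the second bullet concrete, here is a minimal
sketch of how the timeout and the cap on outstanding queries could interact.
It is not the #798 downloader's code: the class `QueryTracker`, its methods,
and the injectable `clock` are all hypothetical names chosen for illustration.
Only the two constants (10 seconds, 10 non-overdue queries) come from the
description above.

```python
import time

OVERDUE_TIMEOUT = 10.0   # seconds; the hard-coded timeout described above
MAX_NON_OVERDUE = 10     # cap on outstanding queries that are not yet overdue


class QueryTracker:
    """Hypothetical bookkeeping for outstanding DYHB queries."""

    def __init__(self, servers, clock=time.monotonic):
        self._untried = list(servers)  # servers not yet queried, in order
        self._pending = {}             # server -> time its query was sent
        self._clock = clock            # injectable so tests can fake time

    def _non_overdue_count(self, now):
        # A query stops counting against the cap once it is overdue.
        return sum(1 for sent in self._pending.values()
                   if now - sent < OVERDUE_TIMEOUT)

    def servers_to_query(self):
        """Return the servers that should be sent a new query right now."""
        now = self._clock()
        started = []
        while (self._untried
               and self._non_overdue_count(now) < MAX_NON_OVERDUE):
            server = self._untried.pop(0)
            self._pending[server] = now
            started.append(server)
        return started

    def answered(self, server):
        """Record that a server answered (late answers are accepted too)."""
        self._pending.pop(server, None)
```

Note that an overdue query is not cancelled in this sketch; it merely stops
counting against the cap, so a stalled server cannot hold up queries to the
remaining servers.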
The goals described in this ticket's description are still desirable:
* have a list of peers, sorted by "goodness" (probably speed)
* when a server hasn't responded in a while, move it to the bottom of the
  list
* keep a couple of extra shares in reserve, to quickly fill in for a server
  that gets stuck
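A rough sketch of what those three goals might look like together, assuming
"goodness" is simply the last observed response time. The class
`ServerRanking`, its methods, and the default of two reserve shares are
hypothetical, and how to decide that a server "hasn't responded in a while"
is deliberately left to the caller.

```python
class ServerRanking:
    """Hypothetical ranking of servers by observed response time."""

    def __init__(self, k, reserve=2):
        self.k = k              # number of shares needed to decode
        self.reserve = reserve  # extra shares kept ready as stand-ins
        self._latency = {}      # server -> last response time in seconds
        self._demoted = set()   # servers that have gone quiet

    def record_response(self, server, seconds):
        # A fresh response restores the server to the responsive group.
        self._latency[server] = seconds
        self._demoted.discard(server)

    def mark_unresponsive(self, server):
        # Move the server to the bottom of the list without forgetting it.
        self._latency.setdefault(server, float("inf"))
        self._demoted.add(server)

    def ranked(self):
        # Responsive servers first (fastest first), demoted servers last.
        return sorted(self._latency,
                      key=lambda s: (s in self._demoted, self._latency[s]))

    def plan(self):
        """Split the ranking into (servers in use, servers held in reserve)."""
        order = self.ranked()
        return order[:self.k], order[self.k:self.k + self.reserve]


ranking = ServerRanking(k=3, reserve=2)
for name, seconds in [("a", 0.05), ("b", 0.2), ("c", 0.1),
                      ("d", 0.4), ("e", 0.3), ("f", 0.5)]:
    ranking.record_response(name, seconds)
active, spares = ranking.plan()   # active: a, c, b; spares: e, d
ranking.mark_unresponsive("a")    # "a" drops to the bottom; "e" fills in
```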
So we should at least keep this ticket open until the new downloader is
capable of declaring an OVERDUE state and thus becomes tolerant to servers
that get stuck after the DYHB queries. And probably the criteria for closing
it should be the implementation of the scheme where we have a list of shares
sorted by responsiveness.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/287#comment:29>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage