[tahoe-dev] [tahoe-lafs] #928: start downloading as soon as you know where to get K shares
tahoe-lafs
trac at allmydata.org
Wed Jan 27 12:53:55 PST 2010
#928: start downloading as soon as you know where to get K shares
-----------------------------------------------+----------------------------
Reporter: zooko | Owner: zooko
Type: defect | Status: new
Priority: major | Milestone: 1.6.0
Component: code-peerselection | Version: 1.5.0
Keywords: download availability performance | Launchpad_bug:
-----------------------------------------------+----------------------------
Comment(by zooko):
I figured your Downloader rewrite would fix this. Also, it would
probably bring lots of other improvements, such as more pipelining. Is
there a list of your intended improvements to Downloader? I have vague
recollections of lots of good ideas you had for Downloader
improvements.
I still think it is a good idea to apply this patch now, however
(after thorough tests and code review), because:
1. Now I understand that the bad behavior I've been seeing (especially
on the allmydata.com prod grid) in which downloads hang is ''not''
caused solely by a server failing ''during'' a download, as I formerly
thought (#287), but by the presence of any server on the network that
is in the hung state (i.e. it maintains its TCP connections but never
answers {{{get_buckets()}}}). With current trunk, as long as any such
server is connected to a grid, all downloads from that grid will hang.
2. Likewise, with current trunk, the slowest server (even if it isn't
completely hung) determines the alacrity of beginning an immutable file
download. This explains the behavior that I've observed in which all
downloads take a few seconds to start (because there is one server on that
grid which is slow or overloaded).
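To make points 1 and 2 concrete, here is a minimal sketch (plain
asyncio, not Tahoe-LAFS/foolscap code; server names and delays are
invented) of the strategy the patch moves toward: proceed as soon as K
servers have answered {{{get_buckets()}}}, instead of waiting for every
response. A "hung" server keeps its connection open but never replies;
under wait-for-all it stalls every download, while the first-K strategy
simply never hears from it:

```python
import asyncio

K = 3  # shares needed to reconstruct the file

async def get_buckets(server, delay):
    # A hung server holds its TCP connection open but never answers.
    if delay is None:
        await asyncio.Event().wait()
    await asyncio.sleep(delay)
    return server

async def first_k(servers):
    tasks = [asyncio.ensure_future(get_buckets(s, d)) for s, d in servers]
    winners = []
    # Take responses in arrival order; stop once K servers have answered.
    for fut in asyncio.as_completed(tasks):
        winners.append(await fut)
        if len(winners) == K:
            break
    # Drop outstanding queries, including the one to the hung server.
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return winners

servers = [("s1", 0.01), ("s2", 0.02), ("s3", 0.03), ("hung", None)]
winners = asyncio.run(first_k(servers))
print(winners)  # -> ['s1', 's2', 's3']; the hung server is ignored
```

Under wait-for-all semantics this program would never terminate; here
it finishes as soon as the three fastest servers respond, which is also
why alacrity stops being bounded by the slowest server on the grid.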
3. With this patch, you'll download from the K servers that answered
{{{get_buckets}}} first (assuming only one share per server) instead
of the K servers that have primary shares (or, in the case that you
don't get K servers with primary shares, random servers with secondary
shares). This sounds like a potentially nice performance improvement,
especially for heterogeneous and geographically spread-out grids.
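As a hypothetical sketch of that selection rule (not the patch's actual
code; server names and share numbers are invented): take servers in the
order their {{{get_buckets}}} answers arrived, keeping each one that
contributes a share number not yet covered, until K distinct shares are
in hand:

```python
K = 3  # distinct shares needed

def pick_servers(responses, k=K):
    """responses: [(server, set_of_share_numbers)] in arrival order."""
    chosen, covered = [], set()
    for server, shares in responses:
        new = shares - covered  # shares this server adds
        if new:
            chosen.append(server)
            covered |= new
        if len(covered) >= k:
            break
    return chosen

# Fast responders win even when a primary-share holder is slow:
responses = [("fast", {7}), ("quick", {5}), ("nearby", {2}),
             ("slow-primary", {0})]
print(pick_servers(responses))  # -> ['fast', 'quick', 'nearby']
```

With one share per server this reduces to "first K responders"; the
set arithmetic just keeps the rule correct when a server holds several
shares.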
4. This patch is nicely self-contained, as I hope you (Brian) will
take the time to verify by reviewing it. It could be made ''more''
self-contained by changing it to callback instead of errback when K
buckets cannot be located (as described in comment:4), and I should
probably do so out of an abundance of caution, but I intend to first
examine why the errback doesn't do what I expect. I guess it could
also be made smaller by taking out the part that changes the status
reporting from "responses received/queries sent" to "responses
received+queries failed/queries sent". I changed that only because it
seemed slightly inaccurate to omit the failed queries from the
reporting, but it isn't really necessary for this patch.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/928#comment:6>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid