#928 closed defect

start downloading as soon as you know where to get K shares — at Version 11

Reported by: zooko Owned by: zooko
Priority: major Milestone: 1.6.0
Component: code-peerselection Version: 1.5.0
Keywords: download availability performance hang Cc:
Launchpad Bug:

Description (last modified by zooko)

The current code (2a2cc5c4b8a490ae) performs immutable file download in four phases. In phase 1 it sends a get_buckets() remote invocation message to every storage server that it knows about (CiphertextDownloader._get_all_shareholders()). In phase 2 it requests the CEB (a.k.a. UEB) from each server in turn until it gets a valid CEB (CiphertextDownloader._obtain_uri_extension()). In phase 3 it requests the crypttext hash tree from each server in turn until it gets a valid hash tree (CiphertextDownloader._get_crypttext_hash_tree()). In phase 4 is actually downloads the blocks of data from servers in parallel. Out of the shares which it learned about in phase 1, which shares does it choose? It chooses "primary shares" first (their sharenum is < K) because those can be erasure-decoded at no computational cost, then it chooses randomly from among secondary shares until it has K.

Now currently phase 1 does not end, and therefore phase 2 does not begin, until all servers have answered the get_buckets()! To close this ticket, make it so phase 1 ends as soon as at least K buckets have been found.

The nice non-invasive way to do this is to replace the DeferredList? in CiphertextDownloader._get_all_shareholders() with an object that acts like a DeferredList? but fires once at least K shares (or possibly once at least K servers) have been found.

Change History (21)

comment:1 Changed at 2010-01-27T16:18:22Z by zooko

Here's a patch with no tests. David-Sarah offered to write tests, last night. If they don't, I'll write tests this evening.

Changed at 2010-01-27T16:18:59Z by zooko

comment:2 Changed at 2010-01-27T16:20:32Z by zooko

That patch had a typo that was caught by the current unit tests. Here is the patch with the typo fixed.

Changed at 2010-01-27T16:20:46Z by zooko

comment:3 Changed at 2010-01-27T17:40:29Z by zooko

Amber, driving me to work and listening to me talk about this patch, pointed out that we need to count number of unique share numbers instead of total number of buckets. For example, if we contact all servers, and they all reply "Oh yes! I do have a bucket -- here it, for share number 0.", then with the current patch -- dontwait2.darcspatch.txt -- it will think that we got enough shares to download the file although we didn't. Here is a new patch named dontwait3.darcspatch.txt which fixes this.

Looking at the code to fix this, it appears to me that the current trunk has a similar same problem. It measures len(self_share_buckets) to determine if it should abort or not in CiphertextDownloader._got_all_shareholders(), and if there are insufficient buckets then it errors out with Failed to get enough shareholders: have %d, need %d. However, if it has enough buckets but then it turns out that it doesn't have enough unique shares then this will be caught later, in, len(self._shares[source:src/allmydata/immutable/download.py@4164#L1025 CiphertextDownloader._activate_enough_buckets()], and reported with the error message {{{Unable to activate enough shares: have %d, need %d.

I think this patch will eliminate the possibility of that second case, so all such failures will be detected and reported in the same way.

We should add a test of this case.

comment:4 Changed at 2010-01-27T19:06:40Z by zooko

So here is the latest version. This fails the current tests, not to mention the new tests that we need to write for it, possibly because the deferred returned by CiphertextDownloader._got_all_shareholders() can now errback in addition to callback, where on trunk it can only callback. Looking at the chain in CiphertextDownloader.start(), it seems to me like an errback from _get_all_shareholders() should be nicely handled by _finished() and then by _failed(). Hm.

Changed at 2010-01-27T19:07:34Z by zooko

comment:5 follow-up: Changed at 2010-01-27T19:53:10Z by warner

FWIW, my Downloader rewrite will include this feature.

Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real.

comment:6 Changed at 2010-01-27T20:53:55Z by zooko

I figured your Downloader rewrite would. Also, probably lots of other improvements such as more pipelining. Is there a list of your intended improvements to Downloader? I have vague recollections of lots of good ideas you had for Downloader improvements.

I still think it is a good idea to apply this patch now, however, (after thorough tests and code review) because:

  1. Now I understand that the bad behavior I've been seeing (especially on the allmydata.com prod grid) in which downloads hang is not caused solely by a server failing during a download, as I formerly thought (#287), but is caused by there being any server connected to the network which is in the hung state (such that it maintains its TCP connections but refuses to answer get_buckets()). With current trunk, as long as there is any such server connected to a grid then all downloads from that grid will hang.
  2. Likewise, with current trunk, the slowest server (even if it isn't completely hung) determines the alacrity of beginning an immutable file download. This explains the behavior that I've observed in which all downloads take a few seconds to start (because there is one server on that grid which is slow or overloaded).
  3. With this patch, you'll download from the K servers that answered the get_buckets first (assuming only one share per server) instead of the K servers that have primary shares (or, in the case that you don't get K servers with primary shares, random servers with secondary shares). This sounds potentially a nice performance improvement, especially for heterogeneous and geographically spread-out grids.
  4. This patch is nicely self-contained, as I hope you (Brian) will take the time to verify by reviewing it. It could be made more self-contained by changing it to callback instead of errback when K buckets couldn't be located (as described in comment:4), and I should probably do so out of an abundance of caution, but I intend to first examine why the errback doesn't do what I expect. I guess it could also be made smaller by taking out the part that changes reporting of status from "responses received/queries sent" to "responses received+queries failed/queries sent". I changed that only because it seemed slightly inaccurate to omit the queries failed in the reporting, but it isn't really necessary for this patch.

comment:7 Changed at 2010-01-27T21:06:53Z by zooko

Here is a version of the patch which omits the status reporting change and makes the deferred returned by _get_all_shareholders() callback instead of errback when it runs out of outstanding queries.

Changed at 2010-01-27T21:07:30Z by zooko

comment:8 in reply to: ↑ 5 Changed at 2010-01-27T21:12:43Z by zooko

Replying to warner:

Also, for the currently-proposed patch, it would be a good idea to make sure that we grab the UEB from each share (and validate it) before starting the download. Specifically I want to make sure that if the storage server can open the file (and therefore returns YES to the get_buckets query), but then experiences errors or corruption when actually reading the share file, that the downloader will correctly switch over to a different share. I don't know if the current downloader can switch to a new share after it starts the download process for real.

The current downloader cannot switch to a new share after it starts the download process. What you suggest here would be a good idea, but it would require a bigger, more disruptive patch than the current patch. The current patch basically only changes one thing: when does the deferred returned from _get_all_shareholders() fire. It changes it from firing once all queries have resolved, to firing as soon as K shares are found (or all queries are resolved). To make the improvement you suggest here we would have to chain the verification of UEB (and presumably also of ciphertext hash tree) to the remote get_buckets query which would reorder the control flow.

Now I don't think it is a big loss if we get only K chances to find an uncorrupted UEB (and ciphertext hash tree) versus getting ~N chances. In practice I guess we've never seen a corrupted UEB, so we always succeed on the first UEB we try. But the issue that this ticket is intended to address is a big issue in practice, so I would be happy with the trade-off.

Better handling of corrupted UEBs, ciphertext hash trees, and blocks would then be an improvement for your Downloader rewrite.

comment:9 Changed at 2010-01-27T21:14:19Z by zooko

There was an edit-o mistake in a comment in the previous version of the patch. Here it is with that edit-o fixed.

Changed at 2010-01-27T21:14:38Z by zooko

comment:10 Changed at 2010-01-27T23:47:24Z by zooko

Brian reviewed this and made a couple of comments on IRC which I took into account in this next version (6) of the patch.

Changed at 2010-01-27T23:48:43Z by zooko

Changed at 2010-01-28T06:54:04Z by davidsarah

Support for hanging servers until a Deferred fires in a NoNetworkGrid?

Changed at 2010-01-28T06:56:16Z by davidsarah

Incomplete tests for download when some servers are hung

comment:11 Changed at 2010-01-28T15:44:42Z by zooko

  • Description modified (diff)

Changed at 2010-01-29T12:47:54Z by davidsarah

New tests for #928 (patch bundle includes dontwait6). Problem with errback instead of callback fixed; all existing tests run green. Known bug in _copy_all_shares_from.

Changed at 2010-01-29T18:54:37Z by zooko

Note: See TracTickets for help on using tickets.