#798 closed enhancement (fixed)

improve random-access download to retrieve/decrypt less data

Reported by: warner Owned by: davidsarah
Priority: major Milestone: 1.8.0
Component: code-network Version: 1.5.0
Keywords: performance pycryptopp random-access download large confidentiality review-needed docs Cc: jeremy@…
Launchpad Bug:

Description

Currently, using a Range: header on an HTTP GET (to fetch just a portion of a file, instead of the whole thing) causes the tahoe client to download the entire file into a tmpfile, then serve out just the portion that was requested. To make this faster, we should only fetch the segments that contain the desired range. Two changes need to happen to make this work:

  • the Downloader must be rewritten, to fetch segments on demand, to (sometimes) cache previously cached segments, and decrypt just the necessary data
  • pycryptopp needs to provide random-access AES processing, so we can decrypt data starting at some point other than the beginning of the file (pycryptopp#18)

The new Downloader should have a couple of layers:

  • top layer receives a read(offset, length) request (maybe a even fully-general readv)
  • that layer looks at the set of cached segments, tries to satisfy the request from cache
  • if not, submit requests for segment fetches to the lower layer
  • lower layer looks to see what servers it has available, which requests are already in flight
  • it sends out more requests, or prepares new servers (querying for share presence, fetching hash trees, fetching block data)

Similar code will be needed for MDMF mutable files, since those are specified to contain multiple segments, and we'll want random-access for them too.

Attachments (12)

new-downloader-v1.diff (121.3 KB) - added by warner at 2010-03-11T19:35:25Z.
work-in-progress of new downloader, maybe 80% complete
new-downloader-v2.diff (127.7 KB) - added by warner at 2010-04-23T23:35:11Z.
latest WIP patch
new-downloader-v3.diff (141.6 KB) - added by warner at 2010-04-26T09:53:14Z.
latest WIP patch, a few tests pass
new-downloader-v4.diff (149.6 KB) - added by warner at 2010-04-28T16:16:55Z.
one system test works
new-downloader-v5.diff (154.7 KB) - added by warner at 2010-04-29T19:00:21Z.
rest of SystemTest? now passes
new-downloader-v6.diff (165.4 KB) - added by warner at 2010-05-01T02:37:16Z.
v6 patch: most existing tests pass
new-downloader-v7.diff (219.3 KB) - added by warner at 2010-05-10T03:55:47Z.
12 uncovered lines left, some tests disabled
new-downloader-v8.diff (281.7 KB) - added by warner at 2010-05-27T23:59:51Z.
more integration, refactoring
new-downloader-v9.diff (400.0 KB) - added by warner at 2010-06-28T22:32:24Z.
more small improvements
new-downloader-v10.diff (379.2 KB) - added by warner at 2010-07-26T02:26:52Z.
added OVERDUE for get_buckets calls, reenabled some "hung server" tests
new-downloader-v10a.dpatch (267.7 KB) - added by davidsarah at 2010-08-01T05:49:45Z.
Brian's New Downloader, for testing in 1.8beta (or alpha)
798-docs.dpatch (3.9 KB) - added by davidsarah at 2010-08-30T03:46:28Z.
docs/performance.txt, architecture.txt: updates taking into account new downloader. refs #798

Change History (40)

comment:1 Changed at 2009-12-04T04:56:40Z by davidsarah

  • Component changed from code-performance to code-network
  • Keywords performance pycryptopp added

comment:2 Changed at 2009-12-20T17:38:36Z by davidsarah

  • Keywords random-access added

comment:3 Changed at 2009-12-20T17:39:04Z by davidsarah

  • Keywords download added

comment:4 Changed at 2010-02-02T03:33:07Z by davidsarah

  • Keywords large added
  • Milestone changed from eventually to 1.7.0

comment:5 Changed at 2010-02-02T03:49:45Z by davidsarah

  • Owner set to warner

Brian is writing the new Downloader.

comment:6 Changed at 2010-03-11T18:08:51Z by jsgf

  • Cc jeremy@… added

Changed at 2010-03-11T19:35:25Z by warner

work-in-progress of new downloader, maybe 80% complete

comment:7 Changed at 2010-03-14T05:12:23Z by davidsarah

  • Keywords confidentiality added

As discussed in #990, a positive security side-effect of this rewrite will be to avoid caching plaintext in order to satisfy byterange requests.

Changed at 2010-04-23T23:35:11Z by warner

latest WIP patch

Changed at 2010-04-26T09:53:14Z by warner

latest WIP patch, a few tests pass

Changed at 2010-04-28T16:16:55Z by warner

one system test works

comment:8 Changed at 2010-04-28T16:32:23Z by warner

in the v4 patch, SystemTest.test_filesystem now passes. I had to disable the "download status" checking code.. that part isn't implemented yet.

Next steps:

  • add download status: require rework, since we're no longer downloading whole files at a time, and the status-display code thinks in such terms
  • integration work:
    • split download2.py into different pieces. Probably move existing download.py's LiteralFileNode into a new literal.py, let the top-level (non-literal) ImmutableFileNode live in filenode.py, move most of the rest of download2.py (the ShareFinder and Share classes) into download.py
    • get rid of the old ImmutableFileNode class
  • write specific tests for new interesting cases: guessing offsets wrong, read requests fail, others
  • implement "response overdue" behavior

Changed at 2010-04-29T19:00:21Z by warner

rest of SystemTest? now passes

Changed at 2010-05-01T02:37:16Z by warner

v6 patch: most existing tests pass

comment:9 follow-up: Changed at 2010-05-01T02:44:37Z by warner

In the v6 patch, most of the existing tests pass. Remaining old-test work to do:

  • test_hung_download (touches internals, needs rewrite)
  • test_immutable (not sure, looks interesting)
  • test_web (format of NotEnoughSharesError changed)
    • either need to update tests to match, or change format to match old one
  • test_system (disabled download-status)

And all the TODOs from above.

I've been thinking about the download-status data. What I'd like to record is a list of messages, one per share, with (send time, start+offset, response size, response time) for each one (plus information about the initial DYHB query). For now I'd just dump this information as text in the download-status page, but eventually I'd like a scrolling JS/Canvas-based diagram. It would have time along the X axis, and probably (server,share) along the Y axis. Each message send would be a horizontal line that spans from the send to the receipt, maybe with a thickness related to the number of bytes being requested. There will be lots of overlapping requests, so they must be handled cleanly.

There should be vertical lines to indicate when each completed segment is delivered to the caller (running from the top-most active share to the bottom of the chart, with an arrow pointing downwards). There should also be vertical lines to indicate when the caller requests the next segment: when callers are stalling (i.e. streaming music not buffering too much) it should be obvious by looking at the chart.

Another sort of chart that would be interesting would have byte-offset-in-share as the Y axis, showing how we fetch different pieces of the share at different times.

comment:10 in reply to: ↑ 9 Changed at 2010-05-01T03:13:51Z by davidsarah

Replying to warner:

In the v6 patch, most of the existing tests pass. Remaining old-test work to do:

  • test_hung_download (touches internals, needs rewrite)
  • test_immutable (not sure, looks interesting)

[...]

You might consider merging these two. When I wrote test_hung_download, I wasn't familiar enough with the existing tests to see that it was partially duplicating some of test_immutable.

comment:11 Changed at 2010-05-08T20:20:19Z by zooko

If you like this ticket, you might also like the "Brian's New Downloader" bundle of tickets: #605 (two-hour delay to connect to a grid from Win32, if there are many storage servers unreachable), #800 (improve alacrity by downloading only the part of the Merkle Tree that you need), #809 (Measure how segment size affects upload/download speed.), #287 (download: tolerate lost or missing servers), and #448 (download: speak to as few servers as possible).

comment:12 Changed at 2010-05-08T22:47:48Z by zooko

  • Milestone changed from 1.7.0 to 1.8.0

Brian's New Downloader is now planned for v1.8.0.

Changed at 2010-05-10T03:55:47Z by warner

12 uncovered lines left, some tests disabled

comment:13 Changed at 2010-05-10T03:58:31Z by warner

I did a lot of code-coverage work, so the v7 patch fixes a number of real bugs in the previous ones. There are still some significant functional things to fix, though, notably the state=OVERDUE heuristic is missing, and ShareFinder isn't sending requests in parallel. Both fronts will benefit from download-status displays. I found an interesting JS library which might be good for generating the display: http://www.simile-widgets.org/timeline/

Changed at 2010-05-27T23:59:51Z by warner

more integration, refactoring

comment:14 Changed at 2010-05-28T00:12:07Z by warner

The v8 patch has a lot of integration work: the new downloader now mostly lives in immutable/downloader/ , all the filenodes live in immutable/filenode.py (except for Literal, which was moved into immutable/literal.py). Repairer was rewritten to use the new downloader (and it's waaaay simpler now). Some of the old downloader code was removed (except that Verifier still uses it). The download-status display is functional, but shows mostly raw request-response timestamps (the SIMILE Timeline work is not in this patch, and might not really want to be in the Tahoe node at all, maybe it should go into a separate tool).

At this point, the new downloader is almost as good as the old downloader, so I'm starting to think about landing it after 1.7 is released. It doesn't yet handle servers hanging very well, but the old downloader didn't either. The biggest problem is that the new downloader will basically pick share-holding servers at random, rather than preferring the fastest responders. The old downloader would, I think, stick with the first 'k' servers that responded positively to the DYHB requests, so even though it couldn't tolerate the loss of any server, it would use the fastest-responding ones. The new downloader could easily pick the slowest responders by accident.

The biggest feature I want to write next is the hanging-server handling (state=OVERDUE). I know that it will take some fiddling with the heuristics before we find something that feels right. I'd prefer to have the timeline visualization in place to support this work, but it's looking to be a decent amount of work, so I may plunge ahead without it, or find some alternative approach (GD or some python image-drawing library, probably without zooming). To do state=OVERDUE right will also provoke structural changes to the way we manage remote servers (specifically I want an object with a lifetime equal to the TCP connection's, which remembers how long requests took in the past, so it can help guess if an outstanding request is overdue or merely slow). I might want to land this patch first, before starting on that work, since I'm really far out on a branch right now. My patch is updated to current trunk, but if anyone gets busy and makes some deep changes to the tree, I may have a challenging merge job ahead. So merging sooner rather than later feels like a good idea.

Changed at 2010-06-28T22:32:24Z by warner

more small improvements

Changed at 2010-07-26T02:26:52Z by warner

added OVERDUE for get_buckets calls, reenabled some "hung server" tests

comment:15 Changed at 2010-07-29T21:27:57Z by zooko

  • Keywords review-needed added

Whoo! Ready for review! And user testing. Try it out!

comment:16 Changed at 2010-08-01T05:31:30Z by davidsarah

I needed to fix a few minor issues when converting this to a darcs patch:

  • the patch to client.py touches a line next to another line changed in the ticket1074 branch (no real conflict)
  • 'patch' failed to create the empty file src/allmydata/download/__init__.py
  • pyflakes warning: src\allmydata\test\test_hung_server.py:12: '_corrupt_share_data' imported but unused
  • DEFAULT_MAX_SEGMENT_SIZE missing from interfaces.py
  • some uses of find_shares (two in test/no_network.py and one in test/test_download.py) needed to be changed to find_uri_shares.
  • 'print r_ev' statement in util/spans.py should be commented out.

With these changes it doesn't fail any tests on my machine.

Changed at 2010-08-01T05:49:45Z by davidsarah

Brian's New Downloader, for testing in 1.8beta (or alpha)

comment:17 Changed at 2010-08-09T01:51:42Z by davidsarah

  • Owner changed from warner to davidsarah
  • Status changed from new to assigned

comment:18 Changed at 2010-08-09T05:12:53Z by davidsarah

Reviewing cbcb728e7ea0031d:

  • the (start, length) form of the [Simple]Spans constructor is not used outside test code (and the test code can be changed to pass a [(start, length)] array). Removing this would slightly simplify the constructor and avoid a possibly error-prone overloading.
  • in the Spans class comment, ", frequently used to represent .newsrc contents" is out-of-context and not needed.
  • in the _check method of Spans, if assertions are switched on then the self._spans array is re-sorted in order to check whether it is ordered. This is unnecessary: if you add an assert length > 0, length in the loop, then the loop will be checking a condition that is stronger than the array being ordered, given that the starts and lengths are numbers. (The sorting actually takes O(n) time rather than O(n log n) on each call to _check, because Timsort will detect the ordered input, but it's still unnecessary overhead.)
  • the assert statements should include the variables they use in the exception message, e.g. assert start > prev_end, (start, prev_end).
  • "overlap(s_start, s_length, start, length) or adjacent(s_start, s_length, start, length)" is equivalent to "overlap(s_start, s_length, start-1, length+2)".
  • in the only other use of adjacent (in DataSpans.add), only the start0 < start1 case should be able to occur. Inline and simplify.
  • in the loop over enumerate(self._spans), you could exit early when s_start > start+length. At that point you know where to insert the (start, length) span without having to re-sort.
  • a Spans object behaves somewhat like a set of the elements in all of its spans, but the __contains__ and __iter__ methods are not consistent with that (instead viewing it as a set of (start, length) pairs). I realize this may allow for more convenient use of in and iteration, but it should at least be documented.
  • _check and assert_invariants do similar things; give them the same name.
  • DataSpans._dump is poorly named.
  • DataSpans.assert_invariants should check that none of the data strings are zero-length.
  • is it intentional that DataSpans.add calls self.assert_invariants() but remove (and pop, although that's much simpler) don't?
  • if s_start <= start < s_end: I find this Python construct too clever by half. Anyway, at this point s_start <= start is always true (otherwise we won't get past the first continue), so I would write this as
    assert s_start <= start, (s_start, start)
    if start < s_end:
    
  • Perhaps rename s_* to old_*.
  • DataSpans.add: if I understand correctly, case A also covers:
        OLD      OLD    OLD    OLD
    NEW       NEW      NEWW   NEEWW
    
    This isn't immediately clear from the comment. Depicting it as
        OLD
    NEW.....
    
    might help. Also, use uppercase consistently for the cases.
  • suffix_len in case D has a different meaning to suffix_len in case E. Maybe rename it to replace_len for case D.

Tests:

  • The Spans class is tested by ByteSpans, but it just stores spans of integers, not necessarily byte offsets. I would suggest s/ByteSpans/TestSpans/ and s/StringSpans/TestDataSpans/.
  • I could be wrong, but I don't think the deterministic tests are covering all cases in add and remove. I'd prefer to see those tests have full coverage rather than relying on the randomized tests to make up the gap.

comment:19 Changed at 2010-08-09T20:10:51Z by warner

good points! I think I'll implement most of them, but after the 1.8.0 release.

comment:20 Changed at 2010-08-14T05:33:40Z by zooko

  • Keywords docs-needed added

This patch broke some docs:

When downloading a file, the current version just asks all known servers for any shares they might have…

When asked to read an arbitrary range of an immutable file, Tahoe-LAFS will download from the beginning of the file up until it has enough of the file to satisfy the requested read…

I think these documentation issues should be fixed before we release Tahoe-LAFS v1.8.0 final.

comment:21 follow-up: Changed at 2010-08-14T18:20:02Z by warner

Let's make a new ticket for the improvements suggested in comment:18, so we can close this ticket as soon as the docs fixes in comment:20 are resolved.

Changed at 2010-08-30T03:46:28Z by davidsarah

docs/performance.txt, architecture.txt: updates taking into account new downloader. refs #798

comment:22 Changed at 2010-08-30T03:47:02Z by davidsarah

  • Keywords docs added; docs-needed removed

comment:23 in reply to: ↑ 21 Changed at 2010-09-10T19:36:56Z by zooko

Replying to warner:

Let's make a new ticket for the improvements suggested in comment:18, so we can close this ticket as soon as the docs fixes in comment:20 are resolved.

Okay I created #1196 (clean up and optimize spans).

comment:24 Changed at 2010-09-10T19:37:45Z by zooko

  • Owner changed from davidsarah to zooko
  • Status changed from assigned to new

comment:25 Changed at 2010-09-10T19:37:52Z by zooko

  • Status changed from new to assigned

comment:26 Changed at 2010-09-10T20:02:35Z by zooko

  • Owner changed from zooko to davidsarah
  • Status changed from assigned to new

comment:27 Changed at 2010-09-10T20:14:33Z by david-sarah@…

In f32dddbcedea3c7c:

docs/frontends/FTP-and-SFTP.txt: docs/performance.txt, architecture.txt: updates taking into account new downloader (revised). refs #798

comment:28 Changed at 2010-09-10T20:54:17Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.