#1181 new defect

new-downloader requests too much data, builds up

Reported by: warner Owned by:
Priority: major Milestone: soon
Component: code-network Version: 1.8β
Keywords: unfinished-business immutable download performance Cc: zooko
Launchpad Bug:

Description

In #1170 comment:43 I describe a problem that I noticed in the way the new downloader code handles data for long reads (hundreds of segments). It looks like we ask for data which we don't end up using. The .received DataSpans structure on one of the later-to-arrive shares grew by about 350 ranges (and 35kB) over the course of 762 segments. The one increment I looked at was by 64 bytes.

I have two guesses, the second more likely than the first:

  • our use of IncompleteHashTree.needed_hashes() with the include_leaf=True argument is telling us that we should ask for e.g. the hash of block0, so we ask for that part of the hash tree. But then we retrieve block0 itself, and can hash it ourselves and add it to the hash tree, meaning we never use the H(block0) data from self._received. This would result in a 32-byte hash value being added (and never removed) from Self._received once every other segment.
  • The ciphertext hash tree nodes are requested from all shares, because we never know which one will arrive first (and which ones won't arrive at all). But we only consume the data for these hash nodes out of the first share that provides them: we leave the others alone. This will result in a variable amount of data being left over in each share: larger amounts when the ciphertext hash tree path down to the current segnum leaf flips over higher branches (half of the time you'll only get one extra node, a quarter of the time you'll get two extra nodes, etc, maximizing on the first and numsegs/2 segments in which you have to get a full ln2(numsegs) nodes).

I'm not so sure it's the first, since I ran the unit tests with those include_leaf= arguments set to False, and they failed.

But the ciphertext hash tree feels pretty likely.

The simple fix is to just discard the whole self._received structure after each segment is complete: in the ideal case (correctly guessed segsize), it should be empty at that point anyways, and even if the segsize guess was wrong, the bandwidth hit for flushing the data is probably not more than a single block (and the round-trips hit is probably zero). Here's a patch to implement the simple fix:

diff --git a/src/allmydata/immutable/downloader/share.py b/src/allmydata/immutable/downloader/share.py
index f7ed4e8..413f907 100644
--- a/src/allmydata/immutable/downloader/share.py
+++ b/src/allmydata/immutable/downloader/share.py
@@ -531,6 +531,9 @@ class Share:
             for o in observers:
                 # goes to SegmentFetcher._block_request_activity
                 o.notify(state=COMPLETE, block=block)
+            # now clear our received data, to dodge the #1170 spans.py
+            # complexity bug
+            self._received = DataSpans()
         except (BadHashError, NotEnoughHashesError), e:
             # rats, we have a corrupt block. Notify our clients that they
             # need to look elsewhere, and advise the server. Unlike

The complex fix is to consult the ciphertext hash tree when these node hashes arrive, and either add them to the hash tree or discard them (instead of the current practice of always storing them and then only remove them if the hash tree still needs them). It might be easier to do this if DataSpans had a feature to label each byte (in some efficient way), so you could ask it for the ranges labelled "ciphertext hash tree nodes", and discard just those. (this might also help with pipelining segments, so you merge requests but still get the large blocks of data back in the right order, to minimize your buffering needs).

Attachments (1)

1181-patch.diff (2.1 KB) - added by warner at 2010-08-19T00:57:38Z.
patch which also fixes unit test failures

Download all attachments as: .zip

Change History (6)

comment:1 Changed at 2010-08-19T00:21:36Z by warner

Incidentally, that patch causes a few unit tests to fail, but it's only the two in test_immutable that assert a specific number of read() calls. They can safely be modified to raise the call limit.

Changed at 2010-08-19T00:57:38Z by warner

patch which also fixes unit test failures

comment:2 Changed at 2010-09-06T06:33:54Z by zooko

  • Keywords unfinished-business immutable download performance added

We've applied 1181-patch.diff as c89a464510394089 but I guess that's not the real solution, so I'm leaving this ticket open.

comment:3 Changed at 2010-09-10T21:28:41Z by zooko

  • Milestone changed from soon to 1.9.0

comment:4 Changed at 2011-08-12T03:15:13Z by zooko

  • Cc zooko added

comment:5 Changed at 2011-08-15T04:16:29Z by davidsarah

  • Milestone changed from 1.9.0 to soon
Note: See TracTickets for help on using tickets.