#1170 closed defect

new-downloader performs badly when downloading a lot of data from a file — at Version 81

Reported by: zooko Owned by:
Priority: critical Milestone: 1.8.0
Component: code-network Version: 1.8β
Keywords: immutable download performance regression Cc:
Launchpad Bug:

Description (last modified by terrell)

I tried to download a movie file from Test Grid with Tahoe-LAFS v1.8.0c2 (new-downloader) and the file wouldn't download fast enough or smoothly enough to watch in a streaming fashion during download.

Some of Zooko's most recent measurements:

run..... version downloaded download KBps flags
101 1.8.0c2 100 MB 91
102 1.7.1 100 MB 182
103 1.7.1 100 MB 207
104 1.8.0c2 100 MB 82
105 1.7.1 100 MB 211
109 1.7.1 100 MB 228 cProfile
110 1.8.0c2 100 MB 113 cProfile
111 1.8.0c2 413 MB 19 spanstrace
112 1.8.0c2 456 MB 113 spanstrace, spans.py.diff
113 1.7.1 543 MB 241 on office network
114 1.8.0c2 456 MB 147 spans.py.diff + comment:54
115 1.7.1 456 MB 224 flog
116 1.8.0c2 314 MB 154 spans.py.diff + comment:54 flog
117 1.8.0c2 36 MB 160 cProfile spans.py.diff + comment:54 prof
118 1.8.0c2 1490 MB 158 cProfile spans.py.diff + comment:54 prof
119 1.7.1 543 MB 99 on office network, noflog
120 1.8.0c2 543 MB 319 on office network, noflog, spans.py.diff + comment:54
121 1.7.1 543 MB 252 on office network, noflog
122 1.8.0c2 1490 MB 157 noflog, spans.py.diff + comment:54, nologging in immutable/download/share.py
123 1.8.0c2 100 MB 242 noflog, spans.py.diff + comment:54, nologging in immutable/download/*.py status.html
124 1.7.1 100 MB 219 noflog
125 1.8.0c2 100 MB 161 noflog, spans.py.diff + comment:54, nologging in immutable/download/*.py
126 1.7.1 100 MB 223 noflog
127 1.8.0c2 100 MB 155 noflog, spans.py.diff + comment:54, nologging in immutable/download/*.py status.html
129 1.8.0c2 100 MB 291 on office network, noflog, 1170-combo.diff
130 1.8.0c2 100 MB 179 on office network, noflog, 1170-combo.diff
131 1.8.0c2 100 MB 276 on office network, noflog, 1170-combo.diff
132 1.8.0c2 100 MB 179 on office network, noflog, 1170-combo.diff
133 1.8.0c2 100 MB 279 on office network, noflog, 1170-combo.diff
134 1.8.0c2 100 MB 262 on office network, noflog, 1170-combo.diff
135 1.8.0c2 100 MB 180 on office network, noflog, 1170-combo.diff
136 1.8.0c2 100 MB 284 on office network, noflog, 1170-combo.diff
137 1.8.0c2 100 MB 286 on office network, noflog, 1170-combo.diff
138 1.7.1 867 MB 265 on office network, noflog
139 1.8.0c2 866 MB 169 on office network, noflog, 1170-combo.diff status.html
140 1.7.1 100 MB 169 terrell, home cable modem
141 1.8.0c2 100 MB 223 terrell, home cable modem, 1170-combo.diff status.html
142 1.8.0c2 100 MB 215 terrell, home cable modem, 1170-combo.diff status.html
143 1.8.0c2 100 MB 235 terrell, home cable modem, 1170-combo.diff status.html
144 1.7.1 100 MB 159 terrell, home cable modem
145 1.8.0c2 100 MB 131 terrell, home cable modem, 1170-combo.diff status.html
146 1.7.1 100 MB 184 terrell, home cable modem
147 1.8.0c2 100 MB 233 terrell, home cable modem, 1170-combo.diff status.html
148 1.7.1 100 MB 172 terrell, home cable modem
149 1.8.0c2 100 MB 259 terrell, home cable modem, 1170-combo.diff status.html
150 1.7.1 100 MB 141 terrell, home cable modem
151 1.8.0c2 100 MB 139 terrell, home cable modem, 1170-combo.diff status.html
152 1.7.1 100 MB 159 terrell, home cable modem
153 1.8.0c2 100 MB 241 terrell, home cable modem, 1170-combo.diff status.html
154 1.7.1 100 MB 172 terrell, home cable modem
155 1.8.0c2 100 MB 229 terrell, home cable modem, 1170-combo.diff status.html
156 1.7.1 100 MB 159 terrell, home cable modem
157 1.8.0c2 100 MB 174 terrell, home cable modem, 1170-combo.diff status.html
158 1.7.1 100 MB 173 terrell, home cable modem
159 1.8.0c2 100 MB 262 terrell, home cable modem, 1170-combo.diff status.html
160 1.7.1 100 MB 152 terrell, home cable modem
161 1.8.0c2 100 MB 248 terrell, home cable modem, 1170-combo.diff status.html
162 1.7.1 100 MB 135 terrell, home cable modem

Change History (139)

comment:1 Changed at 2010-08-12T16:28:41Z by zooko

Oh, all those 32-byte reads must have been all the hashes in the Merkle Trees. I assume that those are indeed coalesced using the clever spans structure src/allmydata/util/spans.py@4666. Nevertheless we should investigate the very poor performance shown in this download status file.

comment:2 follow-ups: Changed at 2010-08-12T18:15:54Z by warner

yeah, the 32/64-byte reads are hashtree nodes. The spans structure only coaleses adjacent/overlapping reads (the 64-byte reads are the result of two neighboring 32-byte hashtree nodes being fetched), but all requests are pipelined (note the "txtime" column in the "Requests" table, which tracks remote-bucket-read requests), and the overhead of each message is fairly small (also note the close proximity of the "rxtime" for those batches of requests). So I'm not particularly worried about merging these requests further.

My longer-term goal is to extend the Spans data structure with some sort of "close enough" merging feature: given a Spans bitmap, return a new bitmap with all the small holes filled in, so e.g. a 32-byte gap between two hashtree nodes (which might not be strictly needed until a later segment is read) would be retrieved early. The max-hole-size would need to be tuned to match the overhead of each remote-read message (probably on the order of 30-40 bytes): there's a breakeven point somewhere in there.

Another longer-term goal is to add a readv()-type API to the remote share-read protocol, so we could fetch multiple ranges in a single call. This doesn't shave much overhead off of just doing multiple pipelined read() requests, so again it's low-priority.

And yes, a cleverer which-share-should-I-use-now algorithm might reduce stalls like that. I'm working on visualization tools to show the raw download-status events in a Gantt-chart -like form, which should make it easier to develop such an algorithm. For now, you want to look at the Request table for correlations between reads that occur at the same time. For example, at the +1.65s point, I see several requests that take 1.81s/2.16s/2.37s . One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5. But note that sh8 (from the other server) took even longer: this suggests that the congestion was on your end of the line, not theirs, especially since the next segment arrived in less than half a second.

comment:3 in reply to: ↑ 2 Changed at 2010-08-12T18:38:22Z by zooko

Replying to warner:

yeah, the 32/64-byte reads are hashtree nodes. The spans structure only coaleses adjacent/overlapping reads (the 64-byte reads are the result of two neighboring 32-byte hashtree nodes being fetched), but all requests are pipelined (note the "txtime" column in the "Requests" table, which tracks remote-bucket-read requests), and the overhead of each message is fairly small (also note the close proximity of the "rxtime" for those batches of requests).

I don't understand what those columns mean (see #1169 (documentation for the new download status page)).

comment:4 in reply to: ↑ 2 Changed at 2010-08-12T19:12:36Z by zooko

Replying to warner:

For now, you want to look at the Request table for correlations between reads that occur at the same time.

I'm having trouble interpreting it (re: #1169).

For example, at the +1.65s point, I see several requests that take 1.81s/2.16s/2.37s . One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5. But note that sh8 (from the other server) took even longer: this suggests that the congestion was on your end of the line, not theirs, especially since the next segment arrived in less than half a second.

I tried to watch the same movie from my office network and got similarly unwatchable results, download status page attached. Could it be a problem with the way my client, VLC.app, is reading?

Changed at 2010-08-12T19:13:18Z by zooko

Changed at 2010-08-12T21:04:16Z by zooko

comment:5 Changed at 2010-08-12T21:04:27Z by zooko

Well, it wasn't the VLC.app client. I did another download of the same file using wget. The performance was bad--38 KB/s:

p$ wget http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
--2010-08-12 12:54:47--  http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
Resolving localhost... ::1, fe80::1, 127.0.0.1
Connecting to localhost|::1|:3456... failed: Connection refused.
Connecting to localhost|fe80::1|:3456... failed: Connection refused.
Connecting to localhost|127.0.0.1|:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1490710513 (1.4G) [application/ogg]
Saving to: `bbb-360p24.i420.lossless.drc.ogg.fixed.ogg+bbb-24fps.flac.via-ffmpeg.ogg'

16% [=========================>                                                                                                                                       ] 247,203,678 38.0K/s  eta 8h 0m   ^C

Here is the download status page for this download (attached). Note that one server had a DYHB RTT of 3 minutes and another had a DYHB RTT of 8 minutes! There were no incident report files or twistd.log entries.

comment:6 Changed at 2010-08-12T21:10:31Z by zooko

The two servers with dramatically higher DYHB RTTs introduced themselves as:

Service Name  	
Nickname
PeerID
Connected? 	Since 	First Announced
Version
storage  	
linuxpal
rpiw4n3ffzygwyzlkjb55upikk6mewtv
Yes: to 18.62.1.14:55058 	13:57:08 12-Aug-2010 	21:26:57 11-Aug-2010
allmydata-tahoe/1.7.1

and

storage  	
sunpal7
62nlabgfiuzbfseufd2yoymbjzsenbuz
Yes: to 18.62.6.169:64882 	14:25:33 12-Aug-2010 	21:26:57 11-Aug-2010
allmydata-tahoe/1.7.1

I pinged their IP addresses:

--- 18.62.6.169 ping statistics ---
21 packets transmitted, 21 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 74.049/78.417/91.016/3.839 ms
--- 18.62.1.14 ping statistics ---
21 packets transmitted, 21 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 73.153/78.478/92.159/4.260 ms

Changed at 2010-08-12T21:33:50Z by zooko

comment:7 Changed at 2010-08-12T21:34:16Z by zooko

  • Keywords regression performance added
  • Priority changed from major to critical

Okay, I've finally realized that this is a regression of the feature that we added in v1.6.0 to start fetching blocks as soon as you've learned about enough shares and to use the lowest-latency servers. Attached is the download status page from v1.7.1 of trying to download this same file from the same test grid. It performs much better:

$ wget http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
--2010-08-12 15:06:55--  http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg
Resolving localhost... ::1, fe80::1, 127.0.0.1
Connecting to localhost|::1|:3456... failed: Connection refused.
Connecting to localhost|fe80::1|:3456... failed: Connection refused.
Connecting to localhost|127.0.0.1|:3456... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1490710513 (1.4G) [application/ogg]
Saving to: `bbb-360p24.i420.lossless.drc.ogg.fixed.ogg+bbb-24fps.flac.via-ffmpeg.ogg.1'

 1% [=>                                                                                                                                                               ] 25,182,400   260K/s  eta 1h 43m  ^C

We can't release Tahoe-LAFS v1.8.0 with this behavior because it is a significant regression: people who use grids with slow or occasionally slow servers such as the public Test Grid would be ill-advised to upgrade from v1.7.1 to v1.8.0 and we don't like to release new versions that some users are ill-advised to upgrade to.

comment:8 Changed at 2010-08-12T21:46:52Z by zooko

I've noticed that when tickets get more than one attachment it becomes confusing for the reader to understand what is what, so here's a quick recap:

comment:9 Changed at 2010-08-12T21:53:39Z by zooko

The feature that we released in v1.6.0 was ticket #928, and we did add some sort of unit tests for it, by making some servers not respond to DYHB at all: http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_hung_server.py?rev=37a242e01af6cf76

(In the pre-1.6.0 version, that situation would cause download to stall indefinitely, so that was our primary goal at that time and that is what the tests ensure no longer happens.)

comment:10 Changed at 2010-08-12T21:54:35Z by zooko

Note: the wget speed indicator is telling about "current" speed and so it varies a lot during a download. To get reliable speed measurements I guess I should let the wget finish which means, I suppose, I should download a smaller file! I would use the download status pages timings as an indicator of performance instead of the wget speed indicator.

comment:11 in reply to: ↑ 2 Changed at 2010-08-12T22:10:16Z by davidsarah

Replying to warner:

One clear improvement would be to fetch shares 0 and 5 from different servers: whatever slowed down the reads of sh0 also slowed down sh5.

Yes, I was going to point that out. Given that the DYHB responses were:

serverid   sent         received     shnums   RTT
lv3fqmev   +0.001393s   +0.105560s            104ms
tavrk54e   +0.003111s   +0.211572s            208ms
xflgj7cg   +0.004095s   +0.111008s            107ms
sp26qyqc   +0.006173s   +0.117722s   0,5      112ms
sroojqcx   +0.007326s   +0.297761s   1,6      290ms
4rk5oqd7   +0.008324s   +0.212271s   2        204ms
nszizgf5   +0.009295s   +0.204480s   3,7,8    195ms
62nlabgf   +0.010490s   +0.203262s            193ms
fp3xjndg   +0.011982s   +0.242262s   4,9      230ms
rpiw4n3f   +0.013246s   +0.113830s            101ms
  • Should the downloader have chosen to get shares from sp26qyqc, nszizgf5 and 4rk5oqd7, rather than getting two shares from sp26qyqc and one from nszizgf5?
  • The file has a happiness of 5. Shouldn't the uploader have distributed the shares more evenly?
Last edited at 2010-08-13T00:38:15Z by davidsarah (previous) (diff)

comment:12 Changed at 2010-08-12T22:32:06Z by warner

Yeah, I'd like to see some more quantifiable data. It's a pity that the old-downloader doesn't provide as much information as the new one (a flog might help), but obviously I learned from experience with old-downloader while building the instrumentation on the new-downloader :).

The status data you show from both downloaders show a server in common, and the other server responded to the DYHB very quickly, so for at least the beginning of the download, I don't think the downloader has enough information to do any better.

Many of the new-downloader block-requests (I'm looking at the +179s to +181s mark) show correlated stalls of both the "fast" server (sp26) and the other "slow" server (nszi). If the problem were a single slow server, I'd expect to see big differences between the response times.

Interesting. So, the main known-problem with the new-downloader (at least the one on the top of my personal list) is its willingness to pull multiple shares from the same server (a "diversity failure"), which obviously has the potential to be slower than getting each share from a different server.

This is plausibly acceptable for the first segment, because the moment we receive the DYHB response that takes us above "k" shares, we're faced with a choice: start downloading now, or wait a while (how long??) in the hopes that new responses will increase diversity and result in a faster download.

But after the first segment, specifically after we've received the other DYHB responses, the downloader really ought to get as much diversity as it can, so pulling multiple shares from the same server (when there's an alternative) isn't excusable after that point.

The fix for this is to implement the next stage of the new-downloader project, which is to rank servers (and which-share-from-which-server mappings) according so some criteria (possibly speed, possibly cost, fairness, etc), and reevaluate that list after each segment is fetched. This is closely tied into the "OVERDUE" work, which is tied into the notion of cross-file longer-term server quality/reputation tracking, which is loosely tied into the notion of alternative backend server classes.

And I can't get that stage finished and tested in the next week, nor is a change that big a very stable thing to land this close to a release. So I'm hoping that further investigation will reveal something convenient, like maybe that 1.7.1 is actually just as variable as new-downloader on this grid, or that the two-shares-from-one-server problem isn't as bad as it first appears.

I *do* have a quick-and-dirty patch that might improve matters, which is worth experimenting with. I'll have to dig it out of a dark corner of my laptop, but IIRC it added an artificial half-second delay after receiving >=k shares from fewer than k servers. If new shares were found before that timer expired, the download would proceed with good diversity. If not, the download would begin with bad diversity after a small delay.

It fixed the basic problem, but I don't like arbitrary delays, and didn't address the deeper issue (you could still wind up pulling shares from slow servers even after you have evidence that there are faster ones available), so I didn't include it in #798.

comment:13 Changed at 2010-08-12T22:50:46Z by warner

RE davidsarah's comment:

Yeah, that's the sort of heuristic that I didn't want to guess at. It'll be easier to see this stuff when I land the visualization code. The arrival order of positive responses is:

  • sp26 +117ms (sh0+sh5)
  • nszi +204ms (sh3+sh7+sh8)
  • 4rk5 +212ms (sh2)
  • fp3x +242ms (sh4+sh9)
  • sroo +298ms (sh1+sh6)

At +117ms, we don't have enough shares to download. At +204ms, we have enough shares but we'd like more diversity: we can't know that we could achieve our ideal diversity by waiting another 8 milliseconds, so we start downloading the first segment immediately.

By the time the second segment is started (at +977ms), we have a clearer picture of the available shares. We also have about 40kB of experience with each server (or 80kB for sp26, since we happened to fetch two shares from it), which we might use to make some guesses about speeds. When the second segment is started, at the very least we should prefer an arragement that gives us one share from each server. We might also like to prefer shares that we've already been using (since we'll have to fetch fewer hash-tree nodes to validate them); note that these two goals are already in conflict. We should prefer servers which gave us faster responses, if we believe that they're more likely to give fast responses in the future. But if we only hit up the really fast servers, they'll be using more bandwidth, which might cost them money, so they might prefer that we spread some of the load onto the slower servers, whatever we mutually think is fair.

And we need serendipity too: we should occasionally download a share from a random server, because it might be faster than any of the ones we're currently using, although maybe it won't be, so a random server may slow us down. All five of these goals conflict with each other, so there are weights and heuristics involved, which will change over time.

And we should remember some of this information beyond the end of a single download, rather than starting with an open mind each time, to improve overall efficiency.

So yeah, it's a small thread that, when tugged, pulls a giant elephant into the room. "No no, don't tug on that, you never know what it might be attached to".

So I'm hoping to find a quicker smaller solution for the short term.

comment:14 Changed at 2010-08-13T05:45:37Z by zooko

Brian asked for better measurements, and I ran quite a few (appended below). I think these results are of little use as they are very noisy and as far as I can tell I was just wrong when I thought, earlier today, that 1.8.0c2 was downloading this file slower than 1.7.1 did.

On the other hand I think these numbers are trying to tell us that something is wrong. Why does it occasionally take 40s to download 100K?

After I post this comment I will attach some status reports and flogs.

With v1.7.1 and no flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   8398      0  0:00:11  0:00:11 --:--:-- 25621

real 0m11.913s
user 0m0.004s
sys  0m0.006s

With 1.7.1 and flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kb.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  11044      0  0:00:09  0:00:09 --:--:-- 24679

real 0m9.062s
user 0m0.003s
sys  0m0.006s

v1.7.1 without tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kc.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   8134      0  0:00:12  0:00:12 --:--:-- 23310

real 0m12.301s
user 0m0.004s
sys  0m0.006s

v1.7.1 and flogtool tail:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100Kd.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   9050      0  0:00:11  0:00:11 --:--:-- 24716

real 0m11.057s
user 0m0.004s
sys  0m0.006s

Now switched from office to home. v1.7.1 and no flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  33899      0  0:00:29  0:00:29 --:--:--  287k

real 0m29.509s
user 0m0.006s
sys  0m0.013s

v1.7.1 and flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  51857      0  0:00:19  0:00:19 --:--:--  228k

real 0m19.294s
user 0m0.005s
sys  0m0.012s

v1.7.1 and no flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   2499      0  0:00:40  0:00:40 --:--:-- 25031

real 0m40.018s
user 0m0.005s
sys  0m0.011s

v1.7.1 and flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0   5249      0  0:00:19  0:00:19 --:--:-- 24746

real 0m19.059s
user 0m0.005s
sys  0m0.009s

v1.8.0c2 and no flogtool tail, 100K:

$ time curl --range 0-100000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100K.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  80318      0  0:00:01  0:00:01 --:--:-- 80516

real 0m1.253s
user 0m0.004s
sys  0m0.005s

v1.8.0c2 and no flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0    98k      0  0:00:09  0:00:09 --:--:--  118k

real 0m9.961s
user 0m0.005s
sys  0m0.015s

v1.8.0c2 and flogtool tail, 1M:

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0   210k      0  0:00:04  0:00:04 --:--:--  268k

real 0m4.645s
user 0m0.004s
sys  0m0.009s

Changed at 2010-08-13T05:47:09Z by zooko

Changed at 2010-08-13T05:47:25Z by zooko

comment:15 Changed at 2010-08-13T05:57:27Z by zooko

I just did one more download with 1.7.1 of 1M in order to get both the status page and the flog I named this download "run 99" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 99, at my home, with Tahoe-LAFS v1.7.1, the first 1M of http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg :

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0  38477      0  0:00:25  0:00:25 --:--:--  245k'

real 0m25.999s
user 0m0.005s
sys  0m0.012s

I will now attach the status output and flog of run 99.

comment:16 Changed at 2010-08-13T06:14:54Z by zooko

I just did one more download with 1.8.0c2 of 1M in order to get both the status page and the flog I named this download "run 100" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 100, at my home, with Tahoe-LAFS v1.8.0c2, the first 1M of http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg :

$ time curl --range 0-1000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-1M.ogg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0   139k      0  0:00:06  0:00:06 --:--:--  169k

real 0m7.377s
user 0m0.004s
sys  0m0.010s

comment:17 Changed at 2010-08-13T06:21:12Z by zooko

  • Keywords regression removed
  • Milestone changed from 1.8.0 to eventually
  • Priority changed from critical to major

Okay there's no solid evidence that there is a regression from 1.7.1. I think Brian should use this ticket to analyze my flogs and status pages if he wants and then change it to be a ticket about download server selection. :-) Removing "regression".

comment:18 Changed at 2010-08-13T06:52:56Z by zooko

I just did one more download with 1.8.0c2 of 100M in order to get both the status page and the flog I named this download "run 101" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 101, at my home, with Tahoe-LAFS v1.8.0c2, the first 1M of http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg :

$ echo this is run 101 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 101
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0  90900      0  0:18:20  0:18:20 --:--:-- 96978

real 18m20.118s
user 0m0.097s
sys  0m0.536s

comment:19 Changed at 2010-08-13T07:18:54Z by zooko

I just did one more download with 1.7.1 of 100M in order to get both the status page and the flog I named this download "run 102" so that I could keep its status page, flog, and stdout separate from all the others on this ticket. Here is run 102, at my home, with Tahoe-LAFS v1.7.1, the first 1M of http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg :

$ echo this is run 102 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 102
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   182k      0  0:08:54  0:08:54 --:--:-- 26.3M

real 8m54.910s
user 0m0.066s
sys  0m0.478s

comment:20 Changed at 2010-08-13T07:19:10Z by zooko

Interesting that 1.7.1 was twice as fast as 1.8.0c2 this time.

Changed at 2010-08-13T07:19:48Z by zooko

comment:21 Changed at 2010-08-13T07:20:37Z by zooko

Annoyingly, 1.7.1 has a bug where it doesn't show downloads in the status page sometimes, and that happened this time, so I can't show you the status page for run 102.

comment:22 Changed at 2010-08-13T15:51:30Z by zooko

run 103 1.7.1 the first 100M http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg

$ echo this is run 103 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 103
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   207k      0  0:07:50  0:07:50 --:--:-- 27.1M

real 7m50.696s
user 0m0.063s
sys  0m0.469s

Changed at 2010-08-13T15:52:22Z by zooko

comment:23 Changed at 2010-08-13T17:51:36Z by zooko

run 104

1.8.0rc2-4698

the first 100M

http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg

$ echo this is run 104 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 104
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0  81763      0  0:20:23  0:20:23 --:--:-- 57095

real 20m23.119s
user 0m0.102s
sys  0m0.554s

comment:24 Changed at 2010-08-13T17:56:14Z by zooko

  • Keywords regression added
  • Milestone changed from eventually to 1.8.0
  • Priority changed from major to critical

Hm, okay it really looks like there is a substantial (2X) slowdown for using Tahoe-LAFS v1.8.0c2 instead of v1.7.1 on today's (and yesterday's) Test Grid. I'm re-adding the regression tag which means I think this issue should block 1.8.0 release until we at least understand it better.

Changed at 2010-08-13T17:57:41Z by zooko

comment:25 Changed at 2010-08-13T18:45:51Z by zooko

run 105

1.7.1

the first 100M

http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg

$ echo this is run 105 tahoe-lafs v1.7.1 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 105 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   211k      0  0:07:40  0:07:40 --:--:-- 19.8M

real 7m41.179s
user 0m0.061s
sys  0m0.481s

Changed at 2010-08-13T18:46:45Z by zooko

comment:26 Changed at 2010-08-13T19:10:39Z by zooko

run 106

1.8.0c2-r4698

the first 100M

http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg

$ echo this is run 106 tahoe-lafs v1.8.0c2-r4698 ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 106 tahoe-lafs v1.8.0c2-r4698
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   112k      0  0:14:29  0:14:29 --:--:-- 94083

real 14m29.309s
user 0m0.082s
sys  0m0.528s

comment:27 follow-up: Changed at 2010-08-14T01:52:35Z by warner

I had an idea for a not-too-complex share-selection algorithm this morning:

  • first, have the ShareFinder report all shares as soon as it learns about them, instead of its current behavior of withholding them until someone says that they're hungry
  • also record the DYHB RTT in each Share object, so it can be found later. Keep the list of Shares sorted by this RTT (with share-number as the secondary sort key).
  • then, each time the SegmentFetcher needs to start using a new share, use the following algorithm:
sharemap = {} # maps shnum to Share instance
num_shares_per_server = {} # maps Server to a count of shares
for max_shares_per_server in itertools.count(1):
  progress = False
  for sh in shares:
    if sh.shnum in sharemap:
      continue
    if num_shares_per_server[sh.server] >= max_shares_per_server:
      continue
    sharemap[sh.shnum] = sh
    num_shares_per_server[sh.server] += 1
    progress = True
    if len(sharemap) >= k:
      return SUCCESS
  if not progress:
    return FAIL

The general idea is to cycle through all the shares we know about, but first try to build a sharemap that only uses one share per server (i.e. perfect diversity). That might fail because the shares are not diverse enough, so we can walk through the loop a second time and be willing to accept two shares per server. If that fails, we raise our willingness to three shares per server, etc. If we ever finish a loop without adding at least one share to our sharemap, we declare failure: this indicates that there are not enough distinct shares (that we know about so far) to succeed.

If this returns FAIL, that really means we should declare "hunger" and ask the ShareFinder to look for more shares. If we return SUCCESS but max_shares_per_server > 1, then we should ask for more shares too (but start the segment anyways: new shares may help the next segment do better).

This is still vulnerable to certain pathological situations, like if everybody has a copy of sh0 but only the first server has a copy of sh1: this will use sh0 from the first server then circle around and have to use sh1 from that server as well. A smarter algorithm would peek ahead, realize the scarcity of sh1, and add sh1 from the first server so it could get sh0 from one of the other servers instead.

But I think this might improve the diversity of downloads without going down the full itertools.combinations-enumerating route that represents the "complete" way to approach this problem.

Last edited at 2010-08-14T18:34:50Z by warner (previous) (diff)

comment:28 follow-up: Changed at 2010-08-14T03:47:11Z by zooko

This seems promising. It sounds like you might think that the slowdown of 1.8.0c2 vs. 1.7.1 on the current Test Grid might be due to one server being used to serve two shares in 1.8.0c2 when two different servers would be used—one for each share—in 1.7.1. Is that what you think? Have you had a chance to look at my flogs attached to this ticket to confirm that this is what is happening?

comment:29 in reply to: ↑ 27 Changed at 2010-08-14T05:18:37Z by zooko

Replying to warner:

But I think this might improve the diversity of downloads without going down the full itertools.combinations-enumerating route that represents the "complete" way to approach this problem.

(Parenthetical historical observation which is pleasurable to me: Your heuristic algorithm for server selection (for download) in comment:27, and your observation that it is susceptible to failure in certain cases, is similar to my proposed heuristic algorithm for server selection for upload in #778 (comment:156:ticket:778, for the benefit of future cyborg archaeologist historians). David-Sarah then observed that finding the optimal solution was a standard graph theory problem named "maximum matching of a bipartite graph". Kevan then implemented it and thus we were able to finish #778.)

My copy of Cormen, Leiserson, Rivest 1st Ed. says (chapter 27.3) that the Ford-Fulkerson solution requires computation O(V * E) where V is the number of vertices (num servers plus num shares) and E is the number of edges (number of (server, share) tuples).

Now what Kevan actually implemented in happinessutil.py just returns the size of the maximum matching, and what we want here is an actual matching. I'm not 100% sure but I think if you save all the path's that are returned from augmenting_path_for() in servers_of_happiness() and return the resulting set of paths then you'll have your set of server->share mappings.

comment:30 in reply to: ↑ 28 Changed at 2010-08-14T06:08:34Z by zooko

Replying to zooko:

This seems promising. It sounds like you might think that the slowdown of 1.8.0c2 vs. 1.7.1 on the current Test Grid might be due to one server being used to serve two shares in 1.8.0c2 when two different servers would be used—one for each share—in 1.7.1.

Okay this does appear to be happening in at least one of the slow v1.8.0c2 downloads attached to this ticket. I looked at attachment:1.8.0c2-r4698-run-106-down-0.html and every request-block in it (for three different shares) went to the same server -- nszizgf5 -- which was the first server to respond to the DYHB (barely) and which happened to be the only server that had three shares. So at least for that run, Brian's idea that fetching blocks of different shares from the same server is a significant slowdown seems to be true.

comment:31 Changed at 2010-08-14T07:02:04Z by zooko

  • Summary changed from does new-downloader perform badly for certain situations (such as today's Test Grid)? to new-downloader performs badly when the first server to reply to DYHB has K shares

comment:32 Changed at 2010-08-15T04:08:45Z by zooko

In http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004998.html I wrote:

Hey waitasecond. As far as I understand, Tahoe-LAFS v1.7.1 should also—just like v1.8.0c2—start downloading all three shares from Greg's server as soon as that server is the first responder to the DYHB:

http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/immutable/download.py?rev=9f3995feb9b2769c#L923

Am I misunderstanding? So the question of why 1.7.1 seems to download 2 or 3 times as fast as 1.8.0c2 on this grid remains open.

comment:33 Changed at 2010-08-16T05:25:21Z by zooko

run 107

1.7.1

the first 10MB

with cProfile profiling running but no flog running

$ echo this is run 107 tahoe-lafs v1.7.1 cProfile ; time curl --range 0-10000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-10M.ogg
this is run 107 tahoe-lafs v1.7.1 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9765k  100 9765k    0     0   171k      0  0:00:57  0:00:57 --:--:-- 2424k

real    0m57.139s
user    0m0.010s
sys     0m0.059s
Last edited at 2010-08-16T05:27:14Z by zooko (previous) (diff)

Changed at 2010-08-16T05:28:11Z by zooko

comment:34 Changed at 2010-08-16T05:36:27Z by zooko

run 108

1.8.0c2

the first 10MB

with cProfile profiling running but no flog running

$ echo this is run 108 tahoe-lafs v1.8.0c2 cProfile ; time curl --range 0-10000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-10M.ogg
this is run 108 tahoe-lafs v1.8.0c2 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9765k  100 9765k    0     0   247k      0  0:00:39  0:00:39 --:--:--  264k

real    0m39.496s
user    0m0.009s
sys     0m0.052s

Changed at 2010-08-16T05:37:29Z by zooko

comment:35 Changed at 2010-08-16T05:53:28Z by zooko

run 109

1.7.1

the first 100MB

with cProfile profiling running but no flog running

$ echo this is run 109 tahoe-lafs v1.7.1 cProfile ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 109 tahoe-lafs v1.7.1 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   228k      0  0:07:06  0:07:06 --:--:-- 21.5M

real    7m6.626s
user    0m0.059s
sys     0m0.460s

Changed at 2010-08-16T05:53:54Z by zooko

comment:36 Changed at 2010-08-16T06:13:37Z by zooko

run 110

1.8.0c2

the first 100MB

with cProfile profiling running but no flog running

$ echo this is run 110 tahoe-lafs v1.8.0c2 cProfile ; time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
this is run 110 tahoe-lafs v1.8.0c2 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   113k      0  0:14:19  0:14:19 --:--:--   98k

real    14m19.256s
user    0m0.079s
sys     0m0.504s

Changed at 2010-08-16T06:14:33Z by zooko

Changed at 2010-08-16T06:17:45Z by zooko

comment:37 follow-up: Changed at 2010-08-16T06:22:55Z by zooko

  • Summary changed from new-downloader performs badly when the first server to reply to DYHB has K shares to new-downloader performs badly when downloading a lot of data from a file

Okay, the problem with the current downloader in 1.8.0c2 is that it goes slower and slower as it downloads more and more data from a file. It consistently wins (or at least ties) 1.7.1 in downloads <= 10MB but consistently loses badly for 100 MB. Also the profiling result in attachment:prof-run-110-dump.txt shows major CPU usage in spans:

         661324561 function calls (661269650 primitive calls) in 919.130 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    91950    0.464    0.000    0.646    0.000 spans.py:142(dump)
  1374341    0.310    0.000    0.310    0.000 spans.py:152(__iter__)
    41352    0.062    0.000    0.149    0.000 spans.py:156(__nonzero__)
   142497    0.194    0.000    0.253    0.000 spans.py:159(len)
    18390    0.059    0.000    0.438    0.000 spans.py:164(__add__)
    22962    1.083    0.000    8.079    0.000 spans.py:170(__sub__)
     9195    0.034    0.000    0.346    0.000 spans.py:186(__and__)
153765314  121.874    0.000  197.580    0.000 spans.py:203(overlap)
153189988   54.374    0.000   54.374    0.000 spans.py:215(adjacent)
    18390    1.032    0.000    1.343    0.000 spans.py:238(len)
    18390    5.727    0.000    7.413    0.000 spans.py:248(dump)
    87347    1.430    0.000  460.111    0.005 spans.py:25(__init__)
     9195    0.633    0.000  459.852    0.050 spans.py:256(get_spans)
     6906    0.114    0.000    0.115    0.000 spans.py:260(assert_invariants)
     9193    0.305    0.000    0.372    0.000 spans.py:271(get)
     6906    2.174    0.000    3.233    0.000 spans.py:295(add)
  2678832   32.014    0.000   46.032    0.000 spans.py:34(_check)
     6904    0.742    0.000    1.630    0.000 spans.py:389(remove)
     3436    0.007    0.000    0.042    0.000 spans.py:434(pop)
  1315865  140.193    0.000  458.353    0.000 spans.py:46(add)
  1275620    2.471    0.000    6.287    0.000 spans.py:82(remove)

comment:38 Changed at 2010-08-16T07:01:29Z by zooko

Looking at immutable/downloader/share.py, I have the following review comments:

  • This comment confuses me:
    59           self._pending = Spans() # request sent but no response received yet
    60	        self._received = DataSpans() # ACK response received, with data
    61	        self._unavailable = Spans() # NAK response received, no data
    62	
    63	        # any given byte of the share can be in one of four states:
    64	        #  in: _wanted, _requested, _received
    65	        #      FALSE    FALSE       FALSE : don't care about it at all
    66	        #      TRUE     FALSE       FALSE : want it, haven't yet asked for it
    67	        #      TRUE     TRUE        FALSE : request is in-flight
    68	        #                                   or didn't get it
    69	        #      FALSE    TRUE        TRUE  : got it, haven't used it yet
    70	        #      FALSE    TRUE        FALSE : got it and used it
    71	        #      FALSE    FALSE       FALSE : block consumed, ready to ask again
    

Are _wanted, _requested, and _received old names for _pending, _received, and _unavailable? Or perhaps from a different design entirely? And that's six states, not four.

  • A span is add'ed to _pending in _send_requests() and removed from _pending in _got_data() but is not removed if the request errbacks instead of callbacks. That would be a bug for it still to be marked as "pending" after the request errbacked, wouldn't it?

comment:39 Changed at 2010-08-16T07:18:19Z by zooko

This looks like a bug: share.py _got_data():

        self._received.add(start, data)

That could explain the slowdown -- the items added to _received here are never removed, because the removal code in _satisfy_block_data() is:

        block = self._received.pop(blockstart, blocklen)

I added the following assertions to trunk/src/allmydata/util/spans.py@4666:

hunk ./src/allmydata/util/spans.py 47
             raise
 
     def add(self, start, length):
+        assert isinstance(start, (int, long))
+        assert isinstance(length, (int, long))
         assert start >= 0
         assert length > 0
         #print " ADD [%d+%d -%d) to %s" % (start, length, start+length, self.dump())

And indeed these assertions fail because data is not an integer. However, then when I add this patch:

hunk ./src/allmydata/immutable/downloader/share.py 741
                 share=repr(self), start=start, length=length, datalen=len(data),
                 level=log.NOISY, parent=lp, umid="5Qn6VQ")
         self._pending.remove(start, length)
-        self._received.add(start, data)
+        self._received.add(start, length)
 
         # if we ask for [a:c], and we get back [a:b] (b<c), that means we're
         # never going to get [b:c]. If we really need that data, this block

This causes a bunch of tests to fail in ways that I don't understand.

comment:40 Changed at 2010-08-16T07:35:04Z by zooko

more review notes:

                    self._spans[i] = (left_start, left_length)
                    self._spans.append( (right_start, right_length) )
                    self._spans.sort()

with

                    self._spans[i] = (left_start, left_length)
                    self._spans.insert(i+1, (right_start, right_length))
  • DataSpans.add() calls assert_invariants() which iterates over all the spans. Also DataSpans.add() itself searches for where to make modifications by iterating from the beginning, which seems unnecessary. Couldn't it do a binary search to find the place it needs to modify and then modify only a local neighborhood there?

comment:41 Changed at 2010-08-16T07:37:16Z by zooko

In immutable/downloader/share.py _unavailable can have things added to it (in case of intentional over-read or in case of failing/corrupted server) but never has things removed from it. Does that matter? I suspect that it is intentional and doesn't hurt but I'm not sure.

comment:42 Changed at 2010-08-16T14:17:54Z by zooko

Also Brian discovered another bug in new-downloader last night. Here are some excerpts from IRC.

<warner> hm, ok, so my new share-selection algorithm is, I think, revealing a
	 bug in the rest of the downloader code			        [23:54]
<warner> there's a funny place where the number of segments in the file
	 (needed to build the block-hash-tree) is not known right away,
	 because we have to get the UEB to know for sure	        [23:55]
<warner> so the CommonShare object speculatively creates an
	 IncompleteHashTree, with a guessed value for numsegs
<warner> and then replaces it later
<warner> (in hindsight, it's probably not a good idea to do that.. I should
	 just leave it as None until we know for sure)		        [23:56]
<warner> the test failure is that the first segment uses sh0,sh1,sh5 , and all
	 have their numsegs updated, but the second segment then switches to
	 some different shares, and those new ones are still using the guessed
	 value for numsegs					        [23:57]
<warner> seg1 used sh0,1,6
<zooko> Hm.							        [23:58]
<zooko> Good find!
<warner> and sh6 threw an exception because of the wrong-sized hashtree, so it
	 fell back to sh5
<warner> each time it tried to use a share that wasn't part of the original
	 set, it got that error and returned back to sh0,1,5	        [23:59]

comment:43 follow-up: Changed at 2010-08-16T18:14:04Z by warner

Responses to earlier comments:

  • spans.py definitely looks O()-suspicious. Does attachment:prof-run-110-dump.txt suggest that in a 100MB file download, we spent half of the total time in Spans.__init__?
  • yes, the comment about _wanted, _requested, _received is stale.
  • the failure to remove data from _pending upon errback is a bug
  • LayoutInvalid is better than assert, yeah
  • the self._received.add(start,data) is correct: _received is a DataSpans instance, not Spans, and it holds strings, not booleans. _received holds the data that comes back from the server until the "satisfy" code consumes it. It has methods like get and pop, whereas the simpler Spans class merely has methods for is-range-in-span.
    • that said, I don't understand how assertions added to spans.py#L47 would ever fire. If those same assertions were added to spans.py#L295, I'd get it. What types were start/length in your observed assertion failures? And what was the stack trace?
    • The patch to call self._received.add(start, length) is wrong; it must be called with (int,str).
  • all the comments about efficiency improvements in Spans are probably correct
  • adding data to _unavailable should be benign: the amount of unavailable data is small and constant (if the share is intact, we should only add to _unavailable during the first few reads if we've guessed the segsize wrong).

Now some new ideas. I've found a couple of likely issues.

  • looking at timestamps from flog-run-106, the inter-segment timing definitely is growing over the course of the download. It's noisy, but it goes from about 0.8s at the start (seg0), to about 1.5s-2.0s at the end (seg762). I haven't looked at smaller deltas (i.e. only inside the "desire" code) to rule out network variations, but it certainly points to a leak or complexity increase of some sort that gets worse as the download progresses.
    • comparing downloads of the first 100MB, the middle 100MB, and the last 100MB would rule out anything that's influenced by the absolute segment number.
  • looking at the Spans.dump() strings in the flog, I see that two of the three shares (sh7+sh8) have an ever-growing .received DataSpans structure. A cursory check suggests they are growing by 64 bytes and one range per segment. By the end of the download (seg762), sh7 is holding 37170 bytes in 378 ranges (whereas sh3 only has 1636 bytes in 22 ranges, and remains mostly constant)
    • This means that we're asking for data which we then don't end up using. We keep it around in _received because it might be useful later: maybe we ask for the wrong data because our guess of the segsize (and thus numsegs, and thus the size/placement of the hashtrees) was wrong. But later we might take advantage of whatever we fetched by mistake.
    • I have two theories:
      • the IncompleteHashTree.needed_hashes() call, when asked what hashes we need to validate leaf 0, might tell us we need the hash for leaf 0 too. However, that hash can be computed from the block of data that makes up the leaf, so we don't really need to fetch it. (whereas we *do* need the hash for leaf 1, since it sits on the "uncle chain" for leaf0). If the desire-side code is conservatively/incorrectly asking for the leaf0 hash, but the satisfy-side code doesn't use it, then we'll add a single 32-byte hash node per segment.
      • ciphertext hash tree nodes: when we start working on a segment, the desire-side code will ask for ciphertext hash tree nodes from each segment we're using. However, the satisfy-side code will only use the hashes from the first response: by the time the second response arrives, the ciphertext hash tree is satisfied, so that clause isn't reached. This means that we'll leave that data in ._received forever. This seems most likely: it would explain why the first share (sh3) doesn't grow, whereas the later two shares do, and why I saw a 64-byte increment (the actual growth would depend upon the segment number, and how many new uncle-chain nodes are needed, but 2-nodes is a fairly common value).
  • The .received leftover-data issue shouldn't be such a big deal, N=378 is not a huge number, but the measured increase in inter-segment time suggests that whatever the O() complexity is, N=378 is enough to cause problems.

So I think the next directions to pursue are:

  • build some kind of test framework to exercise a large download without using real remote_read calls, ideally 100MB or 1GB in a few seconds. This would use a Share subclass that returns data immediately (well, after a eventual-send) rather than ever touching a server. It might also need to stub out some of the hashtree checks, but does need real needed_hashes computations. Then we fix the code until this test finishes in a reasonable amount of time. While I wouldn't have the test case assert anything about runtime, I would have it assert things like ._received doesn't grow over the course of the test.
  • measure the CPU seconds needed to download a 100MB file from both old-downloader and new-downloader: if we're looking at a O(n3) problem, it will manifest as a much heavier CPU load. (if we were merely looking at a pipelining failure, the CPU time would be the same, but wallclock time would be higher).
  • stare at Spans (and specifically DataSpans) for computational-complexity problems. Build some tests of these with N=400ish and see how efficient they are. They're supposed to be linear wrt number-of-ranges, but either they aren't, or they're being called in a way which makes it worse
  • consider commenting out the dump() calls for a test, or some assert_invariants calls, to see if we're hitting that old problem where the data structure is efficient unless we leave in the self-checks or debugging messages
  • (hard) figure out how to drop the unused ciphertext-hash-tree nodes from the second and later shares
  • look at IncompleteHashTree.needed_hashes and see if we're actually requesting the leaf node that we don't really need.
  • consider an entirely different DataSpans structure. The perhaps-too-clever overlap/merge behavior is mostly just exercised during the fetch of the first segment, before we're sure about the correct number of segments (we fetch some data speculatively to reduce roundtrips; if we guess wrong, we'll get the wrong data, but DataSpans lets us easily use that data later if it turns out to be what we needed for some other purpose). Perhaps a data structure which was less tuned for merging adjacent ranges would be better, maybe one which has an explicit merge() method that's only called just before the requests are sent out. Or maybe the value of holding on to that data isn't enough to justify the complexity.
    • a related thought (mostly to remind me of the idea later): for pipelining purposes, I'd like to be able to label the bits in a Spans with their purpose: if we send parallel requests for both seg2 and seg3, I'd like the seg2 data to arrive first, so e.g. the hashes needed to validate seg2 should arrive before the bulk block data for seg3. A label on the bits like "this is for seg2" would let us order the requests in such a way to reduce our memory footprint. A label like this might also be useful for handling the unused-ciphertext-hash-tree-nodes problem, if we could remove data from a DataSpans that's labelled with an already-complete segnum.

Finally, the bug zooko mentioned in comment:42 is real. I'm still working on it, but basically it prevents us from using shares that arrive after the initial batch of requests: they are not initialized properly and don't get a correct block hash tree. I'm working on a fix. The symptom is that we fall back to the initial shares, but if those have died, the download will fail, which is wrong.

And I'm still working on the new share-selection algorithm. The code works, and my basic unit tests work, but certain ones require the comment:42 bug to be fixed before it is safe to use (the bug will hurt current downloads, but occurs less frequently).

Changed at 2010-08-16T21:54:35Z by davidsarah

Short-term hack to test for asymptotic inefficiency of DataSpans?.get_spans

comment:44 in reply to: ↑ 37 ; follow-up: Changed at 2010-08-16T22:13:42Z by davidsarah

Replying to zooko:

Okay, the problem with the current downloader in 1.8.0c2 is that it goes slower and slower as it downloads more and more data from a file. It consistently wins (or at least ties) 1.7.1 in downloads <= 10MB but consistently loses badly for 100 MB. Also the profiling result in attachment:prof-run-110-dump.txt shows major CPU usage in spans:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
    87347    1.430    0.000  460.111    0.005 spans.py:25(__init__)
     9195    0.633    0.000  459.852    0.050 spans.py:256(get_spans)
...
  1315865  140.193    0.000  458.353    0.000 spans.py:46(add)

This is the smoking gun. The code of DataSpans.get_spans is:

    def get_spans(self):
        """Return a Spans object with a bit set for each byte I hold"""
        return Spans([(start, len(data)) for (start,data) in self.spans])

and the Spans constructor has the loop:

    for (start,length) in _span_or_start:
        self.add(start, length)

Spans.add does a linear search (plus a sort, if there is no overlap, but Timsort takes linear time for an already-sorted array), so the overall complexity of DataSpans.get_spans is Θ(n2) where n is the number of spans.

Since Spans uses essentially the same invariant as DataSpans for its array of spans (they are sorted with no overlaps or adjacency), it is possible to implement get_spans in Θ(1) time. However I suspect that the important difference here is between Θ(n2) and Θ(n).

The diff's implementation of get_spans includes a call to s._check. It may also be worth doing another profile run without that call.

(Some of my comments in ticket:798#comment:18 would reduce the number of calls to overlap and eliminate calls to adjacent, but I don't think that's the critical issue by itself.)

Last edited at 2010-08-16T22:50:21Z by davidsarah (previous) (diff)

comment:45 in reply to: ↑ 44 Changed at 2010-08-16T23:06:37Z by davidsarah

Replying to davidsarah:

Spans.add does a linear search (plus a sort, if there is no overlap, but Timsort takes linear time for an already-sorted array), so the overall complexity of DataSpans.get_spans is Θ(n2) where n is the number of spans.

Since Spans uses essentially the same invariant as DataSpans for its array of spans (they are sorted with no overlaps or adjacency), it is possible to implement get_spans in Θ(1) time. However I suspect that the important difference here is between Θ(n2) and Θ(n).

Note that, given this problem and Brian's observations in comment:43, the overall time for a download will be Θ(n3). So maybe we do need a better data structure (some sort of balanced tree or heap, maybe) if we want to get to Θ(n log n) rather than Θ(n2) for the whole download. But maybe that can wait until after releasing 1.8.

(Actually, just logging the output of Spans.dump calls will by itself cause Θ(n2) behaviour for the whole download, although with a fairly small constant.)

comment:46 in reply to: ↑ 43 Changed at 2010-08-17T05:19:10Z by zooko

Replying to warner:

  • that said, I don't understand how assertions added to spans.py#L47 would ever fire. If those same assertions were added to spans.py#L295, I'd get it. What types were start/length in your observed assertion failures? And what was the stack trace?

This was my mistake. I must have confused it with a different test run. Those assertions never fire.

comment:47 Changed at 2010-08-18T12:57:33Z by zooko

run 111

1.8.0c2

requesting all of the file

with flog running

$ echo this is run 111 tahoe-lafs v1.8.0c2-r4699+ ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 111 tahoe-lafs v1.8.0c2-r4699+
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 27 1421M   27  394M    0     0  18995      0 21:47:59  6:02:29 15:45:30     0^C

real    362m29.965s
user    0m1.123s
sys     0m4.187s

comment:48 Changed at 2010-08-18T12:58:38Z by zooko

Oh, and in run 111 (comment:47) I had added log messages for all events which touched the Share._received Spans object so the resulting flogfile is a trace of everything that affects that object.

comment:49 Changed at 2010-08-18T14:07:45Z by zooko

The following run has patch attachment:spans.py.diff.

run 112

1.8.0c2

requesting all of the file

with flog running

$ echo this is run 112 tahoe-lafs v1.8.0c2-r4699+ ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 112 tahoe-lafs v1.8.0c2-r4699+
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 30 1421M   30  435M    0     0   113k      0  3:33:42  1:05:29  2:28:13  139k^C

real    65m29.907s
user    0m0.350s
sys     0m2.302s

comment:50 Changed at 2010-08-18T15:57:23Z by zooko

The patch helped a lot—compare run 112 to 111—but not enough to make trunk as fast as 1.7.1 on large downoads—compare run 112 to runs 102, 103, 105, and 109.

I intend to write a tool which reads the traces of what was done to the Share._received Spans object and does those operations to a Spans object so that we can run benchmark it and profile it in isolation.

run version downloaded download KBps flags
101 1.8.0c2 100 MB 91
102 1.7.1 100 MB 182
103 1.7.1 100 MB 207
104 1.8.0c2 100 MB 82
105 1.7.1 100 MB 211
109 1.7.1 100 MB 228 cProfile
110 1.8.0c2 100 MB 113 cProfile
111 1.8.0c2 413 MB 19 spanstrace
112 1.8.0c2 456 MB 113 spanstrace, patched

comment:51 Changed at 2010-08-18T16:04:20Z by zooko

attachment:debuggery-trace-spans.dpatch.txt adds logging of all events that touched Share._received at loglevel CURIOUS. attachment:run-111-above28-flog.pickle.bz2 and attachment:run-112-above28-flog.pickle.bz2 are the flogs from run 111 and run 112 with only events logged at level CURIOUS or above.

comment:52 Changed at 2010-08-18T17:02:52Z by zooko

run version downloaded download KBps flags
113 1.7.1 543 MB 241 on office network
$ echo this is run 113 tahoe-lafs v1.7.1 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 113 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 36 1421M   36  518M    0     0   241k      0  1:40:26  0:36:37  1:03:49  257k^C

real    36m39.238s
user    0m0.329s
sys     0m2.733s

comment:53 Changed at 2010-08-18T18:56:04Z by warner

BTW, be sure to pay attention to the DataSpans too, specifically Share._received . That's the one that I observed growing linearly with number-of-segments-read.

I'm close to finishing my rework of the way Shares are handled. If we can make new-downloader fast enough by fixing complexity issues in spans.py, we should stick with that for 1.8.0, because those are probably smaller and less intrusive changes. If not, here are the properties of my Share-handling changes:

  • use a new diversity-seeking Share selection algorithm, as described in comment:27 . This should distribute the download load evenly among all known servers when they have equal number of shares, and as evenly as possible (while still getting k shares) when not. If more shares are discovered later, the algorithm will recalculate the sharemap and take advantage of the new shares, and we'll keep looking for new shares as long as we don't have the diversity that we want (one share per server).
  • fix the problem in which late shares (not used for the first segment, but located and added later) were not given the right sized hashtree and threw errors, causing them to be dropped. I think this completely broke the "tolerate loss of servers" feature, but the problem might have been caused by the diversity-seeking algorithm change, rather than something that was in new-downloader originally.
  • deliver all shares to the SegmentFetcher as soon as we learn about them, instead of waiting for the fetcher to tell us it's hungry. This gives the fetcher more information to work with.

I might be able to attach a patch tomorrow.. there are still some bugs in it, and I haven't finished implementing the last point (push shares on discovery, not pull on hunger).

comment:54 Changed at 2010-08-18T23:38:51Z by warner

Oh, hey, here's a simple patch to try out:

diff --git a/src/allmydata/immutable/downloader/share.py b/src/allmydata/immutable/downloader/share.py
index f7ed4e8..413f907 100644
--- a/src/allmydata/immutable/downloader/share.py
+++ b/src/allmydata/immutable/downloader/share.py
@@ -531,6 +531,9 @@ class Share:
             for o in observers:
                 # goes to SegmentFetcher._block_request_activity
                 o.notify(state=COMPLETE, block=block)
+            # now clear our received data, to dodge the #1170 spans.py
+            # complexity bug
+            self._received = DataSpans()
         except (BadHashError, NotEnoughHashesError), e:
             # rats, we have a corrupt block. Notify our clients that they
             # need to look elsewhere, and advise the server. Unlike

Since self._received is supposed to be empty after each segment is complete (unless we guess the segsize wrong), this patch simply manually empties it at that point. No data is retained from one segment to the next: any mistakes will just cause us to ask for more data next time.

If the problem in this bug is a computational complexity in DataSpans, this should bypass it, by making sure we never add more than 3 or 4 ranges to one, since even O(n^3) is small when n is only 3 or 4. (we should still fix the problem, but maybe the fix can wait for 1.8.1). If the problem is in Spans, or elsewhere, then this won't help.

comment:55 follow-up: Changed at 2010-08-19T02:31:18Z by zooko

attachment:run-112-above28-flog-dump-sh8-on-nsziz.txt is a flogtool dump of attachment:attachment:run-112-above28-flog.pickle.bz2 grepped for just one particular share (sh8 on nsziz). It is suitable as the input file for misc/simulators/bench_spans.py.

comment:56 in reply to: ↑ 55 Changed at 2010-08-19T02:31:47Z by zooko

attachment:run-112-above28-flog-dump-sh8-on-nsziz.txt is a flogtool dump of attachment:run-112-above28-flog.pickle.bz2 grepped for just one particular share (sh8 on nsziz). It is suitable as the input file for misc/simulators/bench_spans.py.

The output that I get on my Macbook Pro is:

MAIN Zooko-Ofsimplegeos-MacBook-Pro:~/playground/tahoe-lafs/trunk$ PYTHONPATH=~/playground/tahoe-lafs/trunk/support/lib/python2.6/site-packages/ python  ~/playground/tahoe-lafs/trunk/misc/simulators/bench_spans.py ~/Public/Drop\ Box/run-112-above28-flog-dump-sh8-on-nsziz.txt 
all results are in time units per N
time units per second: 1000000; seconds per time unit: 0.000001
(microseconds)
    600 best: 2.265e+01,   3th-best: 2.402e+01, mean: 2.462e+01,   3th-worst: 2.502e+01, worst: 2.585e+01 (of     10)
   6000 best: 1.069e+02,   3th-best: 1.119e+02, mean: 1.137e+02,   3th-worst: 1.149e+02, worst: 1.201e+02 (of     10)
  60000 best: 2.916e+01,   1th-best: 2.916e+01, mean: 5.080e+02,   1th-worst: 9.868e+02, worst: 9.868e+02 (of      2)

This is even though I have attachment:spans.py.diff applied.

Last edited at 2010-08-19T02:37:12Z by zooko (previous) (diff)

comment:57 Changed at 2010-08-19T04:27:20Z by zooko

Okay, the patch from comment:54 seems to have improved performance significantly. I just performed run 114:

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 114 tahoe-lafs v1.8.0c2-r4699+comment:54 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 114 tahoe-lafs v1.8.0c2-r4699+comment:54
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 10 1421M   10  146M    0     0   149k      0  2:42:42  0:16:42  2:26:00  159k^R
 30 1421M   30  435M    0     0   147k      0  2:44:53  0:50:30  1:54:23  144k^C

real    50m30.207s
user    0m0.290s
sys     0m2.112s

Here is the full table:

run version downloaded download KBps flags
101 1.8.0c2 100 MB 91
102 1.7.1 100 MB 182
103 1.7.1 100 MB 207
104 1.8.0c2 100 MB 82
105 1.7.1 100 MB 211
109 1.7.1 100 MB 228 cProfile
110 1.8.0c2 100 MB 113 cProfile
111 1.8.0c2 413 MB 19 spanstrace
112 1.8.0c2 456 MB 113 spanstrace, spans.py.diff
113 1.7.1 543 MB 241 on office network
114 1.8.0c2 456 MB 147 spans.py.diff + comment:54

I'm not sure if v1.8.0c2 is now good enough to be considered "not a significant regression" vs. v1.7.1 for downloading large files. I'll go download a large file with v1.7.1 now on my home network for comparison...

comment:58 Changed at 2010-08-19T05:07:19Z by zooko

Hm, it seems like v1.7.1 is still substantially faster than v1.8.0c2+comment:54:

run version downloaded download KBps flags
115 1.7.1 456 MB 224
MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 115 tahoe-lafs v1.7.1 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 115 tahoe-lafs v1.7.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 30 1421M   30  435M    0     0   224k      0  1:48:09  0:33:06  1:15:03  240k^C

real    33m6.746s
user    0m0.287s
sys     0m2.307s
Last edited at 2010-08-19T05:44:07Z by zooko (previous) (diff)

comment:59 Changed at 2010-08-19T05:44:58Z by zooko

Well the good news is that comment:54 fixes the problem that downloads go slower the bigger they are (as expected). The bad news is that even with comment:54 Tahoe-LAFS v1.8.0c2 is substantially slower than v1.7.1 for large files:

run version downloaded download KBps flags
116 1.8.0c2 314 MB 154 spans.py.diff + comment:54
MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 115 tahoe-lafs v1.8.0c2+comment:54 ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 115 tahoe-lafs v1.8.0c2+comment:54
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 21 1421M   21  300M    0     0   154k      0  2:37:11  0:33:16  2:03:55  160k^C

real    33m16.529s
user    0m0.203s
sys     0m1.507s

I'm going to start another run with v1.8.0c2, this time with the cProfile tool running, and go to sleep.

Changed at 2010-08-19T06:01:17Z by zooko

comment:60 Changed at 2010-08-19T06:01:41Z by zooko

I ran 1.8.0c2 under the profiler for a few minutes and then stopped it in order to get the profiling stats (attached). Unfortunately, they do not show any more smoking gun of CPU usage, so the remaining slowdown from v1.7.1 to v1.8.0c2 is likely to be one of the network-scheduling issues that Brian has been thinking about (server selection, pipelining), or else some other sort of subtle timing issue...

Here are the profiling stats for a brief (~4 minute) run of 1.8.0c2:

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 117 tahoe-lafs v1.8.0c2+comment:54 cProfile ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 117 tahoe-lafs v1.8.0c2+comment:54 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  2 1421M    2 35.1M    0     0   160k      0  2:30:45  0:03:43  2:27:02  169k^C

real    3m43.778s
user    0m0.027s
sys     0m0.210s

The functions with the most "cumtime" (time spent in the function or in any of the functions that it called) are:

        1    0.000    0.000  275.676  275.676 base.py:1156(run)
        1    0.175    0.175  275.675  275.675 base.py:1161(mainLoop)
    30070    0.576    0.000  267.477    0.009 selectreactor.py:93(doSelect)
    30070  259.463    0.009  260.149    0.009 {select.select}
    30070    0.191    0.000    7.795    0.000 base.py:751(runUntilCurrent)
     3724    0.031    0.000    7.205    0.002 eventual.py:18(_turn)
    27682    0.300    0.000    6.634    0.000 log.py:71(callWithLogger)
    27682    0.242    0.000    6.297    0.000 log.py:66(callWithContext)
     3453    0.075    0.000    5.742    0.002 share.py:187(loop)
    27682    0.116    0.000    5.719    0.000 context.py:58(callWithContext)
    27682    0.199    0.000    5.587    0.000 context.py:32(callWithContext)
    27681    0.228    0.000    5.310    0.000 selectreactor.py:144(_doReadOrWrite)
     3453    0.089    0.000    5.299    0.002 share.py:238(_do_loop)
    26522    0.347    0.000    4.397    0.000 tcp.py:443(doRead)
    26506    0.306    0.000    4.376    0.000 tcp.py:114(doRead)
28407/17703    0.257    0.000    3.770    0.000 defer.py:453(_runCallbacks)
13880/11193    0.031    0.000    2.824    0.000 defer.py:338(callback)
13983/11295    0.031    0.000    2.795    0.000 defer.py:433(_startRunCallbacks)
3064/3052    0.014    0.000    2.658    0.001 defer.py:108(maybeDeferred)
     3453    0.092    0.000    2.654    0.001 share.py:701(_send_requests)
     2639    0.007    0.000    2.032    0.001 referenceable.py:406(callRemote)
     2608    0.034    0.000    2.025    0.001 banana.py:633(dataReceived)
     2639    0.052    0.000    2.012    0.001 referenceable.py:418(_callRemote)
     2604    0.005    0.000    2.009    0.001 share.py:733(_send_request)
     2608    0.344    0.000    1.988    0.001 banana.py:701(handleData)
     2640    0.008    0.000    1.791    0.001 banana.py:183(send)
     2640    0.021    0.000    1.783    0.001 root.py:92(send)
     2651    0.199    0.000    1.683    0.001 banana.py:191(produce)
    26506    1.680    0.000    1.680    0.000 {built-in method recv}
28693/25918    0.072    0.000    1.651    0.000 defer.py:266(addCallbacks)
     4299    0.044    0.000    1.494    0.000 share.py:267(_get_satisfaction)
    12418    0.125    0.000    1.052    0.000 hashtree.py:298(needed_hashes)

I'll go ahead and leave a download running under the profiler overnight just in case something turns up.

Changed at 2010-08-19T06:04:37Z by zooko

Changed at 2010-08-19T06:06:35Z by zooko

comment:61 follow-up: Changed at 2010-08-19T06:11:10Z by zooko

If you wanted to investigate why 1.8.0c2 is so much slower than 1.7.1 at downloading a large file even after applying the comment:54 patch, then you could use attachment:run-115-flog.pickle.bz2 and attachment:run-116-flog.pickle.bz2 as evidence. Hm, hey waitasecond, in my earlier testing (recorded in this ticket), 1.8.0c2 was faster then 1.7.1 for small files (<= 10 MB). This was also the case for Nathan Eisenberg's benchmarks (posted to tahoe-dev). But currently it looks to me like the average download speed (as reported by curl during its operation) is the same at the beginning of the download as at the end, i.e. even during the first 10 MB or so 1.8.0c2 is only getting about 150 KBps where 1.7.1 is getting more than 200 KBps. Did something change?

I guess I (or someone) should run 1.7.1 vs. 1.8.0c2+comment:54 on 10 MB files. But I'm way too tired to start that again right now.

Man, I'm really worn out from staying up night after night poking at this and then having to get up early the next morning to help my children get ready for school and myself ready for work. I could use more help!

comment:62 follow-up: Changed at 2010-08-19T06:44:51Z by davidsarah

Perhaps the remaining issue is server selection. Let's try Brian's comment:27 diversity-seeking algorithm, combined with the comment:54 fix.

comment:63 in reply to: ↑ 62 Changed at 2010-08-19T14:07:15Z by zooko

Replying to davidsarah:

Perhaps the remaining issue is server selection. Let's try Brian's comment:27 diversity-seeking algorithm, combined with the comment:54 fix.

I'm willing to try the comment:27 diversity-seeking algorithm, but I also would like to verify whether or not server-selection is one of the factors by inspecting the flogs...

comment:64 Changed at 2010-08-19T14:10:46Z by zooko

Yes, the overnight run yielded no smoking gun (smoking CPU?) that I can see. I'll attach the full profiling results as an attachment.

MUSI Zooko-Ofsimplegeos-MacBook-Pro:~/Desktop$ echo this is run 118 tahoe-lafs v1.8.0c2+comment:54 cProfile ; time curl http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb.ogg
this is run 118 tahoe-lafs v1.8.0c2+comment:54 cProfile
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1421M  100 1421M    0     0   158k      0  2:32:39  2:32:39 --:--:--  168k

real    152m39.145s
user    0m1.024s
sys     0m7.661s
         275630510 function calls (274835891 primitive calls) in 28165.965 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 28165.965 28165.965 base.py:1156(run)
        1    8.156    8.156 28165.964 28165.964 base.py:1161(mainLoop)
  1226198   24.746    0.000 27834.397    0.023 selectreactor.py:93(doSelect)
  1226198 27531.442    0.022 27547.532    0.022 {select.select}
  1226198    8.846    0.000  313.488    0.000 base.py:751(runUntilCurrent)
   148051    1.222    0.000  285.941    0.002 eventual.py:18(_turn)
  1107126   12.628    0.000  257.892    0.000 log.py:71(callWithLogger)
  1107126   10.433    0.000  243.790    0.000 log.py:66(callWithContext)
   136521    3.020    0.000  229.335    0.002 share.py:187(loop)
  1107126    5.102    0.000  218.355    0.000 context.py:58(callWithContext)
  1107126    8.483    0.000  212.571    0.000 context.py:32(callWithContext)
   136521    1.831    0.000  211.328    0.002 share.py:238(_do_loop)
  1107126    9.274    0.000  200.718    0.000 selectreactor.py:144(_doReadOrWrite)
  1064200   13.055    0.000  178.141    0.000 tcp.py:114(doRead)
  1064262   15.145    0.000  165.679    0.000 tcp.py:443(doRead)
1100168/689294   10.315    0.000  135.808    0.000 defer.py:453(_runCallbacks)
536780/433736    1.023    0.000  110.261    0.000 defer.py:338(callback)
540512/437468    1.259    0.000  109.033    0.000 defer.py:433(_startRunCallbacks)
   136521    3.623    0.000  107.058    0.001 share.py:701(_send_requests)
118185/118168    0.513    0.000   94.733    0.001 defer.py:108(maybeDeferred)
   102399    0.197    0.000   81.560    0.001 share.py:733(_send_request)
   102514    0.290    0.000   81.467    0.001 referenceable.py:406(callRemote)
   102514    1.867    0.000   80.688    0.001 referenceable.py:418(_callRemote)
   104846    1.446    0.000   79.737    0.001 banana.py:633(dataReceived)
   104846   14.309    0.000   78.171    0.001 banana.py:701(handleData)
   102515    0.320    0.000   72.324    0.001 banana.py:183(send)
   102515    0.785    0.000   72.004    0.001 root.py:92(send)
  1064200   70.123    0.000   70.123    0.000 {built-in method recv}
   102562    7.417    0.000   68.190    0.001 banana.py:191(produce)
   170643    1.742    0.000   60.005    0.000 share.py:267(_get_satisfaction)
1114150/1011232    2.898    0.000   50.939    0.000 defer.py:266(addCallbacks)
   494839    5.175    0.000   42.363    0.000 hashtree.py:298(needed_hashes)
  4010897    6.757    0.000   41.319    0.000 tcp.py:413(write)
   494839    6.477    0.000   37.139    0.000 hashtree.py:128(needed_for)
  4010897   18.303    0.000   34.562    0.000 abstract.py:177(write)
   102809    1.151    0.000   33.768    0.000 banana.py:1049(handleClose)
   136521    2.571    0.000   33.606    0.000 share.py:556(_desire)
    45498    0.163    0.000   32.926    0.001 hashutil.py:51(tagged_hash)
    45499    0.083    0.000   32.782    0.001 fetcher.py:83(loop)
  1230099    3.737    0.000   32.762    0.000 banana.py:22(int2b128)
    45499    0.451    0.000   32.699    0.001 fetcher.py:91(_do_loop)
   227571    0.386    0.000   31.989    0.000 hashutil.py:31(update)
   102514    1.152    0.000   31.951    0.000 call.py:652(receiveClose)
   227571   31.557    0.000   31.557    0.000 {method 'update' of '_sha256.SHA256' objects}

Changed at 2010-08-19T14:11:27Z by zooko

Changed at 2010-08-19T17:27:57Z by warner

patch to prefer share diversity, forget leftover data after each segment, and fix handling of numsegs

comment:65 Changed at 2010-08-19T17:29:47Z by warner

the "1170-combo.diff" patch combines the approaches as suggested in comment:62 . Please give it a try and see if it helps. I'll try to look at the flogs to see what servers were used, to see if that run has a diversity issue or not.

comment:66 Changed at 2010-08-19T18:27:51Z by zooko

Okay, I investigated server selection on the bus to work this morning. attachment:run-115-flog.pickle.bz2 shows:

22:10:56.877 [649]: <CiphertextDownloader #1>(u6h6p4mlr3j7): starting download
22:10:56.881 [650]: sending DYHB to [tavrk54e]
22:10:56.882 [651]: sending DYHB to [xflgj7cg]
22:10:56.883 [652]: sending DYHB to [sp26qyqc]
22:10:56.884 [653]: sending DYHB to [zkj6swl2]
22:10:56.885 [654]: sending DYHB to [sroojqcx]
22:10:56.886 [655]: sending DYHB to [4rk5oqd7]
22:10:56.887 [656]: sending DYHB to [nszizgf5]
22:10:56.887 [657]: sending DYHB to [62nlabgf]
22:10:56.888 [658]: sending DYHB to [fp3xjndg]
22:10:56.888 [659]: sending DYHB to [rpiw4n3f]
22:10:57.006 [660]: got results from [xflgj7cg]: shnums []
22:10:57.010 [661]: got results from [rpiw4n3f]: shnums []
22:10:57.017 [662]: got results from [62nlabgf]: shnums []
22:10:57.058 [663]: got results from [nszizgf5]: shnums [3, 7, 8]
22:10:57.066 [664]: got results from [4rk5oqd7]: shnums [2]
22:10:57.110 [665]: got results from [fp3xjndg]: shnums [4, 9]
22:10:57.126 [666]: got results from [tavrk54e]: shnums []
22:10:57.181 [667]: got results from [zkj6swl2]: shnums []
22:10:57.205 [668]: got results from [sroojqcx]: shnums [1, 6]
22:10:57.474 [669]: got results from [sp26qyqc]: shnums [0, 5]

The 1.7.1 flog doesn't which servers are actually being used for Request Blocks, but we know that 1.7.1 will always choose to get all three shares from nszizgf5 in a case like this.

Therefore I don't think that 1.8's share-selection can be part of the explanation for why 1.8 is slower than 1.7.

(This doesn't mean that improved share selection wouldn't make 1.9 faster than 1.8 is now.)

comment:67 in reply to: ↑ 61 Changed at 2010-08-19T19:13:37Z by warner

Replying to zooko:

Hm, hey waitasecond, in my earlier testing (recorded in this ticket), 1.8.0c2 was faster then 1.7.1 for small files (<= 10 MB). This was also the case for Nathan Eisenberg's benchmarks (posted to tahoe-dev). But currently it looks to me like the average download speed (as reported by curl during its operation) is the same at the beginning of the download as at the end, i.e. even during the first 10 MB or so 1.8.0c2 is only getting about 150 KBps where 1.7.1 is getting more than 200 KBps. Did something change?

There's a sizeable startup time in 1.7.1 (lots of roundtrips), which went away in 1.8.0c2 . I think we're all in agreement about the small-file speedups that provides (i.e. we've not seen any evidence to the contrary). The change is on the order of a few seconds, though, so I think a 10MB file (or portion of a file) that takes 10MB/150kBps= 60s to complete won't be affected very much. I don't think you'll be able to see its effects in the curl output.

Nathan's tests were on hundreds or thousands of small files.

From my tests, the new-downloader sees about 500ms more taken to complete the first segment than the second and following ones. I believe that's the time spend doing server selection, UEB fetches, and the large hash chain fetches.

Changed at 2010-08-20T00:07:51Z by zooko

I ran three more measurements today at the office -- runs 119, 120, and 121 . These are the curl stdout from those. I will update a table with these results and put it into the original opening comment of this ticket.

comment:68 Changed at 2010-08-20T00:13:38Z by zooko

  • Description modified (diff)

Changed at 2010-08-20T16:31:25Z by zooko

I ran several more measurements from home, intended to test whether the logging in new-downloader is partially responsible for new-downloader's slowness. These are the curl stdout from those runs. I will update the table in the opening comment of this ticket to include these runs.

Changed at 2010-08-20T16:34:35Z by zooko

status page results for run 123

Changed at 2010-08-20T16:35:42Z by zooko

comment:69 Changed at 2010-08-20T16:43:19Z by zooko

  • Description modified (diff)

comment:70 Changed at 2010-08-20T16:45:39Z by zooko

Brian: I updated the table in the initial comment. Please let me know what other sorts of measurements you would like from me. It looks to me like there is still a significant regression in 1.8.0c2+comment:54+spans.py.diff even if I comment-out almost all calls to log.msg() in immutable/download/*.py. I will attach the patch that I used to comment out all those logging calls. I'll probably go ahead and apply your attachment:1170-combo.diff and run 100 MB downloads from the office during work today.

comment:71 Changed at 2010-08-20T21:59:18Z by warner

I did a quick test at home with a def msg(*args,**kwargs):pass in src/allmydata/util/log.py, and didn't see a noticable change (the noise level was pretty high, so even if there were a 10% difference, I probably wouldn't have been able to spot it).

In some other testing at work, I was unable to see a consistent performance difference between 171 and my comment:65 combo-patch, but the speed was warbling all over the place, so I don't feel that it was a very conclusive run. I'd patched both to only use a single server (nszi?), to reduce the variables.

What I'd like to do is to run a series of tests from my home network (no other traffic) using my personal backupgrid server (no other traffic), to see how consistent the results are. Maybe tomorrow I'll get a chance to try that.

Changed at 2010-08-21T00:44:05Z by zooko

I ran several more measurements from the office, intended to test whether Brian's attachment:1170-combo.diff made 1.8.0c2 competitive with 1.7.1. Sadly it appears not. :-( I'll update the table in the initial comment with these results.

Changed at 2010-08-21T00:47:15Z by zooko

I ran several more measurements from the office, intended to test whether Brian's attachment:1170-combo.diff made 1.8.0c2 competitive with 1.7.1. Sadly it appears not. :-( I'll update the table in the initial comment with these results. This attachment is a better-named copy of attachment:perf-measure-office.txt .

Changed at 2010-08-21T00:48:33Z by zooko

comment:72 Changed at 2010-08-21T00:59:01Z by zooko

  • Description modified (diff)

comment:73 Changed at 2010-08-21T01:04:51Z by zooko

Brian: please inspect the table in the ticket initial comment. It seems like there is a bimodal distribution with 1170-combo.diff, half of the time it runs at about 179 or 180 KBps (it ran at 169 KBps for the large download) and the other half of the time it runs at 262–291 KBps. The latter range is slightly faster than 1.7.1! I attached the down-status.html for the long download that ran at 169 KBps: run139-down-status.html.bz2.

comment:74 Changed at 2010-08-21T01:33:57Z by davidsarah

  • Description modified (diff)

comment:75 Changed at 2010-08-21T01:48:56Z by davidsarah

  • Description modified (diff)

comment:76 Changed at 2010-08-21T04:58:28Z by zooko

  • Description modified (diff)

comment:77 Changed at 2010-08-21T05:06:59Z by zooko

  • Description modified (diff)

Changed at 2010-08-21T17:05:52Z by terrell

comment:78 Changed at 2010-08-21T17:22:30Z by terrell

  • Description modified (diff)

and run 142

[11:37:28:trel:~/Desktop/tahoestuff/trunkpatched] time curl --range 0-100000000 http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg  > bbb-100M.ogg
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 95.3M  100 95.3M    0     0   215k      0  0:07:32  0:07:32 --:--:--  182k

real	7m32.860s 
user	0m0.107s
sys	0m0.765s
Last edited at 2010-08-21T17:27:55Z by terrell (previous) (diff)

Changed at 2010-08-21T18:17:35Z by zooko

Changed at 2010-08-21T18:40:29Z by zooko

comment:79 Changed at 2010-08-21T19:32:33Z by zooko

run-zooko1000 was from my local coffeeshop—Caffe Sole in South Boulder—and the status.html shows this interesting pattern that the downloader immediately issued 10 DYHB queries (as expected), and then it took 9.6 seconds for the first DYHB response to arrive. Then the really weird, part, it took 8.4s more for the next seven DYHB responses to arrive (totalling 18s from request to response)! Then, still weird, it took a total request-to-response time of 6 minutes for the ninth response and a total of 8 minutes for the tenth. Also, as soon as the first response arrived the downloader issued a new DYHB request, and that one, the eleventh one, took 8.92s for the response to arrive.

So, I suppose there is something very messed up about the network at my local coffeeshop. Perhaps it blocks a flow that starts on an idle TCP connection while it is trying to figure out how to insert ads into any HTTP responses, or something. Note that these TCP connections were all already established long before the download began.

Take-aways?

I guess it is that we should not make assumptions about "reasonable" for IP traffic. That is: if we want to support people who use Tahoe-LAFS from coffeeshops, over tethered cell phones, at Burning Man, on satellite uplinks, on the International Space Station, etc. (which I do).

Another take-away is that 1.8.0c2+combo.diff did pretty well in this situation! (I think 1.7.1 probably would have done well too but I didn't get a chance to try it.)

Changed at 2010-08-21T20:09:49Z by terrell

status page for run 141

Changed at 2010-08-21T20:10:06Z by terrell

status page for run 142

comment:80 Changed at 2010-08-21T20:13:33Z by terrell

  • Description modified (diff)

Changed at 2010-08-22T04:41:52Z by terrell

capture from stdout - alternating between 1.7.1 and trunk+combo - run from cable modem at home on pubgrid

Changed at 2010-08-22T04:42:42Z by terrell

Changed at 2010-08-22T04:42:56Z by terrell

Changed at 2010-08-22T04:43:12Z by terrell

Changed at 2010-08-22T04:43:24Z by terrell

Changed at 2010-08-22T04:43:39Z by terrell

Changed at 2010-08-22T04:43:57Z by terrell

Changed at 2010-08-22T04:44:11Z by terrell

Changed at 2010-08-22T04:44:29Z by terrell

Changed at 2010-08-22T04:44:43Z by terrell

Changed at 2010-08-22T04:44:57Z by terrell

comment:81 Changed at 2010-08-22T04:57:54Z by terrell

  • Description modified (diff)

runs 143-162 generated with the following bash script to alternate the clients and grab the status file:

feel free to adapt and reuse.

#!/bin/bash
##################################
# EDIT THESE VARIABLES
##################################
FIRSTRUN=143
RUNS=20
FILESIZE=100000000
BASE171="/Users/trel/Desktop/tahoestuff/allmydata-tahoe-1.7.1"
BASEPATCH="/Users/trel/Desktop/tahoestuff/trunkpatched"
CACHEDIR="/Users/trel/.tahoe/private/cache/download"
FILENAME="http://localhost:3456/file/URI%3ACHK%3A4klgnafrwsm2nx3bqy24ygac5a%3Acrz7nhthi4bevzrug6xwgif2vhiacp7wk2cfmjutoz2ns3w45qza%3A3%3A10%3A1490710513/@@named=/bbb-360p24.i420.lossless.drc.ogg.fixed.ogg%2Bbbb-24fps.flac.via-ffmpeg.ogg"
SAVEFILE="deleteme.ogg"
##################################
RUNTOTAL=$FIRSTRUN
RUNCOUNT=0
while [ $RUNCOUNT -lt $RUNS ]; do
  let RUNTOTAL=FIRSTRUN+RUNCOUNT
  echo "-----------------------------------------------"
  echo "RUN $RUNTOTAL"
  if [ $[$RUNTOTAL % 2] -eq 0 ]
  then
    cd $BASE171
  else
    cd $BASEPATCH
  fi
  pwd
  bin/tahoe --version
  bin/tahoe stop
  bin/tahoe start
  echo "waiting 10s for node to spin up and connect..."
  sleep 10
  echo "curl --range 0-$FILESIZE $FILENAME > $SAVEFILE"
  time curl --range 0-$FILESIZE $FILENAME > $SAVEFILE
  if [ `pwd` = $BASEPATCH ]
  then
    echo "saving status.html..."
    curl http://localhost:3456/status/down-0 > status-$RUNTOTAL.html
    bzip2 status-$RUNTOTAL.html
  fi
  rm -f $CACHEDIR/*
  rm $SAVEFILE
  let RUNCOUNT=RUNCOUNT+1
  sleep 1
done


Note: See TracTickets for help on using tickets.