[tahoe-dev] [tahoe-lafs] #1264: Performance regression for large values of K
tahoe-lafs
trac at tahoe-lafs.org
Wed Nov 24 07:42:34 UTC 2010
#1264: Performance regression for large values of K
------------------------------+---------------------------------------------
Reporter: francois | Owner:
Type: defect | Status: new
Priority: major | Milestone: soon
Component: code-network | Version: 1.8.0
Resolution: | Keywords: performance regression download
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by zooko):
> You'll find the [attachment:tahoe.profile profile generated] by
> downloading a 20-of-60 10 MB file and then stopping the client. I'll
> hopefully address the remaining points tomorrow.
Okay, I did a quick dump of the stats from [attachment:tahoe.profile],
first sorted by total time spent in each function excluding time spent
waiting for functions that it called to return ("tottime"), and then
sorted by total time spent in each function including time spent waiting
for functions it called to return ("cumtime").
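(For reference, a minimal sketch of how the listings below were presumably
produced with the stdlib {{{pstats}}} module; the filename comes from the
attachment and the second command is inferred from the "Ordered by:
cumulative time" / "restriction <44>" lines, since it isn't shown:)
{{{
import pstats

# Load the cProfile dump attached to this ticket.
p = pstats.Stats('tahoe.profile')

# Top 8 entries by time spent inside each function itself ("tottime").
p.sort_stats('time').print_stats(8)

# Top 44 entries by cumulative time, including callees ("cumtime").
p.sort_stats('cumulative').print_stats(44)
}}}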
{{{
>>> p.sort_stats('time').print_stats(8)
Tue Nov 23 14:36:37 2010 tahoe.profile
21516265 function calls (21475767 primitive calls) in 240.371 CPU seconds
Ordered by: internal time
List reduced from 1491 to 8 due to restriction <8>
ncalls tottime percall cumtime percall filename:lineno(function)
1406 147.383 0.105 147.492 0.105 {select.select}
1370101 21.221 0.000 21.221 0.000 {posix.stat}
50882 14.011 0.000 51.913 0.001 traceback.py:273(extract_stack)
1370097 7.485 0.000 28.704 0.000 linecache.py:43(checkcache)
50882 7.292 0.000 8.031 0.000 traceback.py:27(format_list)
1370097 3.531 0.000 7.492 0.000 linecache.py:13(getline)
2031 2.392 0.001 2.392 0.001 {built-in method send}
1370097 1.749 0.000 3.435 0.000 linecache.py:33(getlines)
21516265 function calls (21475767 primitive calls) in 240.371 CPU seconds
Ordered by: cumulative time
List reduced from 1491 to 44 due to restriction <44>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 240.371 240.371 base.py:1046(run)
1 0.025 0.025 240.361 240.361 base.py:1051(mainLoop)
1406 0.062 0.000 190.230 0.135 selectreactor.py:93(doSelect)
1406 147.383 0.105 147.492 0.105 {select.select}
50882 1.038 0.000 61.047 0.001 traceback.py:264(format_stack)
50882 14.011 0.000 51.913 0.001 traceback.py:273(extract_stack)
1406 0.053 0.000 50.059 0.036 base.py:666(runUntilCurrent)
874 0.054 0.000 49.797 0.057 eventual.py:18(_turn)
25381/20221 0.116 0.000 46.592 0.002 defer.py:233(callback)
25391/20231 0.392 0.000 46.453 0.002 defer.py:291(_startRunCallbacks)
6377 0.277 0.000 45.856 0.007 share.py:188(loop)
6377 0.180 0.000 44.419 0.007 share.py:239(_do_loop)
4672 0.049 0.000 42.653 0.009 log.py:38(callWithLogger)
4672 0.063 0.000 42.597 0.009 log.py:33(callWithContext)
4672 0.035 0.000 42.467 0.009 context.py:58(callWithContext)
4672 0.053 0.000 42.428 0.009 context.py:32(callWithContext)
4672 0.048 0.000 42.358 0.009 selectreactor.py:144(_doReadOrWrite)
5200/5143 0.101 0.000 39.445 0.008 defer.py:84(maybeDeferred)
2212 0.043 0.000 39.311 0.018 tcp.py:345(doRead)
2136 0.038 0.000 38.325 0.018 tcp.py:116(doRead)
6377 0.405 0.000 37.374 0.006 share.py:709(_send_requests)
1893 0.087 0.000 37.301 0.020 banana.py:633(dataReceived)
4999 0.035 0.000 37.245 0.007 referenceable.py:406(callRemote)
1893 1.008 0.001 37.210 0.020 banana.py:701(handleData)
4999 0.273 0.000 37.103 0.007 referenceable.py:418(_callRemote)
4777 0.029 0.000 34.931 0.007 share.py:741(_send_request)
25491 0.333 0.000 31.085 0.001 defer.py:166(__init__)
50890/30329 0.944 0.000 30.678 0.001 defer.py:314(_runCallbacks)
5000 0.100 0.000 30.631 0.006 banana.py:183(send)
5000 0.120 0.000 30.525 0.006 root.py:92(send)
1370097 7.485 0.000 28.704 0.000 linecache.py:43(checkcache)
10106 0.082 0.000 24.185 0.002 defer.py:30(succeed)
5582 0.183 0.000 22.692 0.004 banana.py:1049(handleClose)
4999 0.110 0.000 21.742 0.004 call.py:652(receiveClose)
1370101 21.221 0.000 21.221 0.000 {posix.stat}
5072 0.829 0.000 19.518 0.004 banana.py:191(produce)
10072 0.118 0.000 13.274 0.001 root.py:72(next)
16851 0.085 0.000 12.370 0.001 banana.py:1035(handleToken)
9998 0.095 0.000 12.250 0.001 call.py:628(receiveChild)
50868/44994 0.268 0.000 11.336 0.000 defer.py:172(addCallbacks)
4999 0.026 0.000 9.349 0.002 call.py:673(_done)
4999 0.054 0.000 9.323 0.002 call.py:55(complete)
50882 7.292 0.000 8.031 0.000 traceback.py:27(format_list)
1370097 3.531 0.000 7.492 0.000 linecache.py:13(getline)
}}}
Here are my comments/questions about this:
* Only the first few rows of the tottime list are at all significant. The
total CPU time was 240.371 seconds, so the 3rd row -- {{{extract_stack}}} --
used about 5.8% of the CPU time (14.011 s) and the 4th row --
{{{checkcache}}} -- used about 3.1% (7.485 s).
* Why is {{{extract_stack}}} getting called 50,882 times!? Here are all of
the callers which called {{{extract_stack}}} in this run:
{{{
>>> p.print_callers('extract_stack')
Ordered by: cumulative time
List reduced from 1491 to 1 due to restriction <'extract_stack'>
Function was called by...
ncalls tottime cumtime
traceback.py:273(extract_stack) <- 50882 14.011 51.913 traceback.py:264(format_stack)
<pstats.Stats instance at 0x1004e0950>
>>> p.print_callers('format_stack')
Ordered by: cumulative time
List reduced from 1491 to 1 due to restriction <'format_stack'>
Function was called by...
ncalls tottime cumtime
traceback.py:264(format_stack) <- 25491 0.591 30.752 defer.py:166(__init__)
                                   25391 0.447 30.295 defer.py:291(_startRunCallbacks)
<pstats.Stats instance at 0x1004e0950>
}}}
Ah, this indicates that {{{twisted.internet.defer.debug=True}}}. Okay. I
wonder how it got turned on?
* Okay, and why is {{{posix.stat}}} invoked more than a million times?
Ah, {{{print_callers()}}} tells me that {{{posix.stat}}} is getting called
by {{{checkcache}}}, which is getting called by {{{extract_stack}}}. So all
told, around 23% of the CPU time (as measured in "tottime" terms instead
of "cumtime" terms) was spent doing the debugging stuff due to
{{{defer.debug=True}}}. This makes me wonder what the overall throughput
would be if {{{defer.debug}}} were turned off; a sketch of how to check
and disable it follows below.
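For what it's worth, this switch is the standard Twisted API
{{{defer.setDebugging()}}} / {{{defer.getDebugging()}}}; presumably
something in the node or the test setup called {{{setDebugging(True)}}}.
When it is on, every {{{Deferred}}} captures the stack of its creator (in
{{{__init__}}}) and of the code that fires it (in
{{{_startRunCallbacks}}}) via {{{traceback.format_stack}}}, which is
exactly the {{{extract_stack}}}/{{{checkcache}}}/{{{posix.stat}}} activity
above. A minimal sketch for a comparison run with it off:
{{{
from twisted.internet import defer

# Confirm whether Deferred debugging is currently enabled.
print(defer.getDebugging())

# Turn it off before repeating the download under the profiler, so the
# traceback.format_stack() capture in Deferred.__init__ and
# _startRunCallbacks drops out of the profile.
defer.setDebugging(False)
}}}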
Next, I turn my attention to the "cumtime" results.
* Measured this way, the CPU time spent in the {{{defer.debug}}} stuff
was 25% of all CPU time (61.047 s of 240.371 s). (And cumtime is a better
measure of this overhead than the tottime result is.)
*
[source:trunk/src/allmydata/immutable/downloader/share.py?annotate=blame&rev=4707#L188
share.py:188(loop)] was called 6377 times and took up a total of 19% of
all CPU time (45.856 s cumtime). This might be a smoking gun showing that
Brian's analysis in #1268 is correct and explains the slowdown that this
ticket (#1264) is about.
*
[source:trunk/src/allmydata/immutable/downloader/share.py?annotate=blame&rev=4707#L709
share.py:709(_send_requests)] was called 6377 times, for a cumtime total
of 37.374 seconds (15.5% of the total), or about 6 milliseconds per call.
Is this more evidence in favor of the #1268 theory? What functions does
{{{_send_requests}}} call, and how long do they take?
{{{
>>> p.strip_dirs().print_callees('_send_requests')
Random listing order was used
List reduced from 1491 to 1 due to restriction <'_send_requests'>
Function called...
ncalls tottime cumtime
share.py:709(_send_requests) -> 9554 0.024 0.067 defer.py:189(addCallback)
                                9554 0.022 0.064 defer.py:197(addErrback)
                                11154 0.207 0.652 log.py:144(msg)
                                4777 0.029 34.931 share.py:741(_send_request)
                                4777 0.083 0.210 spans.py:46(add)
                                19131 0.132 0.196 spans.py:142(dump)
                                11154 0.014 0.014 spans.py:152(__iter__)
                                12754 0.060 0.605 spans.py:170(__sub__)
                                6377 0.021 0.069 spans.py:256(get_spans)
                                4777 0.049 0.066 status.py:97(add_request_sent)
                                11154 0.034 0.093 {repr}
                                4777 0.002 0.002 {time.time}
}}}
Uh-oh, I think this shows that cProfile is giving us bad data! The
cumulative time spent in {{{_send_request}}}, including waiting for its
callees to return, was 34.931 s, and the total time spent in
{{{_send_request}}} not waiting for its callees to return was 0.029 s,
which means that it was waiting for its callees to return for almost 35 s.
But the total time spent in all of its callees adds up to only about 2
seconds. That doesn't make sense! :-(
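(A possible follow-up, sketched here but not actually run for this
comment: ask {{{pstats}}} for the callees of {{{_send_request}}} itself,
and for the callers of the similarly expensive-looking {{{callRemote}}},
to see where that ~35 s of cumtime is being attributed.)
{{{
import pstats

# Reopen the same dump (or continue the session from above).
p = pstats.Stats('tahoe.profile')

# The restriction argument is a regex matched against the
# "file:lineno(function)" labels, so the line number from the listing
# above should pick out share.py:741(_send_request) specifically.
p.strip_dirs().print_callees('share.py:741')

# And the reverse direction for foolscap's callRemote.
p.print_callers('callRemote')
}}}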
Well, it is obviously time for me to go to bed. Hopefully it will make
more sense when I get back to this next weekend.
One last comment: it is surprising how much time is spent in banana.py and
its children! (That's the code in foolscap to serialize/deserialize
things.)
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1264#comment:9>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage
More information about the tahoe-dev mailing list