[tahoe-dev] [tahoe-lafs] #1264: Performance regression for large values of K
tahoe-lafs
trac at tahoe-lafs.org
Wed Nov 24 16:39:04 UTC 2010
#1264: Performance regression for large values of K
------------------------------+---------------------------------------------
Reporter: francois | Owner:
Type: defect | Status: new
Priority: major | Milestone: soon
Component: code-network | Version: 1.8.0
Resolution: | Keywords: performance regression download
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by warner):
zooko says:
> So all told, around 23% of the CPU time (as measured in "tottime" terms
> instead of "cumtime" terms) was spent in doing the debugging stuff due to
> defer.debug=True. This makes me wonder what the overall throughput would
> be if defer.debug were turned off.
Whoa, yeah, defer.debug should definitely *not* be turned on. We've
observed the unit test suite taking like 3x as long when it is (last time,
it was turned on by some accidentally left-in debugging code). What's
going on?
Francois: do you have any local patches which call
twisted.internet.defer.setDebugging(True)? There's nothing in the current
trunk which does that. If you run twistd with -d/--debug, it will turn on
defer.setDebugging, but from the command line you ran it doesn't look like
that's the case.
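
(For reference, a quick way to check whether something has switched
Deferred debugging on, and to force it back off, is the
getDebugging/setDebugging pair in twisted.internet.defer, the same switch
that twistd -d flips:)

{{{
from twisted.internet import defer

# True only if some code has called setDebugging(True); current trunk
# never does, so this should normally print False.
print defer.getDebugging()

# Force it off for timing/profiling runs.
defer.setDebugging(False)
}}}
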
> share.py:188(loop) was called 6377 times and took up a total of 19% of
> all CPU time. This might be a smoking gun showing that Brian's analysis
> in #1268 is correct and explains the slowdown that this ticket (#1264)
> is about.
> share.py:709(_send_requests), called 6377 times, total of 37.374 seconds
> (15.5% of the total). 6 milliseconds per call.
Let's see, k=20, 10MB file, assuming default 128KiB segsize, that gives 77
segments, so a total of 144 nodes in the block hash merkle tree for each
share. We fetch half of the leaves and all of the non-leaves, so about 115
nodes per share, and there are 20 shares, so about 2310 block hash nodes
total. There are also 77 data blocks per share, so 1540 of those. We also
fetch maybe a dozen other pieces (offset table, UEB, some portion of the
share hash tree and ciphertext hash tree) for each share, so maybe an
extra 240 fetches there. Eyeballing it, that's about 4090 fetches, so it's
within an order of magnitude of what I'd expect to see.
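
(Rough sketch of that arithmetic, using the eyeballed per-share figures
from above rather than exact hash-tree sizes:)

{{{
SEGSIZE = 128 * 1024            # default max segment size
FILESIZE = 10 * 1000 * 1000     # ~10MB file
k = 20

segments = -(-FILESIZE // SEGSIZE)       # ceiling division -> 77
block_hashes = k * 115                   # ~half the leaves + all non-leaves
data_blocks = k * segments               # 1540
misc = k * 12                            # offset table, UEB, pieces of the
                                         # share/ciphertext hash trees
print block_hashes + data_blocks + misc  # roughly 4000 fetches
}}}
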
Each {{{_send_requests}}} call turns into one {{{loop}}} call, yes: that's
the #1268 bug. When that gets fixed, we should see 6377 calls to
{{{_send_requests}}} but far fewer calls to {{{loop}}}, maybe
20*(77+2)=1580.
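
(Same numbers, for the post-#1268 expectation; the "+2" presumably covers
a couple of extra passes per share beyond one per segment:)

{{{
k, segments = 20, 77
print k * (segments + 2)   # 1580, versus the 6377 loop calls seen today
}}}
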
Re {{{_send_request}}} and recursion: careful, there are two
similarly-named functions here. {{{_send_requests()}}} (plural) loops over
{{{_send_request()}}} (singular). Does this resolve the confusion?

{{{_send_request()}}} is just a single call to {{{callRemote}}}, and if
we're spending a lot of time there, it means we're spending a lot of time
in the Foolscap serialization code. Let's see, 4777 calls and 34.931s
total, so 7.3ms per call.. that feels high. The arguments being serialized
are really simple (two ints), but there are a lot of Deferreds involved,
so if defer.setDebugging(True) is really turned on, I'm not as surprised.
We really need a new profiling run with defer.setDebugging turned off.
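
(Something like this would do it; run_download() is just a placeholder for
however the previous profile was driven, not an existing Tahoe API:)

{{{
import cProfile
from twisted.internet import defer

defer.setDebugging(False)   # make sure the expensive instrumentation is off

# run_download() stands in for the same download that produced the earlier
# profile; dump stats to a file we can compare against the old run.
cProfile.run('run_download()', 'download-nodebug.prof')
}}}
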
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1264#comment:11>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage