[tahoe-dev] [tahoe-lafs] #1264: Performance regression for large values of K
tahoe-lafs
trac at tahoe-lafs.org
Wed Nov 24 16:39:04 UTC 2010
#1264: Performance regression for large values of K
------------------------------+---------------------------------------------
Reporter: francois | Owner:
Type: defect | Status: new
Priority: major | Milestone: soon
Component: code-network | Version: 1.8.0
Resolution: | Keywords: performance regression download
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by warner):
zooko says:
> So all told, around 23% of the CPU time (as measured in "tottime" terms
> instead of "cumtime" terms) was spent in doing the debugging stuff due to
> defer.debug=True. This makes me wonder what the overall throughput would
> be if defer.debug were turned off.
Whoa, yeah, defer.debug should definitely *not* be turned on. We've
observed the unit test suite taking like 3x as long when it is (last time,
it was turned on by some accidentally left-in debugging code). What's
going on?
Francois: do you have any local patches which call
twisted.internet.defer.setDebugging(True)? There's nothing in the current
trunk which does that. If you run twistd with -d/--debug, it will turn on
defer.setDebugging, but from the command line you ran it doesn't look like
that's the case.
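
(For reference, a quick way to check whether something has switched
Deferred debugging on, and to force it back off, is the
getDebugging/setDebugging pair in twisted.internet.defer, the same switch
that twistd -d flips:)

{{{
from twisted.internet import defer

# True only if some code has called setDebugging(True); current trunk
# never does, so this should normally print False.
print defer.getDebugging()

# Force it off for timing/profiling runs.
defer.setDebugging(False)
}}}
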
> share.py:188(loop) was called 6377 times and took up a total of 19% of
> all CPU time. This might be a smoking gun showing that Brian's analysis
> in #1268 is correct and explains the slowdown that this ticket (#1264)
> is about.
> share.py:709(_send_requests), called 6377 times, total of 37.374 seconds
> (15.5% of the total). 6 milliseconds per call.
Let's see, k=20, 10MB file, assuming default 128KiB segsize, that gives 77
segments, so a total of 144 nodes in the block hash merkle tree for each
share. We fetch half of the leaves and all of the non-leaves, so about 115
nodes per share, and there are 20 shares, so about 2310 block hash nodes
total. There are also 77 data blocks per share, so 1540 of those. We also
fetch maybe a dozen other pieces (offset table, UEB, some portion of the
share hash tree and ciphertext hash tree) for each share, so maybe an
extra 240 fetches there. Eyeballing it, that's about 4090 fetches, so it's
within an order of magnitude of what I'd expect to see.
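
(Rough sketch of that arithmetic, using the eyeballed per-share figures
from above rather than exact hash-tree sizes:)

{{{
SEGSIZE = 128 * 1024            # default max segment size
FILESIZE = 10 * 1000 * 1000     # ~10MB file
k = 20

segments = -(-FILESIZE // SEGSIZE)       # ceiling division -> 77
block_hashes = k * 115                   # ~half the leaves + all non-leaves
data_blocks = k * segments               # 1540
misc = k * 12                            # offset table, UEB, pieces of the
                                         # share/ciphertext hash trees
print block_hashes + data_blocks + misc  # roughly 4000 fetches
}}}
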
Each {{{_send_requests}}} call turns into one {{{loop}}} call, yes: that's
the #1268 bug. When that gets fixed, we should see 6377 calls to
{{{_send_requests}}} but far fewer calls to {{{loop}}}, maybe
20*(77+2)=1580.
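
(Same numbers, for the post-#1268 expectation; the "+2" presumably covers
a couple of extra passes per share beyond one per segment:)

{{{
k, segments = 20, 77
print k * (segments + 2)   # 1580, versus the 6377 loop calls seen today
}}}
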
Re {{{_send_request}}} and recursion: careful, there are two
similarly-named functions here. {{{_send_requests()}}} (plural) loops over
{{{_send_request()}}} (singular). Does this resolve the confusion?

{{{_send_request()}}} is just a single call to {{{callRemote}}}, and if
we're spending a lot of time there, it means we're spending a lot of time
in the Foolscap serialization code. Let's see, 4777 calls and 34.931s
total, so 7.3ms per call.. that feels high. The arguments being serialized
are really simple (two ints), but there are a lot of Deferreds involved,
so if defer.setDebugging(True) is really turned on, I'm not as surprised.
We really need a new profiling run with defer.setDebugging turned off.
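
(Something like this would do it; run_download() is just a placeholder for
however the previous profile was driven, not an existing Tahoe API:)

{{{
import cProfile
from twisted.internet import defer

defer.setDebugging(False)   # make sure the expensive instrumentation is off

# run_download() stands in for the same download that produced the earlier
# profile; dump stats to a file we can compare against the old run.
cProfile.run('run_download()', 'download-nodebug.prof')
}}}
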
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1264#comment:11>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage