[tahoe-dev] [tahoe-lafs] #1170: new-downloader performs badly when downloading a lot of data from a file
tahoe-lafs
trac at tahoe-lafs.org
Wed Aug 25 07:17:40 UTC 2010
#1170: new-downloader performs badly when downloading a lot of data from a file
------------------------------+---------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: critical | Milestone: 1.8.0
Component: code-network | Version: 1.8β
Resolution: | Keywords: immutable download performance regression
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by warner):
I did some more testing with those visualization tools (adding some misc
events like entry/exit of internal functions). I've found one place where
the downloader makes excessive eventual-send calls, which appears to cost
250us per {{{remote_read}}} call. I've also measured hash-tree operations
as consuming a surprising amount of overhead.
* each {{{Share._got_response}}} call queues an eventual-send to
  {{{Share.loop}}}, which checks the satisfy/desire processes. Since a
  single TCP buffer is parsed into lots of Foolscap response messages,
  these are all queued during the same turn, which means the first
  {{{loop()}}} call will see all of the data, and the remaining ones will
  see nothing. Each of these empty {{{loop()}}} calls takes about 250us.
  There is one for each {{{remote_read}}} call, which means
  k*(3/2)*numsegs for the block hash trees and an additional
  k*(3/2)*numsegs for the ciphertext hash tree (because we ask each share
  for the CTHT nodes, rather than asking only one and hoping they return
  it, so we can avoid an extra roundtrip). For k=3 that's 2.25ms per
  segment. The cost is variable: on some segments (in particular the
  first and middle ones) the overhead is maximal, whereas on every odd
  segnum there is no overhead. On a 12MB download this is about 225ms,
  and on my local one-CPU testnet the download took 2.9s, so this
  represents about 8%.
* It takes my laptop 1.34ms to process a set of blocks into a segment
  (seg2 of a 96-segment file). 1.19ms of that was checking the ciphertext
  hash tree (probably two extra hash nodes), and a mere 73us was spent in
  FEC. AES decryption of the segment took 1.1ms, and accounted for 65% of
  the 1.7ms inter-segment gap (the delay between delivering seg2 and
  requesting seg3).
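The per-segment overhead figures above can be checked with a quick
back-of-the-envelope calculation. This snippet just re-derives the numbers
from the comment (250us per empty call, k=3, ~96 segments for 12MB); none
of it is Tahoe-LAFS code.

```python
# Back-of-the-envelope check of the eventual-send overhead estimate.
CALL_OVERHEAD_S = 250e-6   # measured cost of one empty Share.loop() call

def empty_loop_calls_per_segment(k):
    # k*(3/2) remote_read calls for the block hash trees, plus the same
    # again for the ciphertext hash tree
    return 2 * k * 3 // 2

k = 3
per_segment_s = empty_loop_calls_per_segment(k) * CALL_OVERHEAD_S
print(per_segment_s * 1e3)          # 2.25 ms per segment for k=3

numsegs = 96                        # 12MB at the default 128KiB segment size
total_s = per_segment_s * numsegs
print(total_s * 1e3)                # ~216 ms of pure eventual-send overhead
print(100 * total_s / 2.9)          # ~7.5% of the 2.9s download
```

Nine empty {{{loop()}}} calls per segment at 250us each gives the 2.25ms
figure, and scaling by ~96 segments gives the roughly-225ms / roughly-8%
numbers quoted above.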
I'd like to change the {{{_got_response}}} code to set a flag and queue a
single call to {{{loop}}} instead of queueing multiple calls. That would
save a little time (and probably remove the severe jitter that I've seen
on local downloads), but I don't think it can explain the 50% slowdown
that Zooko has observed.
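The flag-and-single-queue idea can be sketched roughly like this. This is
a hypothetical stand-in, not the real {{{Share}}} class: {{{eventually}}}
here is a plain callback queue standing in for Foolscap's eventual-send,
and the satisfy/desire work is elided.

```python
class Share:
    """Sketch: coalesce many _got_response events into one loop() call."""
    def __init__(self, eventually):
        self._eventually = eventually   # schedules f() for a later turn
        self._loop_scheduled = False
        self.loop_calls = 0             # instrumentation for this sketch

    def _got_response(self, data):
        # ... record the received data ...
        if not self._loop_scheduled:    # queue at most one loop() per turn
            self._loop_scheduled = True
            self._eventually(self.loop)

    def loop(self):
        self._loop_scheduled = False    # let the next batch reschedule us
        self.loop_calls += 1
        # ... satisfy/desire processing sees everything delivered this turn ...

# Minimal single-turn event queue standing in for the reactor:
pending = []
share = Share(pending.append)
for msg in range(9):        # nine responses parsed from one TCP buffer
    share._got_response(msg)
for f in pending:           # end of turn: run the queued eventual-sends
    f()
print(share.loop_calls)     # prints 1 instead of 9
```

With the flag in place, the nine responses parsed out of a single TCP
buffer trigger one {{{loop()}}} call that sees all the data, instead of
one full call plus eight empty 250us ones.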
These visualization tools are a lot of fun. One direction to explore is
to record some packet timings (with tcpdump) and add them as an extra
row: that would show us how much latency/load Foolscap incurs before it
delivers a message response to the application.
I'll attach two samples of the viz output as attachment:viz-3.png and
attachment:viz-4.png . The two captures are of different parts of the
download, but in both cases the horizontal ticks are 500us apart. The
candlestick-diagram-like shapes are the satisfy/desire sections of
{{{Share.loop}}}, and the lines (actually very narrow boxes) between them
are the "disappointment" calculation at the end of {{{Share.loop}}}, so
the gap before it must be the {{{send_requests}}} routine.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1170#comment:92>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage