[tahoe-dev] [tahoe-lafs] #1264: Performance regression for large values of K

tahoe-lafs trac at tahoe-lafs.org
Tue Nov 23 22:25:47 UTC 2010


#1264: Performance regression for large values of K
------------------------------+---------------------------------------------
     Reporter:  francois      |       Owner:                                
         Type:  defect        |      Status:  new                           
     Priority:  major         |   Milestone:  soon                          
    Component:  code-network  |     Version:  1.8.0                         
   Resolution:                |    Keywords:  performance regression download
Launchpad Bug:                |  
------------------------------+---------------------------------------------

Comment (by warner):

 zooko said:

 > I don't understand why a few of the share requests would take ten times
 > as long as normal. Is the delay on the client, the server, or the
 > network? Brian hypothesized that it had something to do with how the
 > spans data structure gets used more when K is higher.

 Actually my hypothesis is that the time spent between the receipt of one
 block response and the transmission of the next block request will be
 higher for larger values of k. This time would not be "charged" against
 the individual server: it occurs after the clock has been stopped.

 [http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1170/180c2-viz-delays.png
 This picture] (and the other pictures on #1170) shows part of this: it's
 the delay between the receipt of the last block() response (e.g. at
 2.60s) and the transmission of the next request (at around 2.67s).

 [http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1170/viz-3.png
 Zooming in] shows what's happening in that gap. If you could read the
 "desire" words on the small boxes (which of course is easier when you can
 interactively zoom the chart, rather than looking at a screenshot), you
 would see that the desire/satisfied bitmaps are recomputed three times,
 but the second and third passes are redundant (no new responses have
 arrived, so the computed bitmap is exactly the same, so that's just
 wasted time).
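 The redundant passes could be avoided by caching the computed bitmap and
 recomputing only when a new response has actually arrived. A minimal
 sketch of that dirty-flag idea (class and method names here are
 hypothetical, not Tahoe's actual downloader code):

```python
class DesireTracker:
    """Sketch of a dirty-flag cache for the desire/satisfied bitmap:
    recompute only after a new response arrives, instead of once per
    received hash."""

    def __init__(self):
        self._dirty = True       # nothing computed yet
        self._bitmap = None
        self.recomputations = 0  # instrumentation for this sketch

    def response_arrived(self, data):
        self._dirty = True       # cached bitmap is now stale

    def get_bitmap(self):
        if self._dirty:
            self._bitmap = self._recompute()
            self.recomputations += 1
            self._dirty = False
        return self._bitmap      # no new responses: reuse cached result

    def _recompute(self):
        # stand-in for the real (expensive) desire/satisfied computation
        return frozenset()
```

 Calling get_bitmap() three times between responses then performs the
 expensive recomputation once instead of three times.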

 If I remember right, the downloader was doing one redundant pass for each
 hash received, and we fetch more hashes for some segments than others
 (peaking at NUMSEGS/2), resulting in an uneven distribution of extra
 delays. My comments in
 http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1170#comment:92
 suggest that this is not enough to explain the delays seen here (8% on a
 12MB file on a one-CPU testnet), but it *would* be worse with more
 segments.
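 To illustrate why the number of hashes fetched varies by segment, here is
 a small counter (my own construction, not the downloader's code) for an
 in-order read of a binary Merkle hash tree, assuming already-fetched and
 already-computable nodes are cached; the exact distribution in Tahoe
 depends on what the downloader actually caches:

```python
def hashes_fetched_per_segment(num_leaves):
    """For each leaf of a binary Merkle tree read in order, count how
    many sibling/uncle hash nodes must be newly fetched to verify it,
    caching everything seen on earlier paths.  Nodes use binary-heap
    indexing: root is 0, children of n are 2n+1 and 2n+2."""
    assert num_leaves & (num_leaves - 1) == 0, "num_leaves must be a power of two"
    known = {0}                       # the root hash is known up front
    counts = []
    for leaf in range(num_leaves):
        node = num_leaves - 1 + leaf  # heap index of this leaf
        touched = set()
        fetched = 0
        while node > 0:
            # left children (odd index) have their sibling at node+1
            sibling = node + 1 if node % 2 else node - 1
            if sibling not in known and sibling not in touched:
                fetched += 1
            touched.update((node, sibling))
            node = (node - 1) // 2    # parent: computable once both children known
        known |= touched              # cache everything on this path
        counts.append(fetched)
    return counts
```

 For an 8-leaf tree this returns [3, 0, 1, 0, 2, 0, 1, 0]: most segments
 need no new hashes at all, while a few need several, so if each received
 hash triggers an extra bitmap pass, the extra delay lands unevenly across
 segments.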

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1264#comment:8>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-dev mailing list