[tahoe-dev] [tahoe-lafs] #1170: does new-downloader perform badly for certain situations (such as today's Test Grid)?
tahoe-lafs
trac at tahoe-lafs.org
Thu Aug 12 18:15:54 UTC 2010
#1170: does new-downloader perform badly for certain situations (such as today's
Test Grid)?
------------------------------+---------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: major | Milestone: 1.8.0
Component: code-network | Version: 1.8β
Resolution: | Keywords: immutable download
Launchpad Bug: |
------------------------------+---------------------------------------------
Comment (by warner):
yeah, the 32/64-byte reads are hashtree nodes. The spans structure only
coalesces adjacent/overlapping reads (the 64-byte reads are the result of
two neighboring 32-byte hashtree nodes being fetched), but all requests
are pipelined (note the "txtime" column in the "Requests" table, which
tracks remote-bucket-read requests), and the overhead of each message is
fairly small (also note the close proximity of the "rxtime" for those
batches of requests). So I'm not particularly worried about merging these
requests further.
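To make the coalescing behavior concrete, here is a minimal sketch of the merging rule described above (illustrative only; it is not the actual Spans API, which tracks ranges inside the share file):

```python
def coalesce(spans):
    """Merge adjacent or overlapping (start, length) spans into maximal runs.

    Two neighboring 32-byte hashtree-node reads at offsets 0 and 32
    coalesce into a single 64-byte read; non-adjacent spans stay separate.
    """
    merged = []
    for start, length in sorted(spans):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # new span touches or overlaps the previous one: extend it
            prev_start, prev_len = merged[-1]
            new_end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, new_end - prev_start)
        else:
            merged.append((start, length))
    return merged

print(coalesce([(0, 32), (32, 32), (100, 32)]))  # [(0, 64), (100, 32)]
```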
My longer-term goal is to extend the Spans data structure with some sort
of "close enough" merging feature: given a Spans bitmap, return a new
bitmap with all the small holes filled in, so e.g. a 32-byte gap between
two hashtree nodes (which might not be strictly needed until a later
segment is read) would be retrieved early. The max-hole-size would need to
be tuned to match the overhead of each remote-read message (probably on
the order of 30-40 bytes): there's a breakeven point somewhere in there.
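A sketch of that "close enough" merging, again using illustrative (start, length) tuples rather than the real Spans bitmap; `max_hole` stands in for the to-be-tuned threshold:

```python
def fill_holes(spans, max_hole):
    """Merge sorted, non-overlapping (start, length) spans whenever the gap
    between two neighbors is at most max_hole bytes, so the hole's bytes
    (e.g. hashtree nodes needed by a later segment) get fetched early."""
    filled = []
    for start, length in sorted(spans):
        if filled and start - (filled[-1][0] + filled[-1][1]) <= max_hole:
            prev_start, _ = filled[-1]
            filled[-1] = (prev_start, start + length - prev_start)
        else:
            filled.append((start, length))
    return filled

# A 32-byte gap between two hashtree nodes is filled when max_hole >= 32:
print(fill_holes([(0, 32), (64, 32)], max_hole=40))  # [(0, 96)]
print(fill_holes([(0, 32), (64, 32)], max_hole=16))  # [(0, 32), (64, 32)]
```

With a per-message overhead around 30-40 bytes, a `max_hole` in that range is the breakeven point: filling a smaller hole costs fewer wasted bytes than a second request would.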
Another longer-term goal is to add a {{{readv()}}}-type API to the remote
share-read protocol, so we could fetch multiple ranges in a single call.
This doesn't shave much overhead off of just doing multiple pipelined
{{{read()}}} requests, so again it's low-priority.
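For illustration, a readv()-type call can be emulated client-side today by pipelining single-range reads; the `FakeShare` class below is a hypothetical stand-in for a remote bucket reader, not the real protocol object:

```python
class FakeShare:
    """Hypothetical stand-in for a remote share reader offering only
    read(offset, length), as the current protocol does."""
    def __init__(self, data):
        self.data = data
        self.calls = 0  # count round trips (pipelined, but still per-range)
    def read(self, offset, length):
        self.calls += 1
        return self.data[offset:offset + length]

def readv(share, vector):
    """Fetch multiple (offset, length) ranges. Emulated here with one
    read() per range; a protocol-level readv() would bundle all ranges
    into a single request/response pair."""
    return [share.read(off, length) for off, length in vector]

share = FakeShare(b"abcdefghij")
print(readv(share, [(0, 2), (4, 3)]))  # [b'ab', b'efg']
```

Since the emulated reads are pipelined anyway, a true readv() mostly saves per-message framing overhead, which is why it stays low-priority.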
And yes, a cleverer which-share-should-I-use-now algorithm might reduce
stalls like that. I'm working on visualization tools to show the raw
download-status events in a Gantt-chart-like form, which should make it
easier to develop such an algorithm. For now, you want to look at the
Request table for correlations between reads that occur at the same time.
For example, at the +1.65s point, I see several requests that take
1.81s/2.16s/2.37s. One clear improvement would be to fetch shares 0 and 5
from different servers: whatever slowed down the reads of sh0 also slowed
down sh5. But note that sh8 (from the other server) took even longer: this
suggests that the congestion was on your end of the line, not theirs,
especially since the next segment arrived in less than half a second.
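One simple form of that improvement is a server-diversity rule when picking shares: take at most one share per server before reusing any server, so a single slow server (or congested path) stalls fewer shares. A hedged sketch, with illustrative names rather than the downloader's actual API:

```python
def pick_shares(available, k):
    """available: list of (share_num, server_id) pairs, in preference order.
    Pick k shares, preferring one per distinct server; fall back to
    already-used servers only if distinct ones run out."""
    picked, used = [], set()
    # first pass: at most one share from each server
    for shnum, server in available:
        if len(picked) >= k:
            break
        if server not in used:
            picked.append((shnum, server))
            used.add(server)
    # second pass: fill any remaining slots from already-used servers
    for entry in available:
        if len(picked) >= k:
            break
        if entry not in picked:
            picked.append(entry)
    return picked

# sh0 and sh5 on server A, sh8 on server B: k=2 spreads across both servers,
# instead of taking sh0 and sh5 from the same (possibly slow) server A.
print(pick_shares([(0, "A"), (5, "A"), (8, "B")], k=2))  # [(0, 'A'), (8, 'B')]
```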
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1170#comment:2>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage