[tahoe-lafs-trac-stream] [tahoe-lafs] #1264: Performance regression for large values of K

tahoe-lafs trac at tahoe-lafs.org
Thu Sep 15 10:28:08 PDT 2011


#1264: Performance regression for large values of K
-----------------------------+---------------------------------------------
     Reporter:  francois     |      Owner:  francois
         Type:  defect       |     Status:  new
     Priority:  major        |  Milestone:  soon
    Component:  code-network |    Version:  1.8.0
   Resolution:               |   Keywords:  performance regression download
Launchpad Bug:               |
-----------------------------+---------------------------------------------

Comment (by warner):

 I just finished the first batch of comparative performance tests on the
 Atlas Networks hardware, and uploaded my tools and the graphs. I'm calling
 this environment "atlasperf1". It consists of four machines (one client;
 the other three are storage machines running two server nodes each) on
 fast local connections. The CPUs are dual-core Pentium 4s at 3.00GHz, and
 the average ping time is 420us. The client is running tahoe-1.8.2 (the
 servers are a mix of versions: 1.8.2, 1.7.1, and 1.6.1).

 I uploaded 1MB/10MB/100MB immutable files with various encodings (from
 1-of-1 to 60-of-60, all with k=N). Then I ran a program to download random
 ones over and over. I graphed the k-vs-time curve for about 1700 trials.
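
 Roughly, the download half of that harness is just a timing loop around
 the CLI (a minimal sketch, not the actual uploaded tools; the cap list and
 the use of {{{tahoe get}}} here are assumptions):

 {{{
 #!python
 import random, subprocess, time

 # Hypothetical (label, read-cap) pairs for the files uploaded with
 # encodings from 1-of-1 up to 60-of-60 (k == N in every case).
 CAPS = [
     ("100MB-k5",  "URI:CHK:..."),
     ("100MB-k30", "URI:CHK:..."),
     ("100MB-k60", "URI:CHK:..."),
 ]

 def time_one_download(readcap):
     """Fetch one file through the client node, return elapsed seconds."""
     start = time.time()
     subprocess.check_call(["tahoe", "get", readcap, "/dev/null"])
     return time.time() - start

 while True:
     label, cap = random.choice(CAPS)
     print("%s %.2f" % (label, time_one_download(cap)))  # one line per trial
 }}}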

 The graph shows a nearly-perfect linear slowdown with increasing k. For a
 100MB file, k=5 yields a 50s download, k=30 takes 250s, and k=60 takes
 about 500s. The same shape holds for 1MB and 10MB files too.
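
 As a quick sanity check on that linearity, the three reported 100MB points
 imply a nearly constant per-unit-k cost:

 {{{
 #!python
 # k -> download time in seconds, for the 100MB file (numbers from above)
 points = {5: 50.0, 30: 250.0, 60: 500.0}
 for k in sorted(points):
     per_k = points[k] / k
     print("k=%2d: %5.1fs total, %4.1f s per unit of k" % (k, points[k], per_k))
 # k=5 gives 10.0 s/k, k=30 and k=60 both give about 8.3 s/k: the slope is
 # roughly constant, i.e. download time grows close to linearly with k.
 }}}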

 The next test is to try various segment sizes. I've got a few pet theories
 about the slowdown: the number of foolscap messages being sent, or the
 limited pipelining of small requests. Both would correlate with the number
 of segments. For a fast network, our current one-segment pipeline isn't
 very deep. And with high 'k', the amount of data we're getting from each
 block can get quite small (k=60 means each block is 128k/60, about 2.1k,
 which is scarcely a whole packet), so we've got a lot of tiny requests
 flying around.
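
 To make that concrete, here is the arithmetic for the default 128KiB
 segment size (a sketch; it assumes one block request per server per
 segment, which ignores hash-tree fetches and any request coalescing):

 {{{
 #!python
 SEGSIZE = 128 * 1024           # default max segment size, in bytes
 FILESIZE = 100 * 1000 * 1000   # the 100MB test file

 for k in (5, 30, 60):
     block = SEGSIZE // k                # bytes pulled from each server per segment
     segments = -(-FILESIZE // SEGSIZE)  # ceiling division: segments in the file
     requests = segments * k             # one block request per server per segment
     print("k=%2d: %5d-byte blocks, %d segments, %d block requests"
           % (k, block, segments, requests))

 # k=60 gives ~2.1kB blocks (barely one Ethernet frame) and tens of
 # thousands of tiny requests for the 100MB file.
 }}}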

 Possible mitigations include:

  * when segsize/k is small, accept the memory-footprint hit and use a
 deeper pipeline, to keep the pipes full (see the sketch after this list)
  * implement a {{{readv()}}} method on the storage servers to reduce the
 foolscap overhead (if that's in fact an issue)
  * advise larger segsize on fast networks
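
 For the first item, one plausible policy is to scale the pipeline depth
 with the block size, so the number of bytes in flight to each server stays
 roughly constant regardless of k (a sketch only; the constants and the
 helper name are invented for illustration, not existing Tahoe code):

 {{{
 #!python
 def pipeline_depth(segsize, k, target_inflight=128 * 1024, max_depth=32):
     """How many block requests to keep outstanding per server.

     With k=1 a block is a whole segment and depth 1 keeps the pipe full;
     with k=60 the blocks are ~2kB each, so many more requests must be in
     flight to put the same number of bytes on the wire.
     """
     block = max(1, segsize // k)
     return max(1, min(target_inflight // block, max_depth))

 for k in (1, 5, 30, 60):
     print("k=%2d -> pipeline depth %d" % (k, pipeline_depth(128 * 1024, k)))
 }}}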

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1264#comment:17>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

