#397 new task

increase helper fetch blocksize to 1MB — at Version 3

Reported by: warner
Owned by:
Priority: major
Milestone: eventually
Component: code-encoding
Version: 1.0.0
Keywords: upload-helper
Cc:
Launchpad Bug:

Description (last modified by zooko)

We have reports from a user with a fast uplink (but perhaps high latency) who is pushing data to the helper very slowly, perhaps 20kBps. We would like this to go faster.

One likely culprit is the helper's non-pipelined fetch protocol. It asks for a 50kB block, waits until that has been received, writes it to disk, then asks for the next one. This imposes a hard limit on the inbound data rate: even with an infinitely fast uplink, we can't fetch faster than 50kB/latency. For a 100ms RTT, this would be about 500kBps.
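As a rough sketch of that limit (not code from the helper itself): in a stop-and-wait loop, each block costs at least one round trip, so the rate is bounded by blocksize divided by the cycle time. The function name and the optional uplink parameter are made up for illustration, and kB/MB are taken as powers of 1000.

```python
def max_fetch_rate(block_bytes, rtt_seconds, uplink_bytes_per_sec=None):
    """Upper bound on throughput (bytes/sec) for a non-pipelined fetch loop.

    Each request costs one round trip, plus the time to push the block
    through the uplink if a finite uplink rate is given.
    """
    cycle = rtt_seconds
    if uplink_bytes_per_sec is not None:
        cycle += block_bytes / uplink_bytes_per_sec
    return block_bytes / cycle


# Infinitely fast uplink, 100ms RTT, current and proposed blocksizes.
for block in (50_000, 1_000_000):
    rate = max_fetch_rate(block, rtt_seconds=0.1)
    print(f"{block // 1000} kB blocks: at most {rate / 1000:.0f} kBps")
```

With a 100ms RTT this gives a ceiling of about 500kBps for 50kB blocks and about 10MBps for 1MB blocks.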

An even more likely culprit is that the helper is simply overloaded. The real denominator in that fetch-rate equation is the end-to-end latency: the time from when the helper asks for one block to when it asks for the next one. In addition to the network latency, this includes time the helper spends doing all sorts of other things (like uploading other people's files).
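To put a number on that, the same equation can be run backwards: given the blocksize and the reported rate, what request-to-request time does it imply? This is only a back-of-the-envelope sketch. The 20kBps figure is the rough report above, the 100ms RTT is the illustrative value used earlier rather than a measurement, and the function name is hypothetical.

```python
def implied_cycle_seconds(block_bytes, observed_bytes_per_sec):
    """Time between successive block requests implied by an observed rate."""
    return block_bytes / observed_bytes_per_sec


ASSUMED_RTT = 0.1  # seconds; illustrative, not measured

cycle = implied_cycle_seconds(50_000, 20_000)
print(f"implied request-to-request time: {cycle:.1f} s")
print(f"time not explained by a {ASSUMED_RTT * 1000:.0f} ms RTT: {cycle - ASSUMED_RTT:.1f} s")
```

A cycle of roughly 2.5 seconds per 50kB block is far more than a plausible network round trip, which is consistent with the helper (or something else in the loop) being the bottleneck.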

In either case, allowing the client to give us more data per request would increase their throughput. I picked 50kB because it felt like a reasonable value. My notes in source:src/allmydata/offloaded.py#L338 say:

read data in 50kB chunks. We should choose a more considered number here, possibly letting the client specify it. The goal should be to keep the RTT*bandwidth to be less than 10% of the chunk size, to reduce the upload bandwidth lost because this protocol is non-windowing. Too large, however, means more memory consumption for both ends. Something that can be transferred in, say, 10 seconds sounds about right. On my home DSL line (50kBps upstream), that suggests 500kB. Most lines are slower, maybe 10kBps, which suggests 100kB, and that's a bit more memory than I want to hang on to, so I'm going to go with 50kB and see how that works.
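The two sizing rules in that note can be written down directly. This is a sketch of the arithmetic only: the function names are invented here, and the 1MBps uplink in the last example is a hypothetical "fast uplink", not a figure from the report.

```python
def chunk_for_transfer_time(uplink_bytes_per_sec, seconds=10):
    """Chunk size that takes about `seconds` to send at the given uplink rate."""
    return uplink_bytes_per_sec * seconds


def min_chunk_for_bdp_rule(uplink_bytes_per_sec, rtt_seconds, fraction=0.10):
    """Smallest chunk for which RTT*bandwidth is at most `fraction` of the chunk."""
    return rtt_seconds * uplink_bytes_per_sec / fraction


# The note's own examples: the 10-second rule at 50kBps and 10kBps upstream.
print(chunk_for_transfer_time(50_000))   # DSL line from the note
print(chunk_for_transfer_time(10_000))   # slower line from the note

# Hypothetical fast uplink: 1MBps at 100ms RTT under the 10% rule.
print(round(min_chunk_for_bdp_rule(1_000_000, 0.1)))
```

The first two calls reproduce the 500kB and 100kB figures in the note. Under the 10% rule, the hypothetical 1MBps uplink at 100ms RTT would want a chunk of about 1MB.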

But, as we've learned, people have some remarkably fast uplinks these days. The main downside of increasing the blocksize is memory consumption: both client and helper will use about 2*blocksize for each upload operation that's happening in parallel. Zandr tells me not to worry about memory usage on this scale. So a 1MB blocksize could be reasonable.
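For a sense of scale, here is the 2*blocksize estimate multiplied out. The count of 10 parallel uploads is a made-up figure for illustration, as is the function name.

```python
def upload_buffer_bytes(block_bytes, parallel_uploads):
    """Approximate buffer memory: about 2*blocksize per upload in progress."""
    return 2 * block_bytes * parallel_uploads


PARALLEL_UPLOADS = 10  # hypothetical load, not a measured value

for block in (50_000, 1_000_000):
    total = upload_buffer_bytes(block, PARALLEL_UPLOADS)
    print(f"{block // 1000} kB blocks, {PARALLEL_UPLOADS} uploads: "
          f"about {total / 1_000_000:.0f} MB")
```

Under that assumption the helper would go from about 1MB to about 20MB of block buffers, while a client doing a single upload would hold about 2MB.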

Change History (3)

comment:1 Changed at 2008-05-29T22:17:54Z by warner

  • Milestone changed from 1.1.0 to 1.2.0

comment:2 Changed at 2009-06-30T12:39:56Z by zooko

  • Milestone changed from 1.5.0 to eventually

comment:3 Changed at 2015-08-16T15:23:13Z by zooko

  • Description modified (diff)
  • Keywords upload-helper added; helper removed