[tahoe-lafs-trac-stream] [Tahoe-LAFS] #397: increase helper fetch blocksize to 1MB
Tahoe-LAFS
trac at tahoe-lafs.org
Sun Aug 16 15:23:15 UTC 2015
#397: increase helper fetch blocksize to 1MB
-------------------------------+---------------------------
Reporter: warner | Owner:
Type: task | Status: new
Priority: major | Milestone: eventually
Component: code-encoding | Version: 1.0.0
Resolution: | Keywords: upload-helper
Launchpad Bug: |
-------------------------------+---------------------------
Changes (by zooko):
* keywords: helper => upload-helper
New description:
We have reports from a user with a fast uplink (but perhaps a long latency) that they are pushing data to the helper very slowly, perhaps 20kBps. We might want this to go faster.

One likely culprit is the helper's non-pipelined fetch protocol. It asks for a 50kB block, waits until that has been received, writes it to disk, then asks for the next one. This imposes a hard limit on the inbound data rate: even with an infinitely fast uplink, we can't fetch faster than 50kB/latency. For a 100ms RTT, this would be about 500kBps.
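That stop-and-wait ceiling can be sketched numerically. This is an illustrative helper function, not part of the Tahoe-LAFS code:

```python
def max_fetch_rate(blocksize_bytes, rtt_seconds):
    """Upper bound on inbound data rate for a stop-and-wait fetch:
    with only one request in flight, at most one block arrives per
    round trip, no matter how fast the uplink is."""
    return blocksize_bytes / rtt_seconds

# 50kB blocks over a 100ms round trip: roughly 500kBps, as the ticket says.
print(max_fetch_rate(50_000, 0.100))  # 500000.0 bytes/sec
```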
But even more likely is the helper simply being overloaded, because the real denominator in that fetch-rate equation is the end-to-end latency, from the time that the helper asks for one block to the time it asks for the next one. In addition to the network latency, the helper is busy doing all sorts of other things (like uploading other people's files).

In either case, allowing the client to give us more data per request would increase their throughput. I picked 50kB because it felt like a reasonable value. My notes in source:src/allmydata/offloaded.py#L338 say:

  read data in 50kB chunks. We should choose a more considered number here,
  possibly letting the client specify it. The goal should be to keep the
  RTT*bandwidth to be less than 10% of the chunk size, to reduce the upload
  bandwidth lost because this protocol is non-windowing. Too large, however,
  means more memory consumption for both ends. Something that can be
  transferred in, say, 10 seconds sounds about right. On my home DSL line
  (50kBps upstream), that suggests 500kB. Most lines are slower, maybe
  10kBps, which suggests 100kB, and that's a bit more memory than I want to
  hang on to, so I'm going to go with 50kB and see how that works.
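The two sizing rules in those notes (per-request overhead under 10% of the chunk, and a roughly 10-second transfer time) can be written out directly. The function names are illustrative, not from the codebase:

```python
def min_chunk_for_overhead(rtt_seconds, bandwidth_Bps, overhead=0.10):
    """Smallest chunk for which the per-request stall (rtt * bandwidth
    bytes of unused capacity) stays below `overhead` of the chunk size."""
    return rtt_seconds * bandwidth_Bps / overhead

def chunk_for_transfer_time(bandwidth_Bps, seconds=10):
    """Chunk sized to take about `seconds` to send at the given upstream rate."""
    return bandwidth_Bps * seconds

# Reproducing the figures from the notes: 50kBps upstream suggests 500kB,
# and a slower 10kBps line suggests 100kB.
print(chunk_for_transfer_time(50_000))  # 500000
print(chunk_for_transfer_time(10_000))  # 100000
```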
But, as we've learned, people have some remarkably fast uplinks these days. The main downside of increasing the blocksize is memory consumption: both client and helper will use about 2x blocksize for each upload operation that's happening in parallel. Zandr tells me to not worry about memory usage on this scale. So a 1MB blocksize could be reasonable.
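The memory cost of a larger blocksize follows directly from that ~2x-blocksize-per-upload estimate; a quick sketch with hypothetical numbers:

```python
def upload_memory_bytes(blocksize, parallel_uploads):
    """Approximate footprint on client or helper: about two copies of a
    block held per upload operation in flight."""
    return 2 * blocksize * parallel_uploads

# Ten parallel uploads at a 1MB blocksize: about 20MB on each side.
print(upload_memory_bytes(1_000_000, 10))  # 20000000
```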
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/397#comment:3>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage
More information about the tahoe-lafs-trac-stream mailing list