[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3787: Is the use of Pipeline for write actually necessary?

Tahoe-LAFS trac at tahoe-lafs.org
Mon Oct 4 13:27:33 UTC 2021


#3787: Is the use of Pipeline for write actually necessary?
--------------------------+-----------------------------------
     Reporter:  itamarst  |      Owner:
         Type:  task      |     Status:  new
     Priority:  normal    |  Milestone:  HTTP Storage Protocol
    Component:  unknown   |    Version:  n/a
   Resolution:            |   Keywords:
Launchpad Bug:            |
--------------------------+-----------------------------------

Comment (by itamarst):

 Brian provided this highly detailed explanation:

 {{{

 If my dusty memory serves, the issue was that uploads have a number of
 small writes (headers and stuff) in addition to the larger chunks
 (output of the erasure coding). Also, the "larger" chunks are still
 pretty small. And the code that calls _write() is going to wait for the
 returned Deferred to fire before starting on the next step. So the
 client will send a tiny bit of data, wait a roundtrip for it to be
 accepted, then start on the next bit, wait another roundtrip, etc. This
 limits your network utilization (the percentage of your continuous
 upstream bandwidth that you're actually using): the wire is sitting idle
 most of the time. It gets massively worse with the round trip time.

 The general fix is to use a windowed protocol that optimistically sends
 lots of data, well in advance of what's been acknowledged. But you don't
 want to send too much, because then you're just bloating the transmit
 buffer (it all gets held up in the kernel, or in the userspace-side
 socket buffer). So you send enough data to keep X bytes "in the air",
 unacked, and each time you see another ack, you send out more. If you
 can keep your local socket/kernel buffer from ever draining to zero,
 you'll get 100% utilization of the network.

 IIRC the Pipeline class was a wrapper that attempted to do something
 like this for a RemoteReference. Once wrapped, the caller doesn't need
 to know about the details, it can just do a bunch of tiny writes, and
 the Deferred it gets back will lie and claim the write was complete
 (i.e. it fires right away), when in fact the data has been sent but not
 yet acked. It keeps doing this until the sent-but-not-acked data exceeds
 the size limit (looks like 50kB, OMG networks were slow back then), at
 which point it waits to fire the Deferreds until something actually gets
 acked. Then, at the end, to make sure all the data really *did* get
 sent, you have to call .flush(), which waits until the last real call's
 Deferred fires before firing its own returned Deferred.

 So it doesn't reduce the number of round trips, but it reduces the
 waiting for them, which should increase utilization significantly.

 Or, it would, if the size limit were appropriate for the network speed.
 There's a thing in TCP flow control called "bandwidth delay product"[1],
 I forget the details, but I think the rule is that bandwidth times round
 trip time is the amount of unacked data you can have outstanding "on the
 wire" without 1: buffering anything on your end (consumes memory, causes
 bufferbloat) or 2: letting the pipe run dry (reducing utilization). I'm
 pretty sure the home DSL line I cited in that ticket was about 1.5Mbps
 upstream, and I bet I had RTTs of 100ms or so, for a BxD of 150kbits,
 or about 20kB. These days I've got gigabit fiber, and maybe 50ms
 latency, for a BxD of 6MB.

 As the comments say, we're overlapping multiple shares during the same
 upload, so we don't need to pipeline the full 6MB, but I think if I were
 using modern networks, I'd increase that 50kB to at least 500kB and
 maybe 1MB or so. I'd want to run upload-speed experiments with a couple
 of different networking configurations (apparently there's a macOS thing
 called "Network Link Conditioner" that simulates slow/lossy network
 connections) to see what the effects would be, to choose a better value
 for that pipelining depth.

 And of course the "right" way to do it would be to actively track how
 fast the ACKs are returning, and somehow adjust the pipeline depth until
 the pipe was optimally filled. Like how TCP does congestion/flow
 control, but in userspace. But that sounds like way too much work.
 }}}
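
 For reference, here is a rough Python/Twisted sketch of the behaviour
 described above. It is illustrative only, not the actual Pipeline class
 from the Tahoe-LAFS codebase: writes below the in-flight limit get an
 already-fired Deferred, writes past the limit wait for a real ack, and
 flush() only fires once everything sent so far has really been acked.
 The ``send`` callable (the thing that performs the remote call and
 fires on ack), the class name, and the default limit are stand-ins.

 {{{
# Illustrative sketch only, not the real Tahoe-LAFS Pipeline class.
# `send(data)` is a hypothetical callable that performs the remote write
# and returns a Deferred that fires when the server acks it.
# Error handling is omitted for brevity.
from twisted.internet import defer


class PipelineSketch:
    def __init__(self, send, capacity=50000):
        self._send = send          # performs the remote call, fires on ack
        self._capacity = capacity  # max sent-but-not-acked bytes to allow
        self._inflight = 0         # bytes currently "in the air"
        self._waiting = []         # Deferreds handed back while over the limit
        self._acks = []            # every real ack Deferred, for flush()

    def write(self, data):
        self._inflight += len(data)
        ack = self._send(data)
        ack.addCallback(self._acked, len(data))
        self._acks.append(ack)
        if self._inflight <= self._capacity:
            # "Lie" and claim completion now, so the caller starts the
            # next write without waiting a round trip.
            return defer.succeed(None)
        # Over the limit: make the caller wait until real acks catch up.
        d = defer.Deferred()
        self._waiting.append(d)
        return d

    def _acked(self, result, nbytes):
        self._inflight -= nbytes
        while self._waiting and self._inflight <= self._capacity:
            self._waiting.pop(0).callback(None)
        return result

    def flush(self):
        # Fires only after every write issued so far has really been acked.
        return defer.DeferredList(self._acks, fireOnOneErrback=True)
 }}}

 The 50000-byte default here mirrors the 50kB limit mentioned above;
 bumping it to 500kB or 1MB would just mean passing a larger capacity.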
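
 And a quick back-of-the-envelope check of the bandwidth-delay numbers
 quoted above (bandwidth times round-trip time, converted to bytes; the
 figures are the ones cited in the comment, not measurements):

 {{{
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: unacked bytes needed to keep the pipe full."""
    return bandwidth_bps * rtt_seconds / 8

print(bdp_bytes(1.5e6, 0.100))  # old 1.5Mbps DSL, 100ms RTT: 18750.0 (~20kB)
print(bdp_bytes(1e9, 0.050))    # gigabit fiber, 50ms RTT: 6250000.0 (~6MB)
 }}}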

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3787#comment:1>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

