[tahoe-dev] Observations on Tahoe performance

Mon Aug 24 12:07:12 PDT 2009

On Wednesday 19 August 2009 02:20:54 pm Brian Warner wrote:
> In the allmydata.com world, we deferred performance analysis
> and improvement for two reasons: most consumer upload speeds were really
> bad (so we could hide behind that), and we don't have any
> bandwidth-management tools to avoid saturating their upstream and thus
> causing problems to other users (so e.g. *not* having a windowing
> protocol could actually be considered a feature, since it left some
> bandwidth available for the HTTP requests to get out).

Yes, that is another issue that will be important for me to address.  At the 
moment, I'm using traffic shaping to solve it.  That works very well, 
especially when the upload is going through a helper, because all immutable 
uploads are routed through a single TCP connection.
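Traffic shapers typically cap an upload's rate with a token bucket, so the upload leaves headroom on the uplink for other traffic such as outgoing HTTP requests. The following is a minimal in-application sketch of that idea. The `TokenBucket` class and its injectable clock are hypothetical and are not part of Tahoe:

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: on average `rate` bytes/second,
    with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Spend tokens and return True if `nbytes` may be sent right now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def delay_for(self, nbytes: int) -> float:
        """Seconds the sender should wait before `nbytes` could be sent."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)


if __name__ == "__main__":
    now = [0.0]  # fake clock for a deterministic demo
    bucket = TokenBucket(rate=1000, capacity=1000, clock=lambda: now[0])
    print(bucket.try_consume(1000))  # True: uses the initial burst
    print(bucket.try_consume(500))   # False: bucket is empty
    now[0] = 0.5
    print(bucket.try_consume(500))   # True: 0.5 s refilled 500 tokens
```

On Linux the same mechanism is available without code changes through `tc` and its token-bucket filter (`tbf`) queueing discipline. Shaping at that layer treats every connection alike, which works especially well when all uploads share one TCP connection to a helper.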

> 1: The process of uploading an immutable file involves a *lot* of
> roundtrips, which will hurt small files a lot more than large ones. Peer
> selection is done serially: we ask server #1, wait for a response, then
> ask server #2, wait, etc.

That clearly adds a lot of latency but not much extra traffic, which is why 
running several uploads in parallel works so well: while one upload is 
waiting on a round trip, the others can use the bandwidth.
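The cost of serial peer selection is roughly one round trip per server asked, while sending all the queries at once costs roughly one round trip in total. The following is a minimal asyncio sketch of that difference. `query_server` is a hypothetical stand-in for a peer-selection request, and the 50 ms RTT is an illustrative value, not a measurement:

```python
from __future__ import annotations

import asyncio
import time

RTT = 0.05  # assumed round-trip time in seconds (illustrative value)


async def query_server(server_id: int) -> tuple[int, bool]:
    """Hypothetical peer-selection request; each call costs one RTT."""
    await asyncio.sleep(RTT)
    return server_id, True  # pretend every server accepts a share


async def select_serial(servers: list[int]) -> list[tuple[int, bool]]:
    # Ask server #1, wait for the answer, then ask server #2, ...
    # Total latency is roughly len(servers) * RTT.
    return [await query_server(s) for s in servers]


async def select_parallel(servers: list[int]) -> list[tuple[int, bool]]:
    # Send every request at once; gather() keeps results in input order.
    # Total latency is roughly one RTT.
    return list(await asyncio.gather(*(query_server(s) for s in servers)))


def timed(select, servers: list[int]) -> tuple[list[tuple[int, bool]], float]:
    """Run one selection strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(select(servers))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    servers = list(range(10))
    for strategy in (select_serial, select_parallel):
        _, elapsed = timed(strategy, servers)
        print(f"{strategy.__name__}: {elapsed:.2f} s")
```

Querying everyone at once trades latency for some possible waste. Servers that turn out not to be needed might reserve space that then has to be released. How much that matters for a real grid is an open question, not something this toy measures.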

> We're thinking about switching away from Foolscap for share-transfer and
> instead using something closer to HTTP (#510). This would be an
> opportunity to improve the RTT behavior as well: we don't really need to
> wait for an ACK before we send the next block, we just need confirmation
> that the whole share was received correctly, and we need to avoid
> buffering too much data in the outbound socket buffer.

Yeah, let TCP handle making sure the whole share arrives, then hash to verify.  
Why the concern about data buffered in the outbound socket?
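The pattern described here sends the blocks back-to-back, lets TCP deliver them in order, and confirms the whole share once at the end. The following is a minimal sketch of that pattern. It assumes the receiver already knows a SHA-256 digest of the share. `split_blocks` and `ShareReceiver` are hypothetical names, and a single flat hash stands in for Tahoe's actual share-integrity scheme:

```python
import hashlib
from typing import List


def split_blocks(share: bytes, block_size: int) -> List[bytes]:
    """Cut a share into blocks that the sender writes back-to-back."""
    return [share[i:i + block_size] for i in range(0, len(share), block_size)]


class ShareReceiver:
    """Hypothetical receiver: TCP delivers the bytes reliably and in order,
    so the only end-to-end check is one hash over the whole share."""

    def __init__(self, expected_sha256: bytes) -> None:
        self.expected = expected_sha256
        self.hasher = hashlib.sha256()
        self.blocks: List[bytes] = []

    def receive_block(self, block: bytes) -> None:
        # No per-block reply: the sender never waits a round trip here.
        self.hasher.update(block)
        self.blocks.append(block)

    def finish(self) -> bytes:
        """The single confirmation for the whole share."""
        if self.hasher.digest() != self.expected:
            raise ValueError("share hash mismatch")
        return b"".join(self.blocks)


if __name__ == "__main__":
    share = bytes(range(256)) * 40
    rx = ShareReceiver(hashlib.sha256(share).digest())
    for block in split_blocks(share, 1000):
        rx.receive_block(block)
    print(len(rx.finish()))  # 10240
```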

> In addition, we 
> could probably trim off an RTT by changing the semantics of the initial
> message, to combine a do-you-have-share query with a
> please-prepare-to-upload query. Or, we might decide to give up on
> grid-side convergence and stop doing the do-you-have-share query first,
> to speed up the must-upload case at the expense of the
> might-not-need-to-upload case.

I really, really like grid-side convergence.  I'd vote for keeping it and 
combining the message semantics.
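Combining the two queries keeps grid-side convergence while saving a round trip in the must-upload case. The following toy model counts round trips for both approaches. `ToyServer`, `has_share`, `prepare_upload`, and `have_or_prepare` are hypothetical names used for illustration. They are not Tahoe's storage-server protocol:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical storage server that counts request/response round trips."""
    shares: set[str] = field(default_factory=set)
    allocated: set[str] = field(default_factory=set)
    round_trips: int = 0

    # Two-message protocol: do-you-have-share, then please-prepare-to-upload.
    def has_share(self, sid: str) -> bool:
        self.round_trips += 1
        return sid in self.shares

    def prepare_upload(self, sid: str) -> None:
        self.round_trips += 1
        self.allocated.add(sid)

    # Combined message: "if you already have it, say so; otherwise get ready".
    def have_or_prepare(self, sid: str) -> str:
        self.round_trips += 1
        if sid in self.shares:
            return "already-have"  # convergence: nothing needs uploading
        self.allocated.add(sid)
        return "ready"


def place_two_step(server: ToyServer, sid: str) -> str:
    """Current style: one RTT to ask, a second RTT to prepare if needed."""
    if server.has_share(sid):
        return "already-have"
    server.prepare_upload(sid)
    return "ready"
```

In this model the combined message costs one round trip whether or not the server already holds the share. The two-step version costs two in the must-upload case. One plausible downside is that the combined request must carry whatever the server needs for allocation, even when the share turns out to be present.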

> 3: Using a nearby Helper might help and might hurt.

I haven't tried using a nearby helper.  My helper is in a co-lo somewhere, on 
a multi-gigabit connection to the backbone.  I've only tried:

1.  My app -> local node -> helper -> grid
2.  My app -> helper (using helper as client) -> grid
3.  My app -> local node -> grid

Option 1 seems to give the best performance.  Option 3 obviously sucks because 
it means pushing the FEC-expanded data up my cable modem.  It's not clear to 
me why 1 is better than 2.  Maybe it's just that the CPU load is spread 
across two machines (the local node encrypts, the helper does the FEC 
encoding).
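A rough model of the bytes crossing the local uplink shows why option 3 is the slow one. It assumes the common 3-of-10 encoding and ignores hash trees, headers, and protocol overhead. `uplink_bytes` is a hypothetical helper, and the topology numbers refer to the three options listed above:

```python
def uplink_bytes(file_size: int, topology: int, k: int = 3, n: int = 10) -> int:
    """Hypothetical estimate of bytes sent over the local uplink.

    k = shares needed to reconstruct, n = total shares (assumed 3-of-10).
    """
    if topology in (1, 2):
        # Options 1 and 2: the file (or its ciphertext) crosses the
        # uplink once, to the helper; the helper pushes the shares.
        return file_size
    if topology == 3:
        # Option 3: the local node uploads all n shares itself,
        # each about ceil(file_size / k) bytes.
        return n * (-(-file_size // k))
    raise ValueError("topology must be 1, 2 or 3")


if __name__ == "__main__":
    for option in (1, 2, 3):
        print(option, uplink_bytes(3_000_000, option))
```

With 3-of-10, option 3 pushes about 3.3 times as much data up the cable modem as options 1 and 2. The model gives the same figure for options 1 and 2, so any difference between them has to come from elsewhere, such as where the CPU work happens.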

	Shawn.