[tahoe-dev] Perf-related architecture question

Thu Jul 22 06:47:02 UTC 2010

On Thu, Jul 22, 2010 at 12:38 AM, Kyle Markley <kyle at arbyte.us> wrote:
>
> This was exciting to read.  The encrypt+encode/transfer ping-pong
> guarantees that we will either be using CPU, or network, but not both
> simultaneously, leading to low utilization of both.  I'm very handy with
> threading and googled up some information on the Python threading model...
> and then I learned about the GIL, which guarantees very low returns to
> multithreading.  (And this sort of circumstance is best solved by
> multithreading, not multiprocessing.)  I was excited about writing a bit of
> code that would use my threading skills while getting me to learn a new
> language and contribute to a great project, then had my dreams crushed by
> learning that the dominant Python interpreter is thread-hostile... so why
> bother?  :(

This analysis is wrong. Tahoe-LAFS v1.7.1 has low utilization of
network bandwidth, but this has nothing to do with multithreading or
the Python GIL and everything to do with states where the client waits
to hear back from the server before it takes the next step. In other
words, it is all about lack of pipelining in the upload/download
protocols.
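
To make the pipelining point concrete, here is a rough back-of-the-envelope
model in Python. It is only a sketch, not a measurement of Tahoe-LAFS: the
function names, the segment size, the bandwidth and the round-trip time are
all made-up assumptions, and it ignores encryption and encoding time entirely.

```python
def transfer_seconds(segment_bytes, bandwidth_bps):
    """Time to push one segment onto the wire at the given bandwidth."""
    return segment_bytes * 8 / bandwidth_bps


def stop_and_wait_seconds(n_segments, segment_bytes, bandwidth_bps, rtt_s):
    """Client sends a segment, then idles for one round trip before the next."""
    return n_segments * (transfer_seconds(segment_bytes, bandwidth_bps) + rtt_s)


def pipelined_seconds(n_segments, segment_bytes, bandwidth_bps, rtt_s):
    """Client sends segments back to back and waits only for the final reply."""
    return n_segments * transfer_seconds(segment_bytes, bandwidth_bps) + rtt_s


def utilization(n_segments, segment_bytes, bandwidth_bps, elapsed_s):
    """Fraction of the elapsed time during which the link was carrying data."""
    busy = n_segments * transfer_seconds(segment_bytes, bandwidth_bps)
    return busy / elapsed_s


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    n, seg, bw, rtt = 100, 128 * 1024, 10_000_000, 0.100
    for name, fn in [("stop-and-wait", stop_and_wait_seconds),
                     ("pipelined", pipelined_seconds)]:
        elapsed = fn(n, seg, bw, rtt)
        print(name, round(elapsed, 2), "s, utilization",
              round(utilization(n, seg, bw, elapsed), 2))
```

In this model the idle time per segment is one full round trip, so the gap
between the two cases grows with latency and shrinks with segment size. No
term in it depends on how many threads the client runs.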

At least, that's my assumption. I don't think anybody has yet measured
carefully enough to prove the actual causes of the low network
utilization. Maybe you could help with that! (See e.g. #809, but you
might have better ideas for how to figure this out.)

But although I'm not sure that fully pipelining the upload/download
protocols would by itself achieve high network utilization, I am sure
that multithreading inside the Python interpreter would have little or
no effect on network utilization.
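
As an illustration of why threads are beside the point here, the following
sketch keeps several simulated requests in flight from a single thread. It
is an assumption-laden toy: it uses Python's standard asyncio module rather
than anything from Tahoe-LAFS, `fake_request` is a hypothetical stand-in
that merely sleeps for one round trip, and the 50 ms round-trip time is
invented.

```python
import asyncio
import time

RTT_S = 0.05  # hypothetical round-trip time, in seconds


async def fake_request(i):
    """Stand-in for "send a message, wait for the server's reply"."""
    await asyncio.sleep(RTT_S)
    return i


async def one_at_a_time(n):
    """Wait for each reply before issuing the next request."""
    return [await fake_request(i) for i in range(n)]


async def all_in_flight(n):
    """Issue every request up front, then collect the replies."""
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_fn, n):
    """Run coro_fn(n) on a fresh event loop and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (one_at_a_time, all_in_flight):
        _, elapsed = timed(fn, 5)
        print(fn.__name__, round(elapsed, 3), "s")
```

Both variants run in one thread. The second finishes sooner only because it
does not wait for one reply before sending the next request, which is the
protocol-level change discussed above.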

Regards,

Zooko

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/809 # Measure how segment
size affects upload/download speed.