[tahoe-dev] Perf-related architecture question

Zooko O'Whielacronx zooko at zooko.com
Sat Jul 24 04:58:21 UTC 2010


On Fri, Jul 23, 2010 at 8:18 PM, Kyle Markley <kyle at arbyte.us> wrote:
> It's the weekend now, so I have some time to run benchmarks, both for
> ticket #809 and for other tweaks of interest.

Whoo!

Your overall plan sounds perfect.

To whatever extent you can script this so that other people can
replicate your benchmark, that would be great. I know some of the
steps are not that convenient to script, but maybe you could just
insert comments like "... and here you log in to each of your storage
servers and remove the shares.".
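
Purely for illustration, here is a minimal sketch of the kind of
harness I mean, assuming the stock "tahoe put" command is on your
PATH and a client node is already running; the test-file name and
sizes are made up, and the manual steps are left as comments:

  #!/usr/bin/env python
  # Rough benchmark-harness sketch (illustrative only).
  import os, subprocess, time

  TEST_FILE = "bench-100MB.dat"   # hypothetical test file
  RUNS = 5

  def make_test_file(path, size=100*1024*1024):
      if not os.path.exists(path):
          f = open(path, "wb")
          for _ in range(size // (1024*1024)):
              f.write(os.urandom(1024*1024))   # incompressible data
          f.close()

  make_test_file(TEST_FILE)
  for i in range(RUNS):
      # ... and here you log in to each of your storage servers and
      # remove the shares left over from the previous run ...
      start = time.time()
      subprocess.check_call(["tahoe", "put", TEST_FILE])
      elapsed = time.time() - start
      mbps = os.path.getsize(TEST_FILE) / elapsed / 1e6
      print("run %d: %.1f s (%.2f MB/s)" % (i + 1, elapsed, mbps))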

Of course we want thorough, specific notes on what you did, so that
other people can replicate your benchmark manually.

Ideally we would want to have fully automated benchmarks that we could
run on the buildbots and collect all the results. We used to have a
simple benchmark on the buildbots but it stopped working because the
test grid that it was using died.

Also, why don't you record at least one of the runs with tcpdump (or
whatever the kids are using nowadays -- wireshark?) to capture the
metadata (but not the bulk data) of all the packets. We can then use
a visualization tool (wireshark) to see when packets were sent in
each direction. The causes of the low network utilization will
probably jump right out of that visualization.
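
Something like the following would do it, I think -- just a sketch,
assuming plain tcpdump; the interface name and the port list are
placeholders for whatever your storage nodes actually listen on
(their tub.port settings), and it probably needs root:

  # Sketch: capture packet headers only; a small snaplen keeps the
  # bulk data out of the capture file.
  import subprocess

  IFACE = "eth0"            # placeholder -- your network interface
  PORTS = [8098, 8099]      # placeholder -- your storage servers' ports

  bpf = " or ".join("port %d" % p for p in PORTS)
  cap = subprocess.Popen(["tcpdump", "-i", IFACE,
                          "-s", "96",              # headers only
                          "-w", "upload-run.pcap",
                          bpf])
  # ... run the upload benchmark here ...
  # then cap.terminate() and open upload-run.pcap in wireshark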

> 1) pipeline_size (in WriteBucketProxy.__init__)
> 2) Segment size (ticket #809) (in class Client?)
>  - I'll try each power of two from 32KiB to 4MiB.
> 3) Any other parameters worth playing with?

I don't think you can test this one on your physical setup, but the
number of storage servers and the configuration parameter M (total
shares produced) are two more variables that probably have a large
effect. The bigger M is, the more storage servers a single upload can
take advantage of. (Note: this assumes that the K/M ratio is held
constant, so if you double M then you also double K. For example,
going from K=3/M=10 to K=6/M=20 keeps the expansion factor the same
but lets one upload spread its shares across twice as many servers.)
Probably just don't worry about that one for this experiment.

> When I edit these parameters, is it sufficient to edit them only on the
> client, or do I also need to edit them on the storage node?

Client side only.

> Do I need to benchmark the cross-product of the varying parameters, or if
> not, what value should I use when holding one parameter constant and
> varying the other?

I think the only pipeline sizes that are interesting are:

1. less than segsize/K
2. between segsize/K and 2*segsize/K
3. between 2*segsize/K and 3*segsize/K, etc.

This is because the size of a block is segsize/K and the pipeline
feature only controls the sending or delaying of entire blocks, never
of partial blocks.

So don't bother trying pipeline sizes that fall into the same
"segsize/K" class as a run you've already measured.

One strategy would be to find an optimal pipeline size for your
network first, by keeping segsize at the default and varying pipeline
size. Once you've found that, start doubling both segsize and
pipeline size on each subsequent run, to find the optimal segsize
given that fixed ratio of segsize to pipeline size.
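
Here is that plan sketched as a run matrix -- again just
illustrative, with 128 KiB assumed as the default segsize and the
particular multipliers and doubling count picked arbitrarily:

  DEFAULT_SEGSIZE = 128 * 1024
  K = 3
  block = DEFAULT_SEGSIZE // K

  # Phase 1: default segsize, sweep pipeline size across block classes.
  phase1 = [(DEFAULT_SEGSIZE, p) for p in
            (block // 2, block, 2 * block, 4 * block)]

  # Phase 2: keep the winning segsize:pipeline ratio, double both.
  def phase2(best_pipeline, doublings=5):
      runs, seg, pipe = [], DEFAULT_SEGSIZE, best_pipeline
      for _ in range(doublings):
          seg, pipe = seg * 2, pipe * 2
          runs.append((seg, pipe))
      return runs

  print(phase1)
  print(phase2(best_pipeline=2 * block))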

Thanks for doing this! I'm excited about getting some hard empirical
data to inform our performance engineering. If you write up your
results in the form of a text file we might want to check it into the
docs/ directory for future reference.

Regards,

Zooko

