[tahoe-dev] Tahoe-LAFS v1.8.0 potentially delayed by performance issue

Brian Warner warner at lothar.com
Fri Aug 13 21:51:01 UTC 2010


On 8/13/10 12:03 PM, Wayne Scott wrote:
> Does it matter what version the machines in the cluster are running?
> 
> -Wayne

Nope. None of the server-side code changed from 1.7 to 1.8.

The expected speedups of the new-downloader code in 1.8 are on small
files (<100KB): specifically 1.8 should have much less per-file startup
overhead (fewer round trips to get things moving). Nathan's tests were,
I believe, on lots of small files.

Zooko's experiences have been on larger files (multiple GB), but the
most specific difference is that he's been using the volunteergrid (or
the pubgrid, or the testgrid... I've lost track of how many grids we have
these days). In particular, the servers on this grid vary wildly in
their speeds and bandwidths. There are some servers that take *five
minutes* to respond to the initial "do you have a share?" query.
In addition, the variable reliability of the servers tends to result in
shares "bunching up" on the servers: instead of the ideal
one-share-per-server, many servers have two or three shares.

One difference between the downloaders in 1.7 and 1.8 is the way they
choose which shares (and servers) to use. The 1.7 approach is less
likely to fall into a situation where we pull multiple shares from a
single server, whereas this is fairly common for the 1.8 downloader.
When that server does not have enough outgoing bandwidth, the download
clearly won't run as fast. We're considering some code changes that
might help with this, although the full solution is too big for the 1.8
timeframe. These changes would probably make it behave as well as the
1.7 downloader, but that would still not be ideal.
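
To make the share-selection issue concrete, here is a rough sketch of a
selection rule that prefers spreading shares across servers. This is
not the actual 1.7 or 1.8 downloader code: pick_shares, share_map, and
k are hypothetical names for illustration, and the rule is a simple
greedy heuristic (always take the next share from the least-loaded
server), so it will not always find the best possible spread.

```python
from collections import defaultdict


def pick_shares(share_map, k):
    """Pick k distinct shares, preferring servers we are not yet using.

    share_map: dict mapping server id -> set of share numbers it holds
               (hypothetical structure, for illustration only).
    k:         number of distinct shares needed for the download.
    Returns a list of (server, sharenum) pairs.
    """
    chosen = []               # (server, sharenum) pairs picked so far
    used_shares = set()       # share numbers already picked
    load = defaultdict(int)   # how many shares we pull from each server

    while len(chosen) < k:
        # Every share we still lack, tagged with its server's current load.
        candidates = [
            (load[server], server, sharenum)
            for server, shares in share_map.items()
            for sharenum in shares
            if sharenum not in used_shares
        ]
        if not candidates:
            raise ValueError("fewer than k distinct shares available")
        # Greedy step: take a share from the least-loaded server.
        _, server, sharenum = min(candidates)
        chosen.append((server, sharenum))
        used_shares.add(sharenum)
        load[server] += 1
    return chosen
```

With shares bunched up as {"A": {0, 1, 2}, "B": {3}, "C": {4}} and
k=3, this takes one share from each of A, B, and C instead of all three
from A. When there are fewer servers than k, it has no choice but to
double up on some server, which is where that server's outgoing
bandwidth becomes the bottleneck.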

It is entirely possible that, given the unreliability/slowness of the
grid, Zooko's observed performance problems are a function of the
servers that his download happened to use, more than the version of the
downloader. But perhaps not. So we're still investigating.

If you see unexpected performance problems in 1.8, please grab a copy of
the download status page so we can investigate. From the front "Welcome"
web page, follow the tiny "Recent Uploads and Downloads" link, find the
"download" row that corresponds to your download, and then save the
HTML. This will show us which servers responded and when. The 1.7
download status page has less information, but is still useful, so if
you're doing comparisons, consider saving both.

Ideally we'd like to see a repeatable example of a download on the
pubgrid that is fast/smooth in 1.7 and slow/choppy in 1.8 and uses the
same servers in both: that would be the most useful bit of evidence.

thanks,
 -Brian

