[tahoe-dev] Recommendations for minimal RAM usage ?

Wed Mar 14 20:42:24 UTC 2012

On 3/7/12 3:37 PM, Johannes Nix wrote:

> What Brian does around Ed25519 looks great.... What are the main speed
> bottlenecks in Tahoe-LAFS now? Could it be grid latencies / network
> bandwidth for fast systems and encoding for slow processors? And would
> a parallelized variant of the encoding make sense for a typical
> multicore consumer laptop?

Upload and download some large files and then look at the "Recent
Upload/Download Status" pages for them (linked from the Welcome Page):
that should give you some idea of how much time is spent for various
operations. Zfec encoding and AES encryption are pretty cheap: they
run at tens or hundreds of megabytes per second, at least on regular
laptop/desktop CPUs. SHA256 hashing is also really fast.
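
To get a rough feel for the hashing figure on a particular machine, a
small timing loop is enough. This is only a sketch using Python's
standard hashlib, not anything taken from the Tahoe code; the function
name and the default sizes are made up for illustration, and the number
it reports will vary with CPU and Python build.

```python
import hashlib
import time


def sha256_mb_per_second(total_mb=16, chunk_kb=64):
    """Hash roughly total_mb of zero bytes and return throughput in MiB/s.

    Hypothetical helper for illustration; it is not part of Tahoe-LAFS.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    hasher = hashlib.sha256()

    start = time.perf_counter()
    for _ in range(n_chunks):
        hasher.update(chunk)
    hasher.digest()
    elapsed = time.perf_counter() - start

    hashed_mb = (n_chunks * len(chunk)) / (1024 * 1024)
    # Guard against a zero reading from a very coarse timer.
    return hashed_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    print("SHA256: %.1f MiB/s" % sha256_mb_per_second())
```

Running the same loop on a laptop and on a small ARM box gives a quick
comparison of how much headroom the CPU has for hashing.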

The biggest overhead right now appears to be Foolscap serialization and
underutilization of the network (latency and insufficient pipelining). I
wrote a d3.js-based visualizer for the immutable download side: it shows
that the new downloader, while it transfers fewer bytes than the old
one, runs slower in some circumstances because it's now making a lot of
tiny requests, and the per-request overhead is significant. I'd like to
build a similar visualizer for the upload side to investigate how the
latency between subsequent block writes affects total throughput.
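
The effect of many tiny requests can be illustrated with a toy cost
model. This is a back-of-the-envelope sketch, not a measurement and not
how the downloader is implemented: it assumes every request pays one
round trip plus a fixed overhead, and that pipelining divides that
waiting time evenly. All names and numbers are hypothetical.

```python
def estimated_transfer_seconds(total_bytes, request_bytes, rtt_s,
                               per_request_overhead_s,
                               bandwidth_bytes_per_s, pipeline_depth=1):
    """Toy estimate: time on the wire plus per-request waiting time."""
    # Ceiling division: number of requests needed to cover total_bytes.
    n_requests = -(-total_bytes // request_bytes)
    wire_time = total_bytes / bandwidth_bytes_per_s
    # Assumption: with N requests in flight, waiting shrinks by N.
    waiting = n_requests * (rtt_s + per_request_overhead_s) / pipeline_depth
    return wire_time + waiting


if __name__ == "__main__":
    common = dict(total_bytes=1_000_000, rtt_s=0.05,
                  per_request_overhead_s=0.001,
                  bandwidth_bytes_per_s=1_000_000)
    for size, depth in [(1_000, 1), (1_000, 10), (100_000, 1)]:
        t = estimated_transfer_seconds(request_bytes=size,
                                       pipeline_depth=depth, **common)
        print("request=%6d bytes depth=%2d -> %.2f s" % (size, depth, t))
```

Under these made-up numbers the wire time is the same in every case;
only the request count and the pipeline depth change the total.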

Moving from Foolscap to HTTP is likely to help some of this, as is the
protocol rethinking that will be a part of that effort. Revisiting the
encoding protocol is also on the table: at PyCon last week we talked
about appending the block-hash-tree uncle-chain to each block, instead
of having just one big tree, to allow each block read to use a single
contiguous read() call, at a slight cost in storage overhead.
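
The uncle-chain idea can be sketched with a generic binary hash tree.
This is an illustration only: it does not use Tahoe's actual hash tags,
tree layout, or share format, and the helper names are invented. Each
block carries the sibling hashes on its path to the root, so a reader
holding the root can check the block from that one record.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels: leaves first, [root] last."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b"leaf:" + b) for b in blocks]
    # Pad the leaf level to a power of two with a fixed filler hash.
    while len(level) & (len(level) - 1):
        level.append(h(b"pad"))
    levels = [level]
    while len(level) > 1:
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def uncle_chain(levels, index):
    """Sibling hashes needed to go from leaf `index` up to the root."""
    chain = []
    for level in levels[:-1]:
        chain.append(level[index ^ 1])  # sibling at this level
        index //= 2
    return chain


def verify(block, index, chain, root):
    """Recompute the root from one block plus its uncle chain."""
    node = h(b"leaf:" + block)
    for sibling in chain:
        if index % 2 == 0:
            node = h(b"node:" + node + sibling)
        else:
            node = h(b"node:" + sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    blocks = [b"block%d" % i for i in range(5)]
    levels = build_tree(blocks)
    root = levels[-1][0]
    chain = uncle_chain(levels, 3)
    print(len(chain), verify(blocks[3], 3, chain, root))
```

In this sketch the extra storage is one 32-byte hash per tree level for
every block, which is where the overhead mentioned above comes from.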

We're really interested in smaller CPUs, like ARM-based FreedomBox /
PogoPlug / NAS boxes. My first interest is in making these into good
servers, which means minimizing the work that a storage server must
perform. Minimizing RAM usage (perhaps by making some features optional,
so we can load less code into memory) should help make them work better
as clients too. And having something to cross-compile the binary
packages that tahoe needs should help get tahoe installed on those boxes
faster: currently it can take a couple of hours to compile everything
necessary. I'd like to get some buildslaves running that just do
performance-testing of a pre-compiled Tahoe build on small boxes (our
current buildslaves are mostly focussed on the build itself).

cheers,
 -Brian