[tahoe-dev] Real-world Tahoe-LAFS grid deployment

Brian Warner warner at lothar.com
Mon Nov 15 22:57:22 UTC 2010


On 11/14/10 5:48 PM, Nathan Eisenberg wrote:
> 
> No - no RAID - same disk approach as if you were running 24 nodes -
> except the node is configured to use /mnt/disk1, /mnt/disk2,
> /mnt/disk3, etc, instead of having a node process for /mnt/disk1,
> /mnt/disk2, /mnt/disk3, etc.
> 
> In this way, a 24 disk server only shows up to the gateway as a single
> tahoe node, which is desirable.

But you want to see more than one share per machine, right? One machine,
one tahoe node, but up to 24 shares land there, yeah?

Hmm. We'd have to change a couple of remote APIs to make that work:
currently the client uses the nodeid of each server (i.e. the hash of
its Foolscap SSL cert) to figure out which shares to send to it. The
assumption of one-cert-per-server is wired in a bit deeper than it
really ought to be, but it made a lot of the code easier to write and
make secure.
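
To make the one-id-per-server point concrete, here is a rough sketch of
id-keyed share placement. It is not Tahoe's actual code: the names
node_id, permuted_servers and place_shares are made up for illustration,
SHA-256 stands in for whatever hash is really used, and the round-robin
assignment is an assumption. The only part taken from the description
above is that each server is identified by a hash of its cert.

```python
import hashlib
from collections import Counter


def node_id(cert_bytes):
    # Hypothetical stand-in: the real nodeid is derived from the
    # server's Foolscap SSL cert; here we just hash some bytes.
    return hashlib.sha256(cert_bytes).digest()


def permuted_servers(server_ids, storage_index):
    # Assumed scheme: order servers by a hash of (storage index + id),
    # so each file sees its own pseudo-random ordering of servers.
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(storage_index + sid).digest(),
    )


def place_shares(server_ids, storage_index, total_shares):
    # Assumed policy: hand shares out round-robin over that ordering.
    order = permuted_servers(server_ids, storage_index)
    return {n: order[n % len(order)] for n in range(total_shares)}


if __name__ == "__main__":
    # Five ids, ten shares. A 24-disk box running one node would be
    # a single id in this list; 24 nodes would be 24 ids.
    ids = [node_id(("cert-%d" % i).encode()) for i in range(5)]
    placement = place_shares(ids, b"example-storage-index", 10)
    print(Counter(sid.hex()[:8] for sid in placement.values()))
```

In a scheme like this the client only knows ids, not disks, so getting 24
shares onto one machine means either 24 ids or a change to what the
client keys on.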

Some of that hardwiredness may get cleaned up in the
signed-announcement-dicts work I'm doing, since I'm trying to lay the
groundwork for a move away from Foolscap and towards signed HTTP
requests instead. If that works, it should get rid of the 'write
enabler' shared secret, and thus remove the need for a confidential
channel between client and server, which is a big part of why we index
storage servers by their cryptographic id.

The easiest way to do multiple-servers-per-process would be to run
multiple Tubs per process (one per server). The win, I suppose, would be
reduced memory usage (sharing the Python interpreter and Tahoe code
between all the servers). The loss would be reduced parallelism: that
one process would be doing blocking disk IO one share at a time, whereas
running 24 separate processes would give the kernel scheduler more IO
parallelism to work with, and would let the servers take advantage of
multiple cores. Tahoe is intentionally single-threaded, so a single
process can only keep one core busy.

I'd definitely recommend benchmarking the two approaches for comparison.
I'd be worried that single-tahoe-node-for-multiple-disks would be a lot
slower, under load, than running 24 separate tahoe nodes, one per disk.
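
A minimal sketch of that kind of comparison is below. It does not run
Tahoe at all: it only times blocking, fsync'd file writes done by one
process across all directories versus one process per directory. The
worker script, the function names, and the sizes are all made up for
illustration, and the temporary directories are stand-ins that live on
one filesystem, so for a meaningful result you would pass in the real
mount points (/mnt/disk1, /mnt/disk2, ...) and use much larger writes.

```python
import os
import subprocess
import sys
import tempfile
import time

# Worker: write `count` files of `size` bytes into each directory given
# on the command line, one at a time, with blocking IO and fsync.
WORKER_SRC = r'''
import os, sys
size, count = int(sys.argv[1]), int(sys.argv[2])
payload = b"\0" * size
for directory in sys.argv[3:]:
    for i in range(count):
        with open(os.path.join(directory, "share-%d" % i), "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
'''


def spawn(directories, size, count):
    # Start one worker process responsible for the given directories.
    cmd = [sys.executable, "-c", WORKER_SRC, str(size), str(count)]
    return subprocess.Popen(cmd + list(directories))


def wait_all(procs):
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("worker failed")


def time_single_process(directories, size, count):
    # One process handles every disk in turn.
    start = time.perf_counter()
    wait_all([spawn(directories, size, count)])
    return time.perf_counter() - start


def time_process_per_disk(directories, size, count):
    # One process per disk, all running at once.
    start = time.perf_counter()
    wait_all([spawn([d], size, count) for d in directories])
    return time.perf_counter() - start


def run_benchmark(num_disks=4, size=1 << 20, count=4):
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, "disk%d" % i) for i in range(num_disks)]
        for d in dirs:
            os.mkdir(d)
        single = time_single_process(dirs, size, count)
        multi = time_process_per_disk(dirs, size, count)
        written = sum(len(os.listdir(d)) for d in dirs)
    return single, multi, written


if __name__ == "__main__":
    single, multi, written = run_benchmark()
    print("one process:      %.3fs" % single)
    print("process per disk: %.3fs" % multi)
    print("files on disk:    %d" % written)
```

Both cases write the same files, so the second run overwrites the first.
The numbers include interpreter startup and say nothing about network or
crypto cost, so treat them as a rough signal about disk parallelism only.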

cheers,
 -Brian

