[tahoe-dev] Real-world Tahoe-LAFS grid deployment
Brian Warner
warner at lothar.com
Tue Nov 16 17:29:13 UTC 2010
On 11/15/10 4:00 PM, Nathan Eisenberg wrote:
>> But you want to see more than one share per machine, right? One
>> machine, one tahoe node, but up to 24 shares land there, yeah?
>
> Nope - one machine, one tahoe node, and one share. At least in grids
> with >10 servers, this makes a lot of sense to me.
Ah, now I get it. So at the OS level you could use LVM or some
equivalent to merge all those disks together into a single volume and
let tahoe see just the one. Or, at a messier and more manual level, you
could symlink e.g. storage/shares/[012]* to /mnt/disk01/shares,
storage/shares/[345]* to /mnt/disk02/shares, etc., and rely upon the
usually-not-too-bad random distribution of storage-index values to use
the disks at about the same rate.
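A rough sketch of the manual symlink approach, in Python. It assumes
the prefix directories under storage/shares/ are each replaced by a
symlink into one of the per-disk directories. The function names
(assign_prefixes, make_symlinks) and the /mnt/diskNN paths are
hypothetical, and this is not something tahoe does for you.

```python
import os


def assign_prefixes(prefixes, disk_dirs):
    """Split the sorted prefixes into contiguous, near-equal runs, one per disk.

    Returns a dict mapping each prefix directory name to the disk
    directory that should hold it.
    """
    prefixes = sorted(prefixes)
    plan = {}
    for i, prefix in enumerate(prefixes):
        # Integer arithmetic spreads the prefixes evenly across the disks.
        plan[prefix] = disk_dirs[i * len(disk_dirs) // len(prefixes)]
    return plan


def make_symlinks(shares_dir, plan):
    """Create shares_dir/<prefix> -> <disk_dir>/<prefix> for every entry.

    Existing entries in shares_dir are left alone, so this is only safe
    to run on an empty (or already symlinked) shares directory.
    """
    for prefix, disk_dir in plan.items():
        target = os.path.join(disk_dir, prefix)
        os.makedirs(target, exist_ok=True)
        link = os.path.join(shares_dir, prefix)
        if not os.path.lexists(link):
            os.symlink(target, link)


# Example plan only; nothing is written to disk here.
plan = assign_prefixes(
    ["0", "1", "2", "3", "4", "5"],
    ["/mnt/disk01/shares", "/mnt/disk02/shares"],
)
```

With this plan, prefixes 0-2 land on disk01 and 3-5 on disk02. How
evenly the disks fill still depends on how evenly the storage-index
values spread across the prefixes.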
The Tahoe-layer enhancement would be to give tahoe a list of directories
(instead of just the single storage/shares/) and have it internally come
up with some mapping from storage-index to directory, or have it search
multiple places on read.
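To make that idea concrete, here is a minimal Python sketch of both
options: a deterministic mapping from storage-index to directory for
writes, and a search across all directories for reads. Nothing here is
existing tahoe code. The function names, the use of SHA-256, and the
<dir>/<first two characters>/<storage-index> layout are assumptions
made for illustration.

```python
import hashlib
import os


def pick_directory(storage_index, share_dirs):
    """Deterministically map a storage-index string to one directory."""
    digest = hashlib.sha256(storage_index.encode("ascii")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    bucket = int.from_bytes(digest[:8], "big") % len(share_dirs)
    return share_dirs[bucket]


def find_share_dirs(storage_index, share_dirs):
    """Search every configured directory for this storage-index on read.

    Assumes shares live under <dir>/<prefix>/<storage_index>, where the
    prefix is the first two characters of the storage-index.
    Directories that are missing (e.g. an unmounted disk) are skipped.
    """
    found = []
    for share_dir in share_dirs:
        candidate = os.path.join(share_dir, storage_index[:2], storage_index)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

The hash-based mapping changes for most storage-index values whenever
the directory list changes, so searching every directory on read is
the more forgiving of the two when disks are added or removed.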
In a sense, by desiring this, you're putting more emphasis on the
unreliability of the server hardware (CPU) itself. (I guess that my
inclination, one tahoe node per disk, shows that I'm putting more
emphasis on the unreliability of the disks, and assuming that the CPU
will survive longer. The few failures that AllMyData recorded show that
your assumptions are more accurate than mine.)
I'd be worried about making one node rely upon 24 disks though: the
combined MTBF of that batch would be uncomfortably short. So you'd want
a merging technique in which a failed disk only loses the shares that
were on that one disk, but lets you keep running with the shares from
the remaining disks. I think that rules out LVM. You'd also want to pay
attention to what the kernel does to the tahoe process when it tries to
read from a now-dead disk: ideally you want it to give up right away,
rather than hanging for a long time. I guess that depends on the exact
failure mode. Once the disk is unmounted, os.path.exists(storageindex)
will return False and we'll return a NAK right away.
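The MTBF worry at the start of this paragraph can be put into numbers
with a small Python sketch. It assumes the disks fail independently at
a constant rate, in which case the mean time to the first failure in a
batch of N disks is the per-disk MTBF divided by N. The 500,000-hour
per-disk figure is a made-up placeholder, not a measurement from
AllMyData or anywhere else.

```python
HOURS_PER_YEAR = 24 * 365


def combined_mtbf_hours(per_disk_mtbf_hours, n_disks):
    """Mean time until the first failure among n independent disks.

    Assumes a constant (exponential) failure rate per disk, so the
    failure rates simply add up.
    """
    return per_disk_mtbf_hours / n_disks


per_disk = 500_000  # hypothetical per-disk MTBF, in hours
batch = combined_mtbf_hours(per_disk, 24)
print(f"{batch:.0f} hours, about {batch / HOURS_PER_YEAR:.1f} years")
```

Under those assumptions a 24-disk node would see its first disk
failure after roughly 2.4 years on average, which is why a merging
technique that loses everything on a single disk failure is
unattractive.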
cheers,
-Brian