[tahoe-dev] Node correlations - [Was] best practice for wanting to setup multiple tahoe instances on a single node

Olaf TNSB still.another.person at gmail.com
Wed Jan 11 02:50:17 UTC 2012


Possibly going slightly OT...

On 11/01/2012 11:31 AM, "Greg Troxel" <gdt at ir.bbn.com> wrote:
> I think the key point is about the redundancy that you have vs the
> redundancy that tahoe perceives - it seems dangerous to have 20 nodes
> that appear independent but are all actually on the same box.  If they
> are on 20 physical disks that are independent enough that if the box
> fails you can reconstitute all but of the nodes, it might be ok, but it
> seems subject to correlated failures.
>
> If there were a way to have the nodes express their correlation groups,
> and the share placement be aware of this, then I think it might be ok.
> But we don't have that yet, and thus it seems to me that if you want the
> survivability properties tahoe gives you, you should run only one node
> per computer.  And perhaps only one node per site, if you want that kind
> of redundancy.

I think addressing this would really make Tahoe more acceptable to a
wider community.  I've been trying to frame the questions but will jump in
semi-prepared...

The lack of "correlations" (nice conceptual name!) between hosts is a major
limitation for my intended usage.

I *think* I want to be able to define both physical proximity and network
cost. That is, I want to be able to specify that enough shares are on my
local nodes for me to download files (cheap bandwidth) BUT that there are
enough shares elsewhere (i.e. far enough away) that a local disaster
doesn't destroy all copies.
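
To make the two requirements concrete, here is a rough Python sketch. It is not anything Tahoe does today: the function name, the site labels and the idea of tagging each share with a site are all assumptions for illustration. The only Tahoe concept used is k, the number of shares needed to rebuild a file.

```python
from collections import Counter

def check_placement(share_sites, local_site, k):
    """Check one file's share placement against the two goals above.

    share_sites: list with one (hypothetical) site label per share.
    local_site:  the label of the site doing the download.
    k:           number of shares needed to reconstruct the file.
    """
    counts = Counter(share_sites)
    local = counts[local_site]             # shares reachable over cheap bandwidth
    remote = len(share_sites) - local      # shares that survive a local disaster
    return {
        "local_read": local >= k,
        "survives_local_loss": remote >= k,
    }

# 10 shares over three sites, k = 3
print(check_placement(["a"] * 4 + ["b"] * 3 + ["c"] * 3, "a", 3))
# 8 of 10 shares at the local site: cheap to read, but not disaster-proof
print(check_placement(["a"] * 8 + ["b"] * 2, "a", 3))
```

A real solution would also need some measure of "far enough away", which this sketch reduces to "not at my site".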

What I was hoping for was to turn some/most/all machines at all our sites
into Tahoe nodes and thus reduce or remove the need for secure, hosted,
offsite backups.

Most machines seem to underutilize their drives, CPU and network, so a
Tahoe backup grid seems to be cost-effective.

*Of course* there is the issue that getting a file from the network is
location independent so, in my ill-defined ramblings, what does "enough
local nodes" mean?  I can see that for 3 sites of multiple nodes each
it means 3 shares per site (for a 3-of-10 setup), but what about a
generalized solution?
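
The arithmetic for the 3-site case can be sketched as follows. This assumes shares are spread as evenly as possible across sites, which is my simplification and not Tahoe's actual placement; the function names are made up for illustration.

```python
def spread_shares(n, sites):
    """Spread n shares as evenly as possible over the given sites."""
    base, extra = divmod(n, len(sites))
    # The first `extra` sites each take one leftover share.
    return {site: base + (1 if i < extra else 0) for i, site in enumerate(sites)}

def every_site_can_read_locally(n, k, sites):
    """True if each site ends up holding at least k shares on its own."""
    return all(count >= k for count in spread_shares(n, sites).values())

print(spread_shares(10, ["A", "B", "C"]))                       # 3-of-10, 3 sites
print(every_site_can_read_locally(10, 3, ["A", "B", "C"]))
print(every_site_can_read_locally(10, 3, ["A", "B", "C", "D"]))
```

Under that even-spread assumption, every site can read locally only when the total number of shares is at least k times the number of sites, so a fourth site already breaks a 3-of-10 setup. That is part of why a generalized rule is harder than the 3-site case.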

Has anyone run into this or thought about it?

Happy to contribute to discussions to flesh out a general use case.

Olaf