[tahoe-dev] Benchmark and scaling figures and how to lockdown a tahoe-lafs grid...

Shawn Willden shawn at willden.org
Sun Nov 7 20:30:19 UTC 2010


On Sun, Nov 7, 2010 at 10:20 AM, Jimmy Tang <jcftang at gmail.com> wrote:

> > How many storage nodes would your system have?
>
> I'm going to start with 3 servers at first, I have about 40 old machines
> that I can re-provision for doing this if I really need to.

Even with 40 machines, you shouldn't run into any issues with scalability
from a performance perspective.  The commercial allmydata site had far more
than that.

One thing you'll want to think carefully about is your choice of encoding
parameters M and K.  Any K of a file's M shares are enough to recover it, so
what you want to make sure won't happen is for more than M-K servers to be
down simultaneously: if you have enough files, the loss of any M-K+1 servers
will make some of those files unavailable.  If you have deep directory
trees, it becomes much more likely that any given file is unavailable,
because losing any directory node between root and leaf makes the leaf
unavailable.
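
As a rough sketch of the directory-depth effect: the per-node availability
figure below is made up, the function name is hypothetical, and treating
each node as failing independently is a simplifying assumption (in a real
grid the nodes share servers, so failures are correlated).

```python
def leaf_availability(node_availability: float, path_dirs: int) -> float:
    """Chance that a file can be reached through `path_dirs` directories.

    Assumes (illustratively) that every directory node on the path, and the
    file itself, is available independently with the same probability.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if path_dirs < 0:
        raise ValueError("path_dirs must be non-negative")
    # All directories on the path AND the file itself must be readable.
    return node_availability ** (path_dirs + 1)


if __name__ == "__main__":
    # 0.99 is an arbitrary example value, not a measured figure.
    for dirs in (0, 2, 5, 10):
        print(dirs, round(leaf_availability(0.99, dirs), 4))
```

The deeper the path, the more nodes have to be readable at once, so the
product shrinks with every extra directory level.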

With only three servers, if the data is important I'd go with K=1, M=3.
That means that each server will store all of the data for all of the
files.  K=2 would allow you to lose only one server; a second failure would
result in the loss of all data.
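
To make the storage cost of those two choices concrete, here is a small
sketch.  It assumes each share is roughly 1/K of the file and ignores
Tahoe's per-share overhead (hashes and the like); the helper names are
hypothetical.

```python
def share_size(file_size: float, k: int) -> float:
    """Approximate size of one share: the file split K ways (overhead ignored)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return file_size / k


def total_stored(file_size: float, k: int, m: int) -> float:
    """Approximate space used across the grid for one file: M shares."""
    if m < k:
        raise ValueError("m must be at least k")
    return m * share_size(file_size, k)


if __name__ == "__main__":
    size = 100.0  # arbitrary file size, in MB
    for k, m in ((1, 3), (2, 3)):
        print(f"K={k}, M={m}: share={share_size(size, k)} MB, "
              f"total={total_stored(size, k, m)} MB, "
              f"tolerated server losses={m - k}")
```

With K=1 every share is a full copy, so three servers hold three times the
data; K=2 halves the share size at the cost of tolerating only one failure.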

With 40 servers, I'd set M close to 40.  Maybe 35.  Then I'd set K to about
25.  That means you could lose any 10 servers without losing any healthy
files.  Losing 16 servers would lose all of your data, but it's vanishingly
unlikely that any independent sort of failure mode would take out that many
before you could run a repair, and any failure that affects that many would
probably affect all of them anyway.
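
The arithmetic behind those two numbers can be sketched as follows.  It
assumes every file has exactly one share on each of M distinct servers and
that a file is recoverable as long as K shares survive; the function name
is hypothetical.

```python
def loss_thresholds(servers: int, k: int, m: int) -> tuple[int, int]:
    """Return (tolerated, total_loss_at) for a grid of `servers` machines.

    tolerated:     any set of this many lost servers still leaves every
                   healthy file with at least K shares (M - K).
    total_loss_at: losing this many servers guarantees every file drops
                   below K shares, whichever servers they are.
    """
    if not 1 <= k <= m <= servers:
        raise ValueError("need 1 <= k <= m <= servers")
    tolerated = m - k
    # Best case for a file: the lost servers include all (servers - m)
    # machines that hold none of its shares.  Even then only
    # (servers - lost) shares survive, so K shares need lost <= servers - k.
    total_loss_at = servers - k + 1
    return tolerated, total_loss_at


if __name__ == "__main__":
    print(loss_thresholds(40, 25, 35))  # (10, 16)
    print(loss_thresholds(3, 1, 3))     # (2, 3)
    print(loss_thresholds(3, 2, 3))     # (1, 2)
```

Between the two thresholds, whether a given file survives depends on which
servers happen to hold its shares.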

If you're paranoid you could go with K=20 or even K=15, but I don't think
your reliability would be any higher in practice.
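
One way to compare those K values is a simple binomial model.  This is only
a sketch: it assumes the servers holding a file's M shares fail
independently with the same probability before a repair runs, and the
failure probability used below is invented for illustration.  Correlated
failures, which are the more realistic threat, are not modelled at all.

```python
from math import comb


def file_loss_probability(k: int, m: int, p_fail: float) -> float:
    """P(fewer than K of a file's M shares survive).

    Each of the M share-holding servers is assumed to fail independently
    with probability `p_fail`.
    """
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= m")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    # The file is lost when more than M - K of its shares are gone.
    return sum(
        comb(m, lost) * p_fail ** lost * (1.0 - p_fail) ** (m - lost)
        for lost in range(m - k + 1, m + 1)
    )


if __name__ == "__main__":
    p = 0.05  # hypothetical per-server failure probability between repairs
    for k in (25, 20, 15):
        print(f"K={k}, M=35: {file_loss_probability(k, 35, p):.3e}")
```

Under this model the loss probability does fall as K drops, but it starts
out so small that the difference is unlikely to matter next to failures
that hit many servers at once.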

Note that Tahoe has not been tested much with larger values of K and M.
Nearly all usage has been with the default values of K=3, M=10.  I wouldn't
expect that using larger values would uncover data-loss bugs, but it might
uncover some performance issues.  Probably not, but you'd need to test to
be sure.

One thing you wouldn't need to worry about much, IMO, is performance
degrading as more files are added.  Performance depends on bandwidth,
number of servers and encoding parameters; it's not really sensitive to data
volumes or file counts.

-- 
Shawn