[tahoe-dev] Benchmark and scaling figures and how to lockdown a tahoe-lafs grid...

Jimmy Tang jcftang at gmail.com
Sun Nov 7 22:41:35 UTC 2010


>
> Even with 40 machines, you shouldn't run into any issues with scalability
> from a performance perspective.  The commercial allmydata site had far more
> than that.
>
> One thing you'll want to think carefully about is your choice of encoding
> parameters M and K.  What you want to make sure won't happen is for more
> than M-K servers to be down simultaneously, because if you have enough
> files, the loss of any M-K+1 servers will make some of those files
> unavailable.  If you have deep directory trees, it becomes much more likely
> that any given file is unavailable, because losing any directory node
> between root and leaf makes the leaf unavailable.
>
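The depth effect can be seen in a minimal Python sketch. It assumes every node (file or directory) is encoded with the same K-of-M parameters, that each share sits on a different server, and that servers are up independently with some probability `p_up` (a hypothetical parameter). In reality failures are correlated and directory nodes share servers, so the numbers are illustrative only.

```python
from math import comb


def file_availability(k: int, m: int, p_up: float) -> float:
    """Probability that at least k of a node's m shares are reachable.

    Assumes each share lives on a different server and each server is up
    independently with probability p_up (a simplification).
    """
    return sum(comb(m, i) * p_up**i * (1 - p_up) ** (m - i) for i in range(k, m + 1))


def leaf_availability(k: int, m: int, p_up: float, dir_depth: int) -> float:
    """Probability that a file dir_depth directories below the root is reachable.

    Every directory node on the path, plus the file itself, must be
    recoverable; the nodes are treated as independent (another simplification).
    """
    return file_availability(k, m, p_up) ** (dir_depth + 1)


# Same K=3, M=10 encoding and the same (hypothetical) server uptime, deeper paths.
for depth in (0, 4, 8):
    print(depth, leaf_availability(3, 10, 0.8, depth))
```

Because the per-node probability is raised to the power of the path length, even a small per-node unavailability compounds on deep trees.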

This may sound like a silly question, but I was reading through the
configuration.txt file again to look up the M and K values: which one is
the M value?

shares.needed = (int, optional) aka "k", default 3
shares.total = (int, optional) aka "N", N >= k, default 10
shares.happy = (int, optional) 1 <= happy <= N, default 7

I get the impression that M is the total number of shares (or is it machines?).
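Judging by the defaults quoted in the reply (K=3, M=10, matching k=3 and N=10), M corresponds to `shares.total` (N) and K to `shares.needed` (k). M counts shares; it equals a number of machines only when each server holds one share. Below is a minimal Python sketch that reads these keys and checks the documented constraints, assuming they sit in the `[client]` section of `tahoe.cfg`. `EncodingParams` and `load_encoding_params` are hypothetical helpers, not part of Tahoe, and the example values are made up.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingParams:
    k: int      # shares.needed: shares required to rebuild a file ("K")
    n: int      # shares.total: shares produced per file ("M", Tahoe's "N")
    happy: int  # shares.happy: roughly, distinct servers that must hold shares for an upload to succeed

    def __post_init__(self) -> None:
        # Constraints as listed in configuration.txt: N >= k, 1 <= happy <= N.
        if not 1 <= self.k <= self.n:
            raise ValueError(f"need 1 <= k <= N, got k={self.k}, N={self.n}")
        if not 1 <= self.happy <= self.n:
            raise ValueError(f"need 1 <= happy <= N, got happy={self.happy}, N={self.n}")


def load_encoding_params(cfg_text: str) -> EncodingParams:
    """Read the shares.* keys, falling back to the documented defaults (3, 10, 7)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return EncodingParams(
        k=parser.getint("client", "shares.needed", fallback=3),
        n=parser.getint("client", "shares.total", fallback=10),
        happy=parser.getint("client", "shares.happy", fallback=7),
    )


# Hypothetical values for a 40-server grid.
example = """
[client]
shares.needed = 25
shares.total = 35
shares.happy = 30
"""
print(load_encoding_params(example))
```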


>
> With only three servers, if the data is important I'd go with K=1, M=3.
> That means that each server will store all of the data for all of the
> files.  K=2 would allow you to lose only one server; a second failure would
> result in the loss of all data.
>
> With 40 servers, I'd set M close to 40.  Maybe 35.  Then I'd set K to about
> 25.  That means you could lose up to 10 servers without losing any healthy
> files.  Losing 16 servers would lose all of your data, but it's vanishingly
> unlikely that any independent sort of failure mode would take out that many
> before you could run a repair, and any failure that affects that many would
> probably affect all of them anyway.
>
> If you're paranoid you could go with K=20 or even K=15, but I don't think
> your reliability would be any higher in practice.
>
> Note that Tahoe has not been tested much with larger values of K and M.
> Nearly all usage has been with the default values of K=3, M=10.  I wouldn't
> expect that using larger values would uncover data-loss bugs, but it might
> uncover some performance issues.   Probably not, but you'd need to test to
> be sure.
>
> One thing you wouldn't need to worry about much, IMO, is performance
> degrading as more files are added.  Performance depends on bandwidth,
> number of servers, and encoding parameters; it's not really sensitive to
> data volumes or file counts.
>
>
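The arithmetic behind these figures can be written as a minimal Python sketch. It assumes each server holds at most one share of a given file (so M is at most the number of servers) and ignores repair. `Tolerance` and `tolerance` are hypothetical names. For each layout it reports how many failures every file survives (M-K), the point at which an unlucky file can be lost (M-K+1), the point at which fewer than K servers remain so nothing survives, and the storage expansion M/K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    safe_failures: int   # failures every file survives: M - K
    first_loss_at: int   # failures at which some unlucky file can be lost: M - K + 1
    total_loss_at: int   # failures leaving fewer than K servers, so no file survives
    expansion: float     # bytes stored per byte of user data: M / K


def tolerance(servers: int, k: int, m: int) -> Tolerance:
    # Assumes one share per server per file, hence M <= number of servers.
    if not 1 <= k <= m <= servers:
        raise ValueError("need 1 <= K <= M <= number of servers")
    return Tolerance(
        safe_failures=m - k,
        first_loss_at=m - k + 1,
        total_loss_at=servers - k + 1,
        expansion=m / k,
    )


for servers, k, m in [(3, 1, 3), (3, 2, 3), (40, 25, 35), (40, 20, 35), (40, 15, 35)]:
    t = tolerance(servers, k, m)
    print(f"{servers} servers, K={k}, M={m}: safe up to {t.safe_failures} failures, "
          f"files at risk from {t.first_loss_at}, all lost at {t.total_loss_at}, "
          f"expansion {t.expansion:.2f}x")
```

For 40 servers with K=25, M=35 this gives 10, 11 and 16, matching the figures above. Dropping K to 20 or 15 widens the margin against independent failures at the cost of 1.75x or about 2.33x storage instead of 1.4x.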

Thanks for the above notes; they've certainly clarified a lot for me.