[tahoe-dev] Largest Scale of Tahoe grids
sickness
sickness at tiscali.it
Fri Nov 4 14:09:18 UTC 2011
On Fri, Nov 04, 2011 at 07:04:54AM -0600, Shawn Willden wrote:
> On Fri, Nov 4, 2011 at 1:42 AM, Jimmy Tang <jcftang at gmail.com> wrote:
> >
> > Also, assuming that I do build a 100 TB Tahoe-LAFS system across, say, 6 machines
>
> You'd be better off using more machines. A larger number of storage
> nodes means that you can configure your system to use less redundancy
> in the encoding, and hence "waste" less storage space, for the same
> level of reliability.
>
> Suppose that you build nodes which individually have, say, a 99%
> chance of surviving for a year. If you have 6 nodes, you can choose N
> between 1 and 6, and k between 1 and N. Each pair of choices gives
> you an expansion factor of N/k, meaning you can store 100/(N/k)
> terabytes, and it also gives you a probability p that a given file is
> lost. See my lossmodel paper (in the Tahoe docs) for how to calculate
> p.
>
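Written out, the quantities in the paragraph above look roughly like this. It is a simplified sketch that assumes each of the N shares sits on a different node and that nodes fail independently with probability q = 0.01 per year; the lossmodel paper may cover more cases than this. The symbols q and i are introduced here only for illustration.

```latex
C = 100 \cdot \frac{k}{N}\ \text{TB},
\qquad
p = \sum_{i=N-k+1}^{N} \binom{N}{i}\, q^{i}\, (1-q)^{N-i}

% Worked example for k = 2, N = 3, q = 0.01:
C = 100 \cdot \tfrac{2}{3} \approx 67\ \text{TB},
\qquad
p = 3\,q^{2}(1-q) + q^{3} \approx 2.98 \times 10^{-4}
```

The sum starts at i = N-k+1 because a file is lost once fewer than k shares survive, that is, once at least N-k+1 of the N nodes holding shares have failed.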
> Here's a table of the options for a six-node grid. k and N are the
> Tahoe encoding parameters, C is the capacity of the resulting grid and
> p is the probability that a given file is lost, assuming you have a
> direct URI to it (directory trees complicate things and lower the
> probability of survival).
>
> k  N  C (TB)  p
> =  =  ======  =====
> 1  1     100  1E-2
> 1  2      50  1E-4
> 2  2     100  2E-2
> 1  3      33  1E-6
> 2  3      67  3E-4
> 3  3     100  3E-2
> 1  4      25  1E-8
> 2  4      50  4E-6
> 3  4      75  6E-4
> 4  4     100  4E-2
> 1  5      20  1E-10
> 2  5      40  5E-8
> 3  5      60  1E-5
> 4  5      80  1E-3
> 5  5     100  5E-2
> 1  6      17  1E-12
> 2  6      33  6E-10
> 3  6      50  1E-7
> 4  6      67  2E-5
> 5  6      83  1E-3
> 6  6     100  6E-2
>
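A table like the one above can be regenerated with a few lines of Python. This is a minimal sketch under the assumption of independent node failures and one share per node (a plain binomial model, not necessarily the full lossmodel calculation); the function names `loss_probability` and `table` are made up for this example.

```python
from math import comb


def loss_probability(k, n, node_survival=0.99):
    """Probability that fewer than k of n shares survive, i.e. the file is lost."""
    q = 1.0 - node_survival
    # The file is lost when at least n - k + 1 of the n share-holding nodes fail.
    return sum(comb(n, i) * q**i * node_survival**(n - i)
               for i in range(n - k + 1, n + 1))


def table(nodes=6, raw_tb=100, node_survival=0.99):
    """Rows of (k, N, capacity in TB, p) for every pair with 1 <= k <= N <= nodes."""
    rows = []
    for n in range(1, nodes + 1):
        for k in range(1, n + 1):
            capacity = raw_tb * k / n  # raw space divided by the expansion factor N/k
            rows.append((k, n, capacity, loss_probability(k, n, node_survival)))
    return rows


if __name__ == "__main__":
    for k, n, capacity, p in table():
        print(f"{k} {n} {capacity:5.0f} {p:.0E}")
```

Changing `node_survival` or `raw_tb` shows how sensitive the numbers are to the per-node reliability you assume.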
> If you look at the table a little, you'll see that the best
> combinations of capacity and reliability are towards the bottom, with
> N=5 or N=6. If you choose a target reliability level, say p < 1E-6,
> then your best option is k=3, N=6 which only gives you a capacity of
> 50T.
>
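Picking the best encoding for a target reliability can be automated along these lines. Again a sketch only, using the same simplified binomial model (independent failures, one share per node); `best_encoding` is a hypothetical helper, not part of Tahoe.

```python
from math import comb


def loss_probability(k, n, node_survival=0.99):
    """Probability that fewer than k of n shares survive, i.e. the file is lost."""
    q = 1.0 - node_survival
    return sum(comb(n, i) * q**i * node_survival**(n - i)
               for i in range(n - k + 1, n + 1))


def best_encoding(nodes, target_p, raw_tb=100, node_survival=0.99):
    """Return (k, N, capacity_tb, p) with the largest capacity among all
    encodings whose loss probability is below target_p, or None if none qualifies."""
    best = None
    for n in range(1, nodes + 1):
        for k in range(1, n + 1):
            p = loss_probability(k, n, node_survival)
            capacity = raw_tb * k / n
            if p < target_p and (best is None or capacity > best[2]):
                best = (k, n, capacity, p)
    return best


if __name__ == "__main__":
    print(best_encoding(nodes=6, target_p=1e-6))
```

For six nodes and a target of p < 1E-6 this selects k=3, N=6 at 50 TB, the same choice as in the text.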
> However, if you increase that to 10 nodes, then you can choose k=7,
> N=10 and have a capacity of 70T with p=2E-6. If you go to 20 nodes
> your capacity at that reliability level increases to 80T. And so on.
>
> When you analyze capacity at a given reliability, or reliability at a
> given capacity, you will find that the math always favors larger N,
> and therefore larger numbers of nodes.
>
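One way to see the "reliability at a given capacity" half of that claim is to hold the expansion factor fixed and let N grow. The sketch below keeps N/k = 2 (half the raw space usable) and reports p for several grid sizes. It assumes the same simplified binomial model with independent failures and one share per node, and `reliability_at_fixed_expansion` is a made-up name.

```python
from math import comb


def loss_probability(k, n, node_survival=0.99):
    """Probability that fewer than k of n shares survive, i.e. the file is lost."""
    q = 1.0 - node_survival
    return sum(comb(n, i) * q**i * node_survival**(n - i)
               for i in range(n - k + 1, n + 1))


def reliability_at_fixed_expansion(node_counts, expansion=2, node_survival=0.99):
    """For each N divisible by `expansion`, set k = N / expansion and return (k, N, p)."""
    results = []
    for n in node_counts:
        if n % expansion:
            continue  # skip N values for which k would not be a whole number
        k = n // expansion
        results.append((k, n, loss_probability(k, n, node_survival)))
    return results


if __name__ == "__main__":
    for k, n, p in reliability_at_fixed_expansion([2, 4, 6, 10, 20]):
        print(f"k={k:2d} N={n:2d} p={p:.1E}")
```

At the same usable capacity, the loss probability under this model drops by orders of magnitude as N increases.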
> Of course, you also have to trade that off against cost. It would be
> easy to factor that into the model as well.
>
> As to your original question: I haven't noticed any file size
> limitations with Tahoe. I've stored files as large as 2 GB. And the
> architecture imposes no limits on total grid size, either. You will
> want to look at what your bandwidth limitations might do, keeping in
> mind that Tahoe limits your upload (and maybe download?) speed to that
> of the slowest node in the grid.
>
> --
> Shawn
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev at tahoe-lafs.org
> http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
>
I've stored 8 GB files on a 10-node grid with k=5, N=10, and it works like a charm :)