[tahoe-dev] Largest Scale of Tahoe grids

sickness sickness at tiscali.it
Fri Nov 4 14:09:18 UTC 2011


On Fri, Nov 04, 2011 at 07:04:54AM -0600, Shawn Willden wrote:
> On Fri, Nov 4, 2011 at 1:42 AM, Jimmy Tang <jcftang at gmail.com> wrote:
> >
> > Also assuming that I do build a 100tb tahoe-lafs system across say 6 machines
> 
> You'd be better off using more machines.  Larger numbers of storage
> nodes mean that you can configure your system to use less redundancy
> in the encoding, and hence "waste" less storage space, for the same
> level of reliability.
> 
> Suppose that you build nodes which individually have, say, a 99%
> chance of surviving for a year.  If you have 6 nodes, you can choose N
> between 1 and 6, and k between 1 and N.  Each pair of choices gives
> you an expansion factor of N/k meaning you can store 100/(N/k)
> terabytes and it also gives you a probability p that a given file is
> lost.  See my lossmodel paper (in the Tahoe docs) for how to calculate
> p.
> 
> Here's a table of the options for a six-node grid.  k and N are the
> Tahoe encoding parameters, C is the capacity of the resulting grid and
> p is the probability that a given file is lost, assuming you have a
> direct URI to it (directory trees complicate things and lower the
> probability of survival).
> 
> k    N    C (TB)    p
> =    =    ======    =====
> 1    1    100       1E-2
> 1    2    50        1E-4
> 2    2    100       2E-2
> 1    3    33        1E-6
> 2    3    67        3E-4
> 3    3    100       3E-2
> 1    4    25        1E-8
> 2    4    50        4E-6
> 3    4    75        6E-4
> 4    4    100       4E-2
> 1    5    20        1E-10
> 2    5    40        5E-8
> 3    5    60        1E-5
> 4    5    80        1E-3
> 5    5    100       5E-2
> 1    6    17        1E-12
> 2    6    33        6E-10
> 3    6    50        1E-7
> 4    6    67        2E-5
> 5    6    83        1E-3
> 6    6    100       6E-2
> 
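For anyone who wants to regenerate a table like the one above, here is a minimal Python sketch. It assumes a simplified model that is not taken from the lossmodel paper itself: one share per node, independent node failures, a 99% per-node survival rate, and 100 TB of raw space. The function names (loss_probability, capacity, table) are made up for illustration. Figures derived from the fuller loss model need not match this simplification.

```python
from math import comb


def loss_probability(k, n, node_survival=0.99):
    """Probability that fewer than k of n shares survive.

    Assumes one share per node and independent node failures
    (a simplification, not the full loss model).
    """
    q = 1.0 - node_survival  # per-node failure probability
    # The file is lost when at least n - k + 1 of the n nodes fail.
    return sum(
        comb(n, f) * q**f * node_survival ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


def capacity(k, n, raw_tb=100.0):
    """Usable capacity in TB: raw space divided by the expansion factor N/k."""
    return raw_tb * k / n


def table(max_nodes=6):
    """Rows of (k, N, capacity, p) for every valid k <= N <= max_nodes."""
    return [
        (k, n, capacity(k, n), loss_probability(k, n))
        for n in range(1, max_nodes + 1)
        for k in range(1, n + 1)
    ]


if __name__ == "__main__":
    print("k\tN\tC\tp")
    for k, n, c, p in table():
        print(f"{k}\t{n}\t{c:.0f}\t{p:.0E}")
```

Changing node_survival or max_nodes lets you explore other grid sizes under the same assumptions.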
> If you look at the table a little, you'll see that the best
> combinations of capacity and reliability are towards the bottom, with
> N=5 or N=6.  If you choose a target reliability level, say p < 1E-6,
> then your best option is k=3, N=6 which only gives you a capacity of
> 50T.
> 
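Picking the best encoding for a target reliability, as described above, can be sketched like this. It uses the same simplifying assumptions as before (one share per node, independent failures, 99% per-node survival), and best_k is a hypothetical helper, not part of Tahoe. Since the loss probability only grows as k grows, the largest k that still meets the target gives the most capacity.

```python
from math import comb


def loss_probability(k, n, node_survival=0.99):
    """Probability that fewer than k of n shares survive (independent failures)."""
    q = 1.0 - node_survival
    return sum(
        comb(n, f) * q**f * node_survival ** (n - f)
        for f in range(n - k + 1, n + 1)
    )


def best_k(n, target_p, node_survival=0.99):
    """Largest k whose loss probability is below target_p, or None if no k works."""
    # Walk k downward from n; the first k that meets the target wastes the least space.
    for k in range(n, 0, -1):
        if loss_probability(k, n, node_survival) < target_p:
            return k
    return None


if __name__ == "__main__":
    n = 6
    k = best_k(n, 1e-6)
    if k is None:
        print(f"N={n}: no k meets the target")
    else:
        print(f"N={n}: k={k}, capacity {100.0 * k / n:.0f} TB")
```

For the six-node case this agrees with the k=3, N=6 choice; for larger grids the simplified model may give somewhat different answers than the full loss model.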
> However, if you increase that to 10 nodes, then you can choose k=7,
> N=10 and have a capacity of 71T with p=2E-7.  If you go to 20 nodes
> your capacity at that reliability level increases to 80T.  And so on.
> 
> When you analyze capacity at a given reliability, or reliability at a
> given capacity, you will find that the math always favors larger N,
> and therefore larger numbers of nodes.
> 
> Of course, you also have to trade that off against cost.  It would be
> easy to factor that into the model as well.
> 
> As to your original question, I haven't noticed any file size
> limitations with Tahoe.  I've stored files as large as 2 GB.  And the
> architecture imposes no limits on total grid size, either.  You will
> want to look at what your bandwidth limitations might do, keeping in
> mind that Tahoe limits your upload (and maybe download?) speed to that
> of the slowest node in the grid.
> 
> --
> Shawn
> 

I've stored 8 GB files on a 10-node grid (k=5, N=10) and it works like a charm :)

