[volunteergrid2-l] Finally starting to use VG2

Shawn Willden shawn at willden.org
Wed Oct 19 12:25:02 PDT 2011


One more note about parameters:

My current settings (7/12/12) are pretty optimistic in one respect:
requiring shares to be placed on 12 distinct servers (shares.happy = 12) when
only 13 servers are in the grid means that if a couple of servers hiccup, or
are even just too slow to respond, my uploads fail.
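
To put a rough number on that (a crude back-of-the-envelope model, assuming
each server is independently reachable 95% of the time -- the same figure I
use for the loss calculations below):

import math

def pr_upload_ok(n_servers, happy, p_up=0.95):
    # Probability that at least `happy` of `n_servers` servers accept shares,
    # treating each server as independently available with probability p_up.
    return sum(math.comb(n_servers, i) * p_up**i * (1 - p_up)**(n_servers - i)
               for i in range(happy, n_servers + 1))

print(pr_upload_ok(13, 12))   # ~0.865: roughly one upload attempt in seven fails
print(pr_upload_ok(13, 10))   # ~0.997: what a looser happy setting would give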

I'm experiencing a failure every hour or so because of this, and each one
crashes my backup process.  (I'm not using GridBackup right now; it would be
more resilient, but it has other issues.  I do plan to use it for my backups
eventually; for now I'm just using tahoe backup to get the file data into the
grid.)

On the bright side, I'm also often seeing upload speeds much higher than what
I reported earlier.  I'm getting numbers as high as 350 KB/s, which once you
factor in the 12/7 expansion (~600 KB/s on the wire) is basically maxing out
my upstream connection (~5 Mbps).  That's awesome!
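
(The arithmetic, for anyone checking: 7-of-12 encoding expands everything by
a factor of 12/7, so the wire rate is quite a bit higher than the file rate:)

file_rate_kBps = 350                        # observed per-file payload rate
wire_rate_kBps = file_rate_kBps * 12 / 7    # ~600 KB/s actually transmitted
print(wire_rate_kBps * 8 / 1000)            # ~4.8 Mbps of upstream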

On Wed, Oct 19, 2011 at 9:15 AM, Shawn Willden <shawn at willden.org> wrote:

> (Sorry for the flurry of e-mails this morning)
>
> I'm happy to announce that I'm finally starting to actually use VG2 as my
> primary backup system.  I think we have the necessary number of
> highly-reliable storage servers now to make it eminently usable.  Over the
> next month or two (however long it takes), I'll be storing about 200 GB in
> the grid.  I'm getting a per-file upload rate of about 130 KB/s, which is
> pretty decent.  Assuming that were constant, my data would take about 17
> days to upload, but there are pauses between files so I expect it to take
> considerably longer.
>
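> (As a quick sanity check on that 17-day figure -- the 130 KB/s is the rate
> at which file data goes in, so expansion doesn't enter into it:)
>
> data_kB = 200 * 1000 * 1000          # 200 GB of file data, in KB
> rate_kBps = 130                      # observed per-file upload rate
> print(data_kB / rate_kBps / 86400)   # ~17.8 days
>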
> My Tahoe redundancy settings are 7/12/12.  Since there are currently 13
> active nodes, that means all of you should be seeing about 17 KB/s of
> traffic from me (130 KB/s * 12/7 = 223 KB/s total upstream traffic from me,
> divided by 13 active nodes = ~17 KB/s).
>
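> (For anyone who wants to mirror these settings on their own client: they
> live in the [client] section of tahoe.cfg.  Something like the following --
> 7 shares needed to reconstruct, shares required on 12 distinct servers, and
> 12 shares written in total:)
>
> [client]
> shares.needed = 7
> shares.happy = 12
> shares.total = 12
>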
> If we can get up to 20 reliable nodes, I'll probably change my settings to
> 11/17/17, or maybe even 12/18/18.  My rationale for these settings, BTW, is
> based on some calculations done using some utility functions embedded in
> Tahoe.  To get them, I did:
>
> $ cd tahoe/src
> $ python
> Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import allmydata.util.statistics as s
> >>> s.pr_file_loss([.95]*12, 7)
> 1.1107789644043027e-05
> >>> s.pr_file_loss([.95]*17, 10)
> 6.3136217078519583e-07
> >>> s.pr_file_loss([.95]*17, 11)
> 9.7284215413383693e-06
> >>> s.pr_file_loss([.95]*18, 11)
> 1.0862151393128548e-06
> >>> s.pr_file_loss([.95]*18, 12)
> 1.5228007433536422e-05
> >>> s.pr_file_loss([.95]*18, 13)
> 0.00017196620536118082
>
> The pr_file_loss() function computes the probability that a single file
> will be lost, based on the arguments, which are:
>
>
>    - A list of server reliability probabilities (I'm assuming our servers
>    are 95% reliable)
>    - The number of shares required to reconstruct the file.
>
>
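> If you'd rather not poke around inside Tahoe, the first number is easy to
> reproduce with a few lines of plain Python.  With every share given the same
> survival probability p, file loss is just the chance that fewer than k of
> the N shares survive (this little helper is my own restatement, not part of
> Tahoe):
>
> import math
>
> def pr_loss(p, n, k):
>     # P(fewer than k of n shares survive), each share surviving
>     # independently with probability p.
>     return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
>                for i in range(k))
>
> print(pr_loss(0.95, 12, 7))   # ~1.11e-05, matching the session above
>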
> The list of server reliabilities actually represents the reliabilities of
> the shares deployed, so you should use your shares.happy value.  In earlier
> versions of Tahoe-LAFS there was a further complication: multiple shares
> could be delivered to one server with happiness still achieved.  It's
> possible to construct a probability function that models that, but it's a
> little more complicated.  With the newest versions of Tahoe (1.8 and newer,
> I think), that shouldn't be an issue.
>
> Note that this function also effectively assumes that either you have the
> URI of the file or that shares.happy represents all or nearly all the nodes
> in the grid.  If shares.happy is a small percentage of the nodes in the grid
> then there's another complication because the expected reliability of each
> file becomes (arguably) independent of the other files.  Since you typically
> don't have the direct URI of a given file, that situation means you have to
> consider the possibility that the directory nodes between your "root"
> directory and the target file might be lost.  It also means that since you
> really want *all* of your files to survive, you should really choose a
> per-file reliability target that ensures that the probability of all files
> surviving is acceptably high.  But if shares.happy is pretty close to the
> total number of nodes in the grid then all of that complexity goes away,
> because your files will basically all live or die together.
>
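> (To make the "all live or die together" point concrete: suppose per-file
> losses really were independent and you had, say, 100,000 files -- a purely
> hypothetical number, just for illustration:)
>
> per_file_loss = 1e-4
> num_files = 100000
> print((1 - per_file_loss) ** num_files)   # ~4.5e-5: almost surely lose something
> print(1 - 0.999 ** (1.0 / num_files))     # ~1e-8: per-file loss target needed for
>                                           # a 99.9% chance that *all* files survive
>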
> The downside of setting shares.happy to a number very close to the size of
> the grid is that a few nodes being down means you can't upload files
> successfully (i.e. poor write-availability).  But that's why we demand high
> availability of our storage nodes :-)
>
> One other note:  I was intentionally a little vague about the term
> "reliability".  Tahoe devs usually use two terms, "reliability" and
> "availability".  "Availability" represents the probability that your file is
> available at any given time T in the future, which is dependent on the
> availability of the shares needed to recover your file at time T.
>  "Reliability" represents the probability that your file ever becomes
> available at or after some time T in the future.  In other words,
> reliability is kind of a limit function of availability.  In practical
> terms, file availability means that the servers holding the necessary shares
> are up when you look, while reliability means the shares actually still
> exist on nodes that will be up sometime.
>
> As a matter of conservatism, I use the expected availability figure as the
> expected reliability figure, since barring catastrophe reliability is
> strictly higher than availability.  My goal is 99.99% reliability, so I look
> for settings that give me pr_file_loss < 1e-4.
>
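> (A quick way to check candidate settings against that 1e-4 target, using the
> same simple independent-share model as the helper above, with p = 0.95:)
>
> import math
>
> def pr_loss(p, n, k):
>     # P(fewer than k of n shares survive), each surviving with probability p
>     return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
>
> for n, k in [(12, 7), (17, 10), (17, 11), (18, 11), (18, 12), (18, 13)]:
>     loss = pr_loss(0.95, n, k)
>     print("k=%d N=%d loss=%.2e %s" % (k, n, loss,
>           "ok" if loss < 1e-4 else "too risky"))
> # Only N=18, k=13 misses the target, matching the session output above.
>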
> For more details on availability/reliability and how the computations are
> done see my lossmodel paper, which is in the Tahoe source tree (in
> docs/proposed/lossmodel.lyx), or at http://goo.gl/UtDeH
>
> --
> Shawn
>



-- 
Shawn