[volunteergrid2-l] Recommended settings

Sun Jun 26 23:52:53 PDT 2011

Good questions!

On Sun, Jun 26, 2011 at 10:38 PM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
>
> Client Settings:
> In the wiki it is recommended to set the "shares total" and the "shares
> happy" to the same value (15).
> As I understand it, the network has 11 servers right now, so a value of
> 15 would not work.
>

Right.  That number was chosen on the assumption of 20 servers.  Right now
we have 9, yours will make 10, and Brad Rupp's second server will make 11.

> Is it still advisable to set the "shares total" and the "shares happy"
> to the same value?
>

It depends on what you want to accomplish.

If you set shares-happy lower than shares-total, your upload will still
succeed when fewer than shares-total nodes are available, as long as at
least shares-happy of them are.  Success is good... but personally, for
backup usage I would prefer that the upload fail if I'm not getting the full
degree of redundancy that I want.

If you really are "happy" with one level of redundancy but willing to accept
more, then set shares-total and shares-happy to the appropriate values.  For
me, they're the same thing.
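The difference can be sketched in a few lines of Python.  This is a
deliberately simplified model, not Tahoe's actual "servers of happiness"
placement logic: it assumes only distinct servers count, so the number of
servers holding a share is min(available servers, shares-total).  The
function name upload_succeeds is hypothetical.

```python
def upload_succeeds(available_servers: int, shares_total: int,
                    shares_happy: int) -> bool:
    """Simplified model of the shares-happy check (not Tahoe's real code).

    Assumes at most one share counts per server, so the number of distinct
    servers holding shares is min(available_servers, shares_total).
    """
    if not 1 <= shares_happy <= shares_total:
        raise ValueError("expected 1 <= shares_happy <= shares_total")
    servers_with_shares = min(available_servers, shares_total)
    return servers_with_shares >= shares_happy


# 9 of 10 servers up, asking for 10 shares:
print(upload_succeeds(9, shares_total=10, shares_happy=7))   # True: slack absorbs the outage
print(upload_succeeds(9, shares_total=10, shares_happy=10))  # False: the upload fails
```

With shares-happy equal to shares-total, a single missing server makes the
upload fail, which is the fail-rather-than-under-replicate behavior
described above.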

> Which value is recommended (it does not seem advisable to set this to
> the total number of nodes, since with only one node down I could not
> commit any data anymore)?
>

In practice, given the very high uptime we have, you might actually find it's
fine to set it to the total number of nodes.  But it's probably a good idea
to leave a little slack.  With your server we'll have 10, so I'd go with 8
or 9.
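For reference, these knobs live in the [client] section of tahoe.cfg as
shares.needed, shares.happy and shares.total.  The sketch below only builds
such a section with Python's configparser for a grid of a given size,
leaving some slack and keeping shares.happy equal to shares.total.  The
helper name client_settings and its slack parameter are illustrative, and
you would still merge the result into your own tahoe.cfg by hand.

```python
import configparser
import io


def client_settings(num_servers: int, needed: int,
                    slack: int = 1) -> configparser.ConfigParser:
    """Build a [client] section with shares.happy == shares.total.

    slack is how many servers may be down while uploads still succeed.
    """
    total = num_servers - slack
    if not 1 <= needed <= total:
        raise ValueError("need 1 <= shares.needed <= shares.total")
    cfg = configparser.ConfigParser()
    cfg["client"] = {
        "shares.needed": str(needed),
        # Same as total: fail rather than upload with reduced redundancy.
        "shares.happy": str(total),
        "shares.total": str(total),
    }
    return cfg


# 10 servers, leave 2 of slack, K=4:
cfg = client_settings(num_servers=10, needed=4, slack=2)
buf = io.StringIO()
cfg.write(buf)
print(buf.getvalue())  # [client] with needed = 4, happy = 8, total = 8
```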

> What level of redundancy is recommended (shares needed)?
>

That is indeed a fascinating question :-)

My approach to answering it for myself is statistical in nature.  I choose
an acceptable probability of loss, take a (conservative) guess at the
reliability of the nodes in the system and then find a K value that meets
the goal.  To do the calculation, I use a little tool I wrote that's part of
the tahoe package.  It doesn't have a user-friendly interface, but it works.
To use it, open a shell, go to the tahoe/src directory, and run "python" to
get an interactive Python interpreter.  To load the tool, run

import allmydata.util.statistics as s

You then have access to a set of functions in the module "s", the most
useful of which is "pr_file_loss", meaning "probability of file loss".  It
takes two parameters.  The first is a list of server reliabilities.  I
usually assume 95%, though we achieve better than that.  The second is a
value of K.  So:

s.pr_file_loss([.95, .95, .95, .95, .95, .95, .95, .95], 3)

will calculate the probability that a file with 8 shares on servers with 95%
reliability (over some time period), any three of which are needed to
recover it, will be lost (within that time period).  The result, by the way,
is 4E-7, which is very good.  My goal is 99.99% reliability, i.e. a loss
probability of at most 1E-4, so with 8 nodes I'd probably set K=4, which
gives a 1.5E-5 probability of loss.  With 9 nodes I'd probably use K=5,
which gives a pr_file_loss of 3E-5.
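If you don't have a Tahoe checkout handy, the same kind of number can be
computed with a short standalone function.  This is a sketch, not the code
in allmydata.util.statistics: it assumes servers fail independently and
that a file is lost when fewer than K of its shares survive, and builds the
distribution of surviving shares one server at a time.  The name
file_loss_probability is hypothetical.

```python
def file_loss_probability(reliabilities: list, k: int) -> float:
    """Probability that fewer than k of the shares survive.

    reliabilities[i] is the chance that the server holding share i is still
    around at the end of the period; servers are assumed independent.
    """
    # dist[j] = probability that exactly j shares have survived so far
    dist = [1.0]
    for p in reliabilities:
        new = [0.0] * (len(dist) + 1)
        for j, pr in enumerate(dist):
            new[j] += pr * (1 - p)  # this share is lost
            new[j + 1] += pr * p    # this share survives
        dist = new
    # The file is unrecoverable when fewer than k shares survive.
    return sum(dist[:k])


for n, k in [(8, 3), (8, 4), (9, 5)]:
    # Roughly 4e-07, 1.5e-05 and 3.3e-05, in line with the figures above.
    print(n, k, f"{file_loss_probability([0.95] * n, k):.1e}")
```

Because it takes a list, you can also mix reliabilities, for example one
flaky server among seven good ones.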

Oh, and though I spelled out the whole list of server reliabilities for
clarity, a shorthand is:

s.pr_file_loss([.95]*8, 3)
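The "pick a loss goal, then find K" procedure can be automated for the
uniform-reliability case that the [.95]*8 shorthand covers.  This sketch
uses the plain binomial formula (math.comb) rather than Tahoe's tool, and
assumes independent failures; loss_probability and largest_k are
hypothetical names.

```python
from math import comb


def loss_probability(n: int, p: float, k: int) -> float:
    """P(fewer than k of n shares survive), each surviving with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


def largest_k(n: int, p: float, max_loss: float) -> int:
    """Largest (most storage-efficient) K whose loss probability <= max_loss.

    Loss probability only grows with K, so scan upward and stop at the
    first K that misses the goal.  Returns 0 if even K=1 is not good enough.
    """
    best = 0
    for k in range(1, n + 1):
        if loss_probability(n, p, k) > max_loss:
            break
        best = k
    return best


# Goal: 99.99% reliability, i.e. a loss probability of at most 1e-4.
print(largest_k(8, 0.95, 1e-4))  # 4
print(largest_k(9, 0.95, 1e-4))  # 5
```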

> Server Settings:
> What settings should be chosen for "pruning"?
>

I assume you mean expiration?  Expiration should be turned on, as specified
in the wiki.

> Would my storage fill up indefinitely with pruning disabled?
>

Yes.  Without expiration, shares stick around forever.
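With expiration turned on, a share survives only as long as a lease on it is
still valid, so the renewal schedule matters.  The following is a simplified
model, assuming a single lease that each renewal resets to a fixed length.
The 31-day default is an assumption (I believe it matches Tahoe's default
lease length, but your grid's expiration settings may differ), and
first_expiry_day is a hypothetical helper.

```python
from typing import List, Optional


def first_expiry_day(renewal_days: List[int], lease_days: int = 31,
                     horizon_day: int = 365) -> Optional[int]:
    """Return the first day a share's lease has lapsed, or None if it never
    lapses before horizon_day.

    Simplified model: renewal_days lists the days (day 0 = upload) on which
    the lease was added or renewed; each renewal makes the lease valid for
    lease_days more days.  lease_days=31 is an assumed default.
    """
    days = sorted(renewal_days)
    # Compare each renewal with the next one (or with the end of the horizon).
    for prev, nxt in zip(days, days[1:] + [horizon_day]):
        if nxt - prev > lease_days:
            return prev + lease_days  # lease ran out before the next renewal
    return None


monthly = list(range(0, 366, 30))          # renew every 30 days
skipped = [d for d in monthly if d != 90]  # forgot the day-90 run
print(first_expiry_day(monthly))  # None
print(first_expiry_day(skipped))  # 91
```

In this model a monthly renewal is just enough, and missing a single run
leaves a window in which the share may be expired.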

> Will my client automatically start to re-upload the data if the
> redundancy is not good enough (below "shares happy"), or do I have to
> start that process manually?
>

You need to periodically run a "deep-check --repair --add-lease" operation.
This does two important things.  First, it renews your "leases" so that
storage nodes don't delete your shares.  Second, it will repair any files
that have fewer than shares-total shares deployed (I think... I'm not sure
exactly how that ended up after the changes a while back... does someone
else want to comment?).
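If you want to run this from cron, a small wrapper is enough.  This sketch
only builds the command line and runs it with subprocess.  It assumes the
tahoe executable is on your PATH, and "backups:" is a hypothetical alias
standing in for whatever alias your data lives under.

```python
import shlex
import subprocess
from typing import List


def deep_check_command(target: str, tahoe: str = "tahoe") -> List[str]:
    """argv for a lease-renewing, repairing deep-check of one directory tree."""
    return [tahoe, "deep-check", "--repair", "--add-lease", target]


def run_deep_check(target: str) -> int:
    """Run the deep-check and return tahoe's exit status."""
    return subprocess.run(deep_check_command(target)).returncode


if __name__ == "__main__":
    # Print the command; call run_deep_check("backups:") to actually run it.
    print(shlex.join(deep_check_command("backups:")))
    # prints: tahoe deep-check --repair --add-lease backups:
```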

> Usage:
> Is it ok if I start to backup my ~200G photo collection to VG2?
>

Absolutely!

> If I use the backup command, will changed data just be overwritten, or
> can several versions of a file be restored?
>

The older versions will still be in the grid, but I believe backup only
keeps one copy of your directory tree, which means that you won't be able to
find the older versions.  Eventually they'll get garbage-collected.

If I ever get back to my backup tool and finish it, that one does keep
versions.  But my tool (which uses Tahoe for the storage) isn't in a state
where it's really even usable by me, much less anyone else.  When I get my
family settled into our new home, I should have more time...

-- 
Shawn