[volunteergrid2-l] Recommended settings
Shawn Willden
shawn at willden.org
Sun Jun 26 23:52:53 PDT 2011
Good questions!
On Sun, Jun 26, 2011 at 10:38 PM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
>
> Client Settings:
> In the wiki it is recommended to set the "shares total" and the "shares
> happy" to the same value (15).
> As I understand it, the network has 11 servers right now, so a value of
> 15 would not work.
>
Right. That number was chosen on the assumption of 20 servers. Right now
we have 9, yours will make 10, and Brad Rupp's second server will make 11.
> Is it still advisable to set the "shares total" and the "shares happy"
> to the same value?
>
It depends on what you want to accomplish.
If you set shares-happy lower than shares-total, then your upload will still
succeed when fewer than shares-total nodes are available (as long as at least
shares-happy are). Success is good... but personally, for backup usage I would
prefer that the upload fail if I'm not getting the full degree of redundancy
that I want.
If you really are "happy" with one level of redundancy but willing to accept
more, then set shares-happy to the minimum you'll accept and shares-total to
the level you'd like. For me, they're the same thing.
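To make the shares-happy versus shares-total trade-off concrete, here is a minimal Python sketch under a deliberately simplified model: each reachable server receives at most one share, and an upload counts as successful once at least shares-happy distinct servers hold one. The function name `upload_succeeds` is hypothetical, and Tahoe-LAFS's real "servers of happiness" check is more involved than this.

```python
# Simplified model (assumption, not Tahoe's actual placement logic):
# each reachable server gets at most one share, and the upload succeeds
# once at least `shares_happy` distinct servers hold a share.

def upload_succeeds(reachable_servers: int, shares_happy: int, shares_total: int) -> bool:
    """Return True if an upload would succeed under the simplified model."""
    if not 1 <= shares_happy <= shares_total:
        raise ValueError("need 1 <= shares_happy <= shares_total")
    placed = min(reachable_servers, shares_total)  # distinct servers that get a share
    return placed >= shares_happy


if __name__ == "__main__":
    # happy < total: with 8 of 10 servers reachable the upload still succeeds
    print(upload_succeeds(8, shares_happy=7, shares_total=10))   # True
    # happy == total: the same situation fails rather than quietly losing redundancy
    print(upload_succeeds(8, shares_happy=10, shares_total=10))  # False
```

Setting the two values equal, as suggested above, simply turns "reduced redundancy" into a visible upload failure.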
> Which value is recommended (it does not seem advisable to set this to
> the total number of nodes, since with only one node down I could not
> commit any data anymore)?
>
In practice, given the very high uptime we have, you might actually find it's
fine to set it to the total number of nodes. But it's probably a good idea
to leave a little slack. With your server we'll have 10, so I'd go with 8 or 9.
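On the client side these settings live in the [client] section of tahoe.cfg as shares.needed, shares.happy and shares.total. The sketch below assumes a 10-server grid and an illustrative shares.needed of 4, and uses Python's standard configparser to render such a section after checking that 1 <= needed <= happy <= total and that happy does not exceed the number of servers. The helper `client_section` is hypothetical, not part of Tahoe.

```python
import configparser
import io


def client_section(needed: int, happy: int, total: int, grid_size: int) -> str:
    """Render a [client] share-settings section after basic sanity checks."""
    if not 1 <= needed <= happy <= total:
        raise ValueError("expected 1 <= shares.needed <= shares.happy <= shares.total")
    if happy > grid_size:
        # Uploads could never reach "happy" if there aren't that many servers.
        raise ValueError("shares.happy exceeds the number of storage servers")
    cfg = configparser.ConfigParser()
    cfg["client"] = {
        "shares.needed": str(needed),
        "shares.happy": str(happy),
        "shares.total": str(total),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# 10 servers, leaving two servers of slack, shares-happy equal to shares-total
print(client_section(needed=4, happy=8, total=8, grid_size=10))
```

The printed output is meant to be merged by hand into the node's existing [client] section, not to replace the whole file.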
> What level of redundancy is recommended (shares needed)?
>
That is indeed a fascinating question :-)
My approach to answering it for myself is statistical in nature. I choose
an acceptable probability of loss, take a (conservative) guess at the
reliability of the nodes in the system and then find a K value that meets
the goal. To do the calculation, I use a little tool I wrote that's part of
the tahoe package. It doesn't have a user-friendly interface, but it works.
To use it, open a shell, go to the tahoe/src directory, and run "python" to
get an interactive Python interpreter. To load the tool, run:
import allmydata.util.statistics as s
You then have access to a set of functions in the module "s", the most
useful of which is "pr_file_loss", meaning "probability of file loss". It
takes two parameters. The first is a list of server reliabilities. I
usually assume 95%, though we achieve better than that. The second is a
value of K. So:
s.pr_file_loss([.95, .95, .95, .95, .95, .95, .95, .95], 3)
will calculate the probability that a file with 8 shares on servers with 95%
reliability (over some time period), any three of which are needed to
recover it, will be lost (within the time period). The result, by the way,
is 4E-7, which is very good. My goal is 99.99% reliability (a loss
probability of no more than 1E-4), so with 8 nodes I'd probably set K=4,
which gives a 1.5E-5 probability of loss. With 9 nodes I'd probably use K=5,
which gives a pr_file_loss of 3E-5.
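For readers without a Tahoe checkout handy, here is an independent Python re-implementation of the same idea (not Tahoe's code). It builds the probability distribution of how many share-holding servers survive, assuming servers fail independently, and sums the cases where fewer than K survive. The names `survivor_distribution` and `file_loss_probability` are assumptions of this sketch.

```python
from typing import List, Sequence


def survivor_distribution(reliabilities: Sequence[float]) -> List[float]:
    """dist[i] = probability that exactly i of the servers survive (independent failures)."""
    dist = [1.0]
    for p in reliabilities:
        nxt = [0.0] * (len(dist) + 1)
        for i, prob in enumerate(dist):
            nxt[i] += prob * (1 - p)  # this server is lost
            nxt[i + 1] += prob * p    # this server survives
        dist = nxt
    return dist


def file_loss_probability(reliabilities: Sequence[float], k: int) -> float:
    """Probability that fewer than k shares survive, so the file cannot be recovered."""
    return sum(survivor_distribution(reliabilities)[:k])


for n, k in [(8, 3), (8, 4), (9, 5)]:
    print(n, k, f"{file_loss_probability([0.95] * n, k):.1e}")
```

These three cases should land close to the 4E-7, 1.5E-5 and 3E-5 figures above (the last is closer to 3.3E-5 before rounding). Because each server has its own reliability, the list form also handles a mix of more and less dependable nodes.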
Oh, and though I spelled out the whole list of server reliabilities for
clarity, a shorthand is:
s.pr_file_loss([.95]*8, 3)
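The search for a K that meets a loss goal can also be automated. This sketch assumes every server has the same reliability (as in the [.95]*8 shorthand), which allows a closed-form binomial sum via math.comb, and then picks the largest K whose loss probability stays within the target. `loss_probability` and `largest_k` are hypothetical names, not part of allmydata.util.statistics.

```python
import math


def loss_probability(n: int, p: float, k: int) -> float:
    """P(fewer than k of n servers survive), each surviving independently with prob p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def largest_k(n: int, p: float, max_loss: float) -> int:
    """Largest K (1..n) whose loss probability is within max_loss, or 0 if none is."""
    best = 0
    for k in range(1, n + 1):
        if loss_probability(n, p, k) <= max_loss:
            best = k
    return best


# Goal: 99.99% reliability, i.e. a loss probability of at most 1e-4
for n in (8, 9, 10):
    print(n, largest_k(n, 0.95, 1e-4))
```

With p=0.95 and a 1E-4 target this picks K=4 for 8 servers and K=5 for 9, matching the choices above. A larger K means less storage overhead (each share is 1/K of the file), which is why the largest acceptable K is the one worth choosing.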
Server Settings:
> What settings should be chosen for "pruning"?
>
I assume you mean expiration? Expiration should be turned on, as specified
in the wiki.
> Would my storage fill up indefinitely with pruning disabled?
>
Yes. Without expiration, shares stick around forever.
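Expiration works through leases: a share is kept only while some client holds an unexpired lease on it. The toy model below is a hypothetical illustration of that rule, not Tahoe's storage code; the 31-day lease length is an assumption, and `StorageServer`, `add_lease` and `crawl` are invented names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DAY = 24 * 60 * 60
LEASE_DURATION = 31 * DAY  # assumed lease lifetime for this toy model


@dataclass
class StorageServer:
    """Toy model of a storage server's lease bookkeeping (hypothetical)."""
    expiration_enabled: bool
    lease_expiry: Dict[str, float] = field(default_factory=dict)  # share id -> expiry time

    def add_lease(self, share_id: str, now: float) -> None:
        # Adding or renewing a lease pushes the share's expiry time forward.
        self.lease_expiry[share_id] = now + LEASE_DURATION

    def crawl(self, now: float) -> List[str]:
        """Delete shares whose lease has expired; return the ids that were removed."""
        if not self.expiration_enabled:
            return []  # without expiration, shares are kept forever
        expired = [s for s, t in self.lease_expiry.items() if t <= now]
        for s in expired:
            del self.lease_expiry[s]
        return expired


server = StorageServer(expiration_enabled=True)
server.add_lease("share-a", now=0)
server.add_lease("share-b", now=0)
server.add_lease("share-a", now=20 * DAY)  # renewed, e.g. by an add-lease pass
print(server.crawl(now=40 * DAY))  # ['share-b']; share-a survives until day 51
```

With expiration_enabled=False the crawl never removes anything, which is the "stick around forever" case.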
> Will my client automatically start to reupload the data if the
> redundancy is not good enough (below "shares happy") or do I have to
> manually start that process?
>
You need to occasionally run "tahoe deep-check --repair --add-lease" on your
files. This does two important things. First, it renews your "leases" so that
storage nodes don't delete your shares. Second, it repairs any files that
have fewer than shares-total shares deployed (I think... I'm not sure exactly
how that ended up after the changes a while back... does someone else want
to comment?).
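The repair half of that operation can be modeled as a simple classification of files by how many shares a check finds. This hypothetical `plan_deep_check` helper assumes, as tentatively suggested above, that anything below shares-total gets repaired; it also flags files with fewer than shares-needed shares, which no repair can bring back. Tahoe's actual checker may use different criteria.

```python
from typing import Dict, List, Tuple


def plan_deep_check(share_counts: Dict[str, int], shares_needed: int,
                    shares_total: int) -> Tuple[List[str], List[str]]:
    """Split files into (to_repair, unrecoverable) based on shares found per file."""
    to_repair: List[str] = []
    unrecoverable: List[str] = []
    for path, found in sorted(share_counts.items()):
        if found < shares_needed:
            unrecoverable.append(path)  # too few shares left to reconstruct the file
        elif found < shares_total:
            to_repair.append(path)      # recoverable, but below full redundancy
    return to_repair, unrecoverable


found = {"photos/a.jpg": 8, "photos/b.jpg": 6, "photos/c.jpg": 2}
print(plan_deep_check(found, shares_needed=4, shares_total=8))
# (['photos/b.jpg'], ['photos/c.jpg'])
```

The gap between the two thresholds is why running the check regularly matters: a file drifting from 8 shares toward 4 is still repairable, but one that falls below K is gone.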
Usage:
> Is it ok if I start to backup my ~200G photo collection to VG2?
>
Absolutely!
> If I use the backup command, will changed data just be overwritten, or
> can several versions of a file be restored?
>
The older versions will still be in the grid, but I believe backup only
keeps one copy of your directory tree, which means that you won't be able to
find the older versions. Eventually they'll get garbage-collected.
If I ever get back to my own backup tool and finish it, it does keep
versions. But my tool (which uses Tahoe for the storage) isn't in a state
where it's really even usable by me, much less anyone else. When I get my
family settled into our new home, I should have more time...
--
Shawn