[volunteergrid2-l] Recommended settings

Mon Jun 27 10:37:17 PDT 2011

hi

On 27.06.2011 08:52, Shawn Willden wrote:
> Good questions!
> 
I take this as a compliment :-)

> 
> On Sun, Jun 26, 2011 at 10:38 PM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
>>
>> Client Settings:
>> In the wiki it is recommended to set the "shares total" and the "shares
>> happy" to the same value (15).
>> As I understand, the network is on 11 servers right now, so a value of
>> 15 would not work.
>>
> 
> Right.  That number was chosen on the assumption of 20 servers.  Right now
> we have 9, yours will make 10, and Brad Rupp's second server will make 11.
> 
Nice... 10 servers up now.

> 
>> Is it still advisable to set the "shares total" and the "shares happy"
>> to the same value?
>>
> 
> It depends on what you want to accomplish.
> 
> If you set shares-happy lower than shares-total, then your upload will
> succeed even when fewer than shares-total nodes (but >= shares-happy) are
> available.  Success is good... but personally for backup usage I would
> prefer that the upload fail if I'm not getting the full degree of redundancy
> that I want.
> 
Maybe set the "shares total" to something higher than the desired degree
of redundancy to get some additional security if available? Ok... might
be a waste of disk space.

> If you really are "happy" with one level of redundancy but willing to accept
> more, then set shares-total and shares-happy to the appropriate values.  For
> me, they're the same thing.
> 
As you said :-)

> 
>> Which value is recommended (it does not seem advisable to set this to
>> the total number of nodes since with only one node down, I could not
>> commit any data anymore)?
>>
> 
> In practice, given the very high uptime we have you might actually find it's
> fine to set it to the total number of nodes.  But it's probably a good idea
> to leave a little slack.  With your server we'll have 10, so I'd go 8 or 9.
> 
I'll go with 8 for the moment and set it to 9 as soon as there are 11
servers (maybe).
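
For reference, a minimal sketch of how these numbers could end up in the
[client] section of tahoe.cfg. The keys shares.needed, shares.happy and
shares.total are the Tahoe-LAFS encoding parameters; the helper name
client_section, the ordering check (needed <= happy <= total) and the
example value of 4 for "shares needed" are assumptions for illustration,
not settings agreed on in this thread.

```python
import configparser
import io


def client_section(needed, happy, total):
    """Render a [client] fragment for tahoe.cfg (hypothetical helper)."""
    # Assumed sanity rule: needed <= happy <= total.
    if not 1 <= needed <= happy <= total:
        raise ValueError("expected 1 <= needed <= happy <= total")
    cfg = configparser.ConfigParser()
    cfg["client"] = {
        "shares.needed": str(needed),
        "shares.happy": str(happy),
        "shares.total": str(total),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Example: happy == total == 8 on a 10-server grid, leaving some slack.
print(client_section(4, 8, 8))
```

With happy equal to total, an upload fails rather than silently settling
for less redundancy, which is the behaviour preferred above for backups.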
> 
>> What level of redundancy is recommended (shares needed)?
>>
> 
> That is indeed a fascinating question :-)
> 
As long as it is not my doctor using the word "fascinating"... ;-)

> My approach to answering it for myself is statistical in nature.  I choose
> an acceptable probability of loss, take a (conservative) guess at the
> reliability of the nodes in the system and then find a K value that meets
> the goal.  To do the calculation, I use a little tool I wrote that's part of
> the tahoe package.  It doesn't have a user-friendly interface, but it works.
>  To use it, open a shell and go to the tahoe/src directory, then "python" to
> get an interactive python interpreter.  To load the tool, run
> 
> import allmydata.util.statistics as s
> 
> 
> You then have access to a set of functions in the module "s", the most
> useful of which is "pr_file_loss", meaning "probability of file loss".  It
> takes two parameters.  The first is a list of server reliabilities.  I
> usually assume 95%, though we achieve better than that.  The second is a
> value of K.  So:
> 
> s.pr_file_loss([.95, .95, .95, .95, .95, .95, .95, .95], 3)
> 
> will calculate the probability that a file with 8 shares on servers with 95%
> reliability (over some time period), any three of which are needed to
> recover it, will be lost (within the time period).  The result, by the way,
> is 4E-7, which is very good.  My goal is 99.99% reliability, so with 8 nodes
> I'd probably set K=4, which gives 1.5E-5 probability of loss.  With 9 nodes
> I'd probably use K=5 which gives pr_file_loss of 3E-5.
> 
And "K" is the value for "shares needed", right?

> Oh, and though I spelled out the whole list of server reliabilities for
> clarity, a shorthand is:
> 
> s.pr_file_loss([.95]*8, 3)
> 
That's pretty nice.
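
For anyone without the tahoe source at hand, the same figures can be
checked with a few lines of plain Python. This is only a sketch, assuming
that all servers have the same reliability and fail independently, so that
the number of surviving shares is binomially distributed. It is not the
code from allmydata.util.statistics, and the function name
pr_file_loss_uniform is made up for this example.

```python
from math import comb  # Python 3.8+


def pr_file_loss_uniform(n, p, k):
    """Probability that fewer than k of n shares survive.

    n: number of shares, one per server
    p: probability that a single server (and its share) survives
    k: shares needed to recover the file
    """
    # The file is lost when only 0, 1, ..., k-1 shares survive.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# The three cases discussed above, all with 95% server reliability.
for n, k in [(8, 3), (8, 4), (9, 5)]:
    print(f"N={n} K={k}: {pr_file_loss_uniform(n, 0.95, k):.1e}")
```

The results agree with the 4E-7, 1.5E-5 and 3E-5 quoted above, to the
precision given there.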

>> Server Settings:
>> What settings should be chosen for "pruning"?
>>
> 
> I assume you mean expiration?  Expiration should be turned on, as specified
> in the wiki.
> 
Oh right... had no network access during writing that email (mobile
internet is just too expensive).

So yes, I mean expiration. And urm... rtfm to me!

> 
>> Would my storage be filled up infinitely with pruning disabled?
>>
> 
> Yes.  Without expiration, shares stick around forever.
> 
Ok... so as long as everyone knows that he has to run the repair stuff,
everything is fine.

> 
>> Will my client automatically start to reupload the data if the
>> redundancy is not good enough (below "shares happy") or do I have to
>> manually start that process?
>>
> 
> You need to occasionally run a "deep-check --repair --add-lease" operation.
>  This does two important things:  First, it renews your "leases" so that
> storage nodes don't delete your shares.  Second, it will repair any files
> that have fewer than shares-total shares deployed (I think... I'm not sure
> exactly how that ended up with the changes a while back... someone else want
> to comment?).
> 

Ok... I'll add a cron job for that. Or I will end up with my backup gone!
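
A minimal sketch of a script such a cron job could call. It simply wraps
the "tahoe deep-check --repair --add-lease" command mentioned above; the
alias name "backup:" and the function names are placeholders, so
substitute whatever alias points at the backup root.

```python
import subprocess


def deep_check_command(alias="backup:"):
    """Build the tahoe command line (the alias is a placeholder)."""
    return ["tahoe", "deep-check", "--repair", "--add-lease", alias]


def run_deep_check(alias="backup:"):
    """Run the check/repair/lease renewal and return tahoe's exit code."""
    # check=False: leave it to the caller (or cron) to react to failures.
    return subprocess.run(deep_check_command(alias), check=False).returncode
```

Calling run_deep_check() from a weekly cron entry should be enough to
renew the leases, as long as the interval is shorter than the lease
lifetime configured on the storage servers.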

>> Usage:
>> Is it OK if I start to back up my ~200G photo collection to VG2?
>>
> 
> Absolutely!
> 
Sweet!

> 
>> If I use the backup command, will changed data just be overwritten or
>> can several versions of a file be restored?
>>
> 
> The older versions will still be in the grid, but I believe backup only
> keeps one copy of your directory tree, which means that you won't be able to
> find the older versions.  Eventually they'll get garbage-collected.
> 
And I don't think that hard links are supported, right? I loves me my
handy backup-hardlink script :-)

> If I ever get back to my backup tool and finish it, I actually do keep
> versions.  But my tool (which uses Tahoe for the storage) isn't in a state
> where it's really even usable by me, much less anyone else.  When I get my
> family settled into our new home, I should have more time...
> 
Isn't there a backup utility that supports Tahoe as a backend? Was it
Nepomuk or Deja Dup? Has anyone tested it already, or is anyone using it?

What interfaces are you using, anyway?

Best regards,

Marco