[volunteergrid2-l] Recommended settings

Shawn Willden shawn at willden.org
Wed Jun 29 16:12:45 PDT 2011


On Wed, Jun 29, 2011 at 1:36 PM, Billy Earney <billy.earney at gmail.com> wrote:

> But if we set something like N=90% in the config file, which would mean to
> set N=T*90% (where T is the total number of nodes available at time of file
> upload)
>

To ask my next question easily, I need some more notation:

    T_now = total number of servers available at time of upload
    T_max = total number of servers in the grid, including those currently
            unavailable

What would be the point in setting N < T_now?  The reason to set N < T_max
is that T_now may be less than T_max, and uploads would fail if N > T_now.
But I don't see why you would want to get less dispersion than is currently
available.



> , then H could equal some minimal number of nodes necessary for an upload,
> calculated from R (R for reliability %).
>
> If we assume that node availability is 95%, then what would H have to be to
> have a reliability of R?  There’s probably a formula for this in the tahoe
> api.
>

I don't believe a closed-form formula for calculating that is possible.
At least, I don't know how to do it, and I've done more work on this math
than anyone else, AFAIK.  However, there is an efficient way to calculate R
given shares-distributed and shares-required (K).  So it's not difficult to
do a quick search of the space of the parameter you want to optimize.
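
To illustrate the kind of calculation meant here, a minimal sketch in
Python.  It assumes one share per server, independent server failures, and
a known availability for each server.  The function name and its arguments
are hypothetical, and this is not the Tahoe-LAFS code:

```python
def file_reliability(server_availabilities, k):
    """Probability that at least k of the share-holding servers are up.

    Assumes one share per server and independent failures (a modelling
    assumption, not something the grid guarantees).
    """
    # dist[j] = probability that exactly j of the servers seen so far are up
    dist = [1.0]
    for p in server_availabilities:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this server is down
            nxt[j + 1] += q * p       # this server is up
        dist = nxt
    # The file is recoverable when k or more shares are reachable.
    return sum(dist[k:])


if __name__ == "__main__":
    # 10 shares distributed, 3 required, every server 95% available
    print(file_reliability([0.95] * 10, 3))
```

The loop does O(N^2) work for N shares, so evaluating it for every
candidate value of a parameter is cheap.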

Actually, though, in the case where you're optimizing dynamically, I think
the need for H disappears entirely. If you know how many servers are
available and accepting shares _now_, then that's the number of shares you
want to distribute.  The parameter you would calculate, then, is K.  You
would do a quick search of the range of possible K values, looking for the
largest K that gives you a reliability that is better than your required
reliability threshold R.
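
The search described above might look roughly like this.  It is a sketch
under simplifying assumptions: every server has the same availability p,
failures are independent, and one share goes to each available server.
The names largest_k and reliability are made up for illustration:

```python
from math import comb


def reliability(n, k, p):
    """P(at least k of n servers are up), each up independently with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def largest_k(n, p, r_required):
    """Largest K whose reliability meets r_required, or None if no K does.

    n is the number of shares distributed, i.e. the number of servers
    accepting shares right now (T_now in the notation above).
    """
    # Reliability can only drop as k grows, so scan downward from n and
    # stop at the first k that meets the threshold.
    for k in range(n, 0, -1):
        if reliability(n, k, p) >= r_required:
            return k
    return None


if __name__ == "__main__":
    for r in (0.99, 0.99999):
        print(r, largest_k(10, 0.95, r))
```

A larger K means less expansion (N/K), so picking the largest K that still
meets R gives the cheapest encoding for the servers available now.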

>
> My reliability threshold may be 99% for some files (willing to lose them
> sometimes), but for other files it could be 99.999%.  Which brings up
> another topic: allowing these to be entered from the command line when
> uploading files, since different files could have different Rs.  Just my
> $0.02. :-)
>

Yep, it would ultimately be very nice to be able to set different Rs for
different files.  And as long as we're dreaming, this could be used to
compute a required R for dirnodes, based on the required Rs for the files in
it (and, recursively, for dirnodes under it).  This would ensure that
top-level dirnodes automagically become highly, highly reliable, which would
have prevented the worst of the allmydata problem.

But, talk is cheap :-)

-- 
Shawn