[volunteergrid2-l] Recommended settings

Wed Jun 29 06:22:30 PDT 2011

Shawn,

I don't think you are a developer for Tahoe, but I like your suggestion about
the K and H values...

I'd like to see Tahoe move away from people having to specify K and H as
well.  Since nodes can join and leave the grid quite randomly, hard-coding
values seems a little silly.  Having a percent reliability (or fault
tolerance) setting seems more reasonable to me.  Something like: what
probability do you want of being able to recover your files?  The user could
enter a value in the config file, and each time a file is uploaded Tahoe
would count the nodes on the network and check whether this threshold is
met.  If more nodes are on the grid, then the file would be distributed to
more nodes.  If fewer nodes are on the network, then fewer nodes would
receive a piece.
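
To make the idea concrete, here is a rough sketch of the kind of check I
mean.  It is not Tahoe code: the function names and the per-node
availability p are my own assumptions, and it assumes one share per node and
nodes going offline independently of each other.  Under those assumptions a
file is recoverable if at least "shares needed" of the "shares total" nodes
are reachable, which is a binomial tail sum.

```python
from math import comb


def recovery_probability(needed: int, total: int, p: float) -> float:
    """Probability that at least `needed` of `total` shares are reachable.

    Assumes one share per node and that each node is independently
    available with probability `p` (a hypothetical, user-estimated value).
    """
    if not 0 <= needed <= total:
        raise ValueError("need 0 <= needed <= total")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probabilities of exactly i nodes being available,
    # for every i that is large enough to rebuild the file.
    return sum(
        comb(total, i) * p**i * (1 - p) ** (total - i)
        for i in range(needed, total + 1)
    )


def meets_target(needed: int, total: int, p: float, target: float) -> bool:
    """True if the estimated recovery probability reaches the user's target."""
    return recovery_probability(needed, total, p) >= target
```

With something like this, an upload could call meets_target() with the
current node count as `total` and refuse (or warn) when the configured
target is not reached.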

In most cases N (shares-total) should be set to the max (or near max) number
of nodes each time a file is uploaded.  I believe this should be generated
automatically.  Is there a function in the Tahoe API that returns the number
of available nodes that can store a file?  If so, I'd like to see N not being
hard-coded in the
configs.  H should be more of a setting that states: if the number of nodes
falls below this amount, we should generate a warning or error, so that the
user knows that retrieving their file sometime in the future probably won't
be reliable.  (H sort of does that now.)  Maybe there should be an H value
that produces a warning, and an h value (or some other variable name) that
produces an error.
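
A minimal sketch of the two-threshold idea, again purely hypothetical: none
of these names exist in Tahoe, and `warn_below` / `error_below` stand in for
the "H" and "h" values described above.

```python
from enum import Enum


class UploadStatus(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def check_node_count(available_nodes: int,
                     warn_below: int,
                     error_below: int) -> UploadStatus:
    """Classify an upload by how many storage nodes are currently available.

    `warn_below` and `error_below` are hypothetical settings: fewer nodes
    than `warn_below` yields a warning, fewer than `error_below` an error.
    """
    if error_below > warn_below:
        raise ValueError("error_below must not exceed warn_below")
    if available_nodes < error_below:
        return UploadStatus.ERROR
    if available_nodes < warn_below:
        return UploadStatus.WARNING
    return UploadStatus.OK
```

For example, with warn_below=9 and error_below=7, an upload with 8 nodes
available would go through with a warning, and one with 6 would be refused.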

I know there has been a lot of discussion about H and K values, but I don't
think regular users are going to want to mess with them.  They are more
interested in the probability of retrieving their files in the future.  (Or
at least I am!)

On Mon, Jun 27, 2011 at 2:33 PM, Shawn Willden <shawn at willden.org> wrote:

> On Mon, Jun 27, 2011 at 11:37 AM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
>>
>> On 27.06.2011 08:52, Shawn Willden wrote:
>> > Good questions!
>> I take this as a compliment :-)
>>
> It was intended as one :-)
>
>
>> >> Is it still advisable to set the "shares total" and the "shares happy"
>> >> to the same value?
>> >>
>> >
>> > It depends on what you want to accomplish.
>> >
>> > If you set shares-happy lower than shares-total, then when there are
>> > less than shares-total nodes available (but >= shares-happy), then your
>> > upload will succeed.  Success is good... but personally for backup usage
>> > I would prefer that the upload fail if I'm not getting the full degree
>> > of redundancy that I want.
>> >
>> Maybe set the "shares total" to something higher than the desired degree
>> of redundancy to get some additional security if available? Ok... might
>> be a waste of disk space.
>>
>
> A discussion on tahoe-dev has got me rethinking my opinion here.
>
> For best file reliability, you want to get your shares dispersed as widely
> as possible.  For the VG2 grid right now, that's 10 shares.  So I think you
> should set shares-total to 10.
>
> On the other hand, that may mean that if one server is down, you can't
> upload... ah, but there's shares-happy.  So I think you should set
> shares-happy to something less than 10.  How much less depends on how you
> want to trade off write-availability against read-availability.  If you set
> shares-happy to be relatively small then you'll always be able to write, but
> a couple more servers going off-line might make your files unreadable.  If
> you set shares-happy to be large (perhaps 10), then you sometimes might not
> be able to write a file... but you'll maximize your chances of being able to
> read it later.
>
> On balance, I think with the current state of the grid I'm going to use
> shares-total=10 and shares-happy=9.  You might want to reduce shares-happy
> to 8.
>
>> And "K" is the value for "shares needed", right?
>>
>
> Yes, sorry.  We often use "N" for shares-total and "K" for shares-needed.
>
>
>> Ok... I'll add a cron job for that. Or I will end up with my backup gone!
>>
>
> Yes, very important!  Of course, we've chosen to set the expiration time at
> one year (that might be revised downward in the future, but we decided to
> start conservatively), so you shouldn't have to worry too much about it.
>
>
>> And I don't think that hard links are supported, right? I loves me my
>> handy backup-hardlink script :-)
>>
>
> Sort of.  During backup, the idempotency of immutable files means that
> tahoe backup won't actually upload the file the second (or later) time it
> comes across it.  It will spend time reading and hashing every time it finds
> a hardlink, though.
>
> During restore, tahoe will just write multiple copies of hardlinked files.
>
> My backup tool explicitly notices and handles hardlinks, avoiding both of
> those issues.  Or it would if it worked :)
>
>> > If I ever get back to my backup tool and finish it, I actually do keep
>> > versions.  But my tool (which uses Tahoe for the storage) isn't in a
>> > state where it's really even usable by me, much less anyone else.  When
>> > I get my family settled into our new home, I should have more time...
>>
>> Isn't there a backup utility that supports Tahoe as a backend? Was it
>> Nepomuk or Deja Dup? Has anyone already tested it, or is anyone using it?
>>
>
> Hmm, I'm not sure.  Ask on tahoe-dev.
>
>
>> What interfaces are you using anyway?
>
>
> The HTTP API.
>
> --
> Shawn
>
> _______________________________________________
> volunteergrid2-l mailing list
> volunteergrid2-l at tahoe-lafs.org
> http://tahoe-lafs.org/cgi-bin/mailman/listinfo/volunteergrid2-l
> http://bigpig.org/twiki/bin/view/Main/WebHome
>