[volunteergrid2-l] Recommended settings

Marco Tedaldi marco.tedaldi at gmail.com
Mon Jun 27 21:27:49 PDT 2011


On 27.06.2011 21:33, Shawn Willden wrote:
> On Mon, Jun 27, 2011 at 11:37 AM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
>>
>> On 27.06.2011 08:52, Shawn Willden wrote:
> 
>>>> Is it still advisable to set the "shares total" and the "shares happy"
>>>> to the same value?
>>>>
>>>
>>> It depends on what you want to accomplish.
>>>
>>> If you set shares-happy lower than shares-total, then when there are less
>>> than shares-total nodes available (but >= shares-happy), then your upload
>>> will succeed.  Success is good... but personally for backup usage I would
>>> prefer that the upload fail if I'm not getting the full degree of
>>> redundancy
>>> that I want.
>>>
>> Maybe set the "shares total" to something higher than the desired degree
>> of redundancy to get some additional security if available? Ok... might
>> be a waste of disk space.
>>
> 
> A discussion on tahoe-dev has got me rethinking my opinion here.
> 
> For best file reliability, you want to get your shares dispersed as widely
> as possible.  For the VG2 grid right now, that's 10 shares.  So I think you
> should set shares-total to 10.
> 
which also increases space usage and bandwidth usage while uploading.

> On the other hand, that may mean that if one server is down, you can't
> upload... ah, but there's shares-happy.  So I think you should set
> shares-happy to something less than 10.  How much less depends on how you
> want to trade off write-availability against read-availability.  If you set
> shares-happy to be relatively small then you'll always be able to write, but
> a couple more servers going off-line might make your files unreadable.  If
> you set shares-happy to be large (perhaps 10), then you sometimes might not
> be able to write a file... but you'll maximize your chances of being able to
> read it later.
> 
Or we set shares-needed low enough, which increases reliability but also
space and bandwidth use.
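
Just to make that trade-off concrete (taking the default shares-needed of 3
as an assumption, since we haven't settled on a value here):

    expansion factor   = N/K = 10/3 ~ 3.3x the original data stored
    tolerated failures = N-K = 10-3 = 7 servers

so a smaller K buys reliability at the price of space and upload bandwidth.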

> On balance, I think with the current state of the grid I'm going to use
> shares-total=10 and shares-happy=9.  You might want to reduce shares-happy
> to 8.
> 
Sounds sensible to me.
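
If I read the configuration docs right, that would go into tahoe.cfg roughly
like this (shares.needed = 3 is just the current default, not something from
this thread, so treat it as an assumption):

    [client]
    shares.needed = 3
    shares.happy = 9
    shares.total = 10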

>> And "K" is the value for "shares needed", right?
>>
> 
> Yes, sorry.  We often use "N" for shares-total and "K" for shares-needed.
> 
Ok... I already learned something new today, then.

> 
>> Ok... I'll add a cron job for that. Or I will end up with my backup gone!
>>
> 
> Yes, very important!  Of course, we've chosen to set the expiration time at
> one year (that might be revised downward in the future, but we decided to
> start conservatively), so you shouldn't have to worry too much about it.
> 
I've been wondering about that. It certainly reduces the maintenance
overhead for the client, but it might greatly increase the wasted space if
there is short-lived data around.
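
For the cron job I mentioned above, I'm thinking of something along these
lines (guessing at the exact invocation; "tahoe:" is just whatever alias my
backups live under, and the path to the tahoe binary may differ):

    # renew leases on everything reachable from the backup root, once a month
    0 3 1 * *  tahoe deep-check --add-lease tahoe:

Monthly should leave a comfortable margin before a one-year lease runs out.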

> 
>> And I don't think that hard links are supported, right? I loves me my
>> hand-rolled backup-hardlink script :-)
>>
> 
> Sort of.  During backup, the idempotency of immutable files means that tahoe
> backup won't actually upload the file the second (or later) time it comes
> across it.  It will spend time reading and hashing every time it finds a
> hardlink, though.
> 
> During restore, tahoe will just write multiple copies of hardlinked files.
> 
Ok... I don't have a lot of hardlinks in the files I'll back up. But my
backup until now has consisted mainly of hardlinks (done with cp and rsync).
This way I have several generations of backups available on my external
disk. Fast backups and still no space wasted.
It has (theoretical) tradeoffs to do backups this way... :-)
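
For the archives, the trick looks roughly like this (paths are just
placeholders):

    # hardlink-copy the previous snapshot, then let rsync replace only
    # the files that actually changed
    cp -al /backup/yesterday /backup/today
    rsync -a --delete /home/marco/ /backup/today/

Unchanged files stay shared as hardlinks, so each generation costs almost
nothing extra.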

I won't back up my backup anyway.

> My backup tool explicitly notices and handles hardlinks, avoiding both of
> those issues.  Or it would if it worked :)
> 
Nice! I'm looking forward to seeing it one day!
I'm just thinking about ways to use rsync and tar to encapsulate such
stuff. But I have no idea for a solution at the moment :-(

Backing up without too much wasted space is quite tricky...


>>> If I ever get back to my backup tool and finish it, I actually do keep
>>> versions.  But my tool (which uses Tahoe for the storage) isn't in a state
>>> where it's really even usable by me, much less anyone else.  When I get my
>>> family settled into our new home, I should have more time...
>>
>> Isn't there a backup utility that supports tahoe as a backend? Was it
>> nepomuk or deja dup? Did someone already test it, or is someone using it?
>>
> 
> Hmm, I'm not sure.  Ask on tahoe-dev.
> 
Seems like a good idea to join there.

> 
>> What interfaces are you using, anyway?
> 
> 
> The HTTP API.
> 
Ok.
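
From the webapi docs, using it looks something like this (assuming the
default web port 3456; the file name and the cap are just placeholders):

    # upload a file via PUT; tahoe answers with the file's capability string
    curl -T notes.txt http://127.0.0.1:3456/uri
    # later, fetch it back using that returned cap
    curl http://127.0.0.1:3456/uri/URI:CHK:...
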
Is this recommended? I personally think that the command line tool looks
quite nice (if I could just wrap my head around why I need aliases. Is
this a bit like different filesystems inside tahoe?)
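
From what I've picked up so far, an alias seems to be just a local nickname
for a directory cap, kept in ~/.tahoe/private/aliases, so yes, a bit like
separate roots. Something like (the names here are just examples):

    tahoe create-alias backups            # create a directory, nickname it
    tahoe backup ~/Documents backups:docs # back up into a path below it
    tahoe ls backups:                     # list what is under that root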

When I use tahoe backup, what would I need to restore the data in case of
a failure with full data loss? Or better: what data do I need to back up
outside of tahoe to be able to restore my data from there?
Is it advisable to use some other online storage (like dropbox or
wuala) for this data?
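
My current guess is that the irreplaceable pieces boil down to the directory
caps and the grid connection info, roughly:

    cat ~/.tahoe/private/aliases             # alias names -> directory caps
    grep introducer.furl ~/.tahoe/tahoe.cfg  # address needed to rejoin the grid

but please correct me if I'm missing something.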

best regards

Marco

