[volunteergrid2-l] Recommended settings
Shawn Willden
shawn at willden.org
Tue Jun 28 18:16:58 PDT 2011
On Mon, Jun 27, 2011 at 10:27 PM, Marco Tedaldi <marco.tedaldi at gmail.com> wrote:
> > For best file reliability, you want to get your shares dispersed as widely
> > as possible. For the VG2 grid right now, that's 10 shares. So I think you
> > should set shares-total to 10.
>
> which also increases space usage and bandwidth usage while uploading.
>
Indeed it does. If that's a concern, you can reduce the expansion by
increasing K -- but if H (shares-happy) isn't sufficiently larger than K, you
may not have enough redundancy to be sure you can get your files back later.
But if you set H == N so that you have good redundancy, you may not have good
write availability.
In all cases, I think it makes sense to try to distribute shares to as many
servers as possible, so set N to the number of nodes in the grid. Then you
can choose H and K to pick your tradeoffs between write availability, read
availability and upload time.
BTW, the complexity of this analysis is one of the reasons I would like to
see Tahoe move away from having the user specify H, K and N. Instead, I'd
like Tahoe to let the user choose a target read-availability probability and
then dynamically compute N and K (H would disappear) from the available nodes
in the grid and their estimated (or assumed) reliabilities. I envision a
"Tahoe won't lose my files" probability slider with an adjacent "expansion
factor" field that updates as you move the slider back and forth. That way
users could make their decisions in terms of the part of the tradeoff they
actually care about.
But I haven't cared enough to actually implement anything like that :-) I
did care enough to do a lot of the mathematical modeling to lay the
groundwork, but stopped there.
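To give a flavor of that modeling, here is a much-simplified sketch: it
assumes every server is independently available with the same probability p,
which real grids don't satisfy, but it shows how a target reliability could be
turned into a K/N choice.

    from math import comb

    def prob_file_recoverable(k, n, p):
        """Probability that at least k of the n shares are still reachable,
        assuming each share sits on a server that is independently available
        with probability p (a simplifying assumption)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # Example: K=3, N=10, servers assumed 90% available:
    print(prob_file_recoverable(3, 10, 0.9))   # ~0.9999996, at an expansion of 10/3 ~= 3.3x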
> > If you set shares-happy to be large (perhaps 10), then you sometimes might
> > not be able to write a file... but you'll maximize your chances of being
> > able to read it later.
> >
> Or we set shares-needed low enough, which increases reliability but also
> space and bandwidth use.
>
Right. You can lower H to increase write availability at the expense of
reducing read availability, then you can reduce K to recover read
availability at the expense of increasing expansion.
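(Concretely, the expansion factor is N/K: with N = 10, K = 5 means every file
costs 2x its size in storage and upload, while dropping K to 3 pushes that to
about 3.3x.)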
BTW, there is one way to reduce bandwidth use: Use a helper. I think we
have one or two. This would be especially good for you since you have a
relatively slow upstream connection.
When your node is configured to use a helper, it does the encryption of the
file locally and then uploads it to the helper, which does the erasure
coding and delivers the shares to the storage nodes for you. That way the
bandwidth cost of the expansion is paid by the helper, which presumably has a
fast network connection. This doesn't change storage consumption, obviously,
but it does partially work around your slow upstream link.
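Configuring it is just one line in tahoe.cfg; the FURL itself has to come from
whoever runs the helper (the value below is only a placeholder):

    [client]
    # get the real FURL from the helper's operator
    helper.furl = pb://<helper-furl-goes-here>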
> > Yes, very important! Of course, we've chosen to set the expiration time at
> > one year (that might be revised downward in the future, but we decided to
> > start conservatively), so you shouldn't have to worry too much about it.
>
> I've been wondering about that. It certainly reduces the maintenance
> overhead for the client, but it might greatly increase the wasted space if
> there is short-lived data around.
>
Yes. If the storage nodes start getting full, one of the first things we'll
do is ask everyone to lower their expiration timeout some.
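For reference, those knobs are on the storage-server side of tahoe.cfg and
look roughly like the following; check docs/garbage-collection.rst for the
exact duration syntax before copying it:

    [storage]
    expire.enabled = true
    expire.mode = age
    # shorten this if the grid starts filling up
    expire.override_lease_duration = 31 days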
> > > what interfaces are you using anyway?
> >
> > The HTTP API.
> >
> Ok.
> Is this recommended?
The HTTP API is the recommended interface for software that needs to talk to
a Tahoe node. Writing a tool that constantly spawned shells to run the tahoe
command-line client would be painful, and the HTTP API also lets you use a
Tahoe node on a different machine, basically for free.
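For example, a whole upload is a single HTTP request. A minimal sketch against
a node's webapi, assuming the default port 3456 on localhost and omitting
error handling:

    import urllib.parse, urllib.request

    WEBAPI = "http://127.0.0.1:3456"   # default webapi port of a local node

    # Upload: PUT the file body to /uri; the response body is the new file's cap.
    with open("photo.jpg", "rb") as f:
        req = urllib.request.Request(WEBAPI + "/uri", data=f.read(), method="PUT")
    filecap = urllib.request.urlopen(req).read().decode("ascii").strip()

    # Later, download it again using nothing but the cap.
    data = urllib.request.urlopen(WEBAPI + "/uri/" + urllib.parse.quote(filecap, safe="")).read()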
> I personally think that the command line tool looks
> quite nice (when I could just wrap my head around why I need aliases.
Aliases just keep you from having to remember/type long rootcap strings.
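A quick sketch of the workflow (the alias name here is arbitrary):

    tahoe create-alias backups      # makes a new directory and stores its cap under the name "backups"
    tahoe cp somefile.txt backups:  # refer to it without ever typing the cap
    tahoe list-aliases              # shows which cap each alias points at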
> Is this a bit like different filesystems inside tahoe?)
>
I'm not sure what you mean by this question.
> When I use tahoe backup, what would I need to restore the data in case of
> a failure with full data loss? Or better: what data do I need to back up
> outside of tahoe to be able to restore my data from there?
> Is it advisable to use some other online storage (like Dropbox or
> Wuala) for this data?
>
You need your rootcap -- the actual URI, not your alias. That's it. It's a
very good idea to make a copy and put it somewhere safe and secure. If you
put it in Dropbox or something, I'd encrypt it first because it's the key to
all of your data. Another common recommendation is to print a copy on a
piece of paper and store it somewhere safe.
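In practice the caps behind your aliases live in your node directory, so
saving them looks something like this (assuming the default ~/.tahoe node
directory):

    tahoe list-aliases > rootcaps.txt   # or copy ~/.tahoe/private/aliases directly
    gpg -c rootcaps.txt                 # encrypt before parking it on Dropbox or similar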
--
Shawn