[volunteergrid2-l] Lots of remarks and questions ;-)

Christoph Langguth christoph at rosenkeller.org
Thu Dec 8 18:12:54 UTC 2011


Hi everyone,

first off, good news! The new disk was finally set up yesterday, so
I'll be joining very soon with an additional 1.5 TB :-) Still 
test-driving with my local introducer, in order not to muck around too 
much with the production infrastructure...

... And sorry -- this mail will probably get rather long and detailed, 
but I really appreciate any feedback :-)

So, during the last few weeks I was fiddling around with the VG2 website 
and with tahoe-lafs, and trying to figure out how to use it in the best 
possible way. Since I've got quite a few things to say, I'll try to 
structure them somehow by topic (but not necessarily by relevance). Here 
we go :-)

A -- VG2 website
================

a1) Website security: After Jody added me as a user, I can now 
successfully log in to bigpig.org and modify my settings. I have the 
SSLPasswdWarning extension ( 
https://addons.mozilla.org/en-US/firefox/addon/sslpasswdwarning/ ) 
installed in Firefox, which alerted me that my credentials are being 
sent over plain HTTP. I realize VG2 is not a high-security classified 
thing, so it's not a huge issue. Still, I don't like unencrypted 
connections for sensitive data, so how about setting up 
https://bigpig.org/ for secure communications?

I have no idea about where it is hosted, what the plan is, or whether 
HTTPS is possible at all in the current setting. Anyway, one thing I 
keep hearing is that setting up encryption is complicated, or too 
expensive. Neither claim is really true. As for the former, I'll gladly
help out with some advice. As for the latter: CA-signed SSL certificates
are available completely free from https://www.startssl.com/ . I've been
using them for 3 years now and can only recommend them as the best SSL
CA I've ever seen. Reviews here:
http://www.sslshopper.com/startcom-certificate-authority-reviews.html . 
OK, I'll stop spamming now (No, I'm not paid, just a really happy 
customer -- seriously, consider them if you're into anything SSL).

So, to sum up: I'd like to see a secure login to bigpig.org. I'll
be glad to help out in case that's wanted.

a2) Installation instructions lead to 404: On the main page ( 
http://bigpig.org/twiki/bin/view/Main/WebHome ), the link to tahoe-lafs 
installation instructions goes to 
http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html . 
This page doesn't exist. I'd guess that there have been some internal
changes on tahoe-lafs.org, so the HTML version is no longer generated.
Climbing up and down the directory structure there, I found that 
https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/quickstart.rst 
is probably the correct URL. Since I'm not able to directly edit the 
page (access denied), could someone please fix the link?

a3) Classified Settings inaccessible: even after logging in, I can
access neither http://bigpig.org/twiki/bin/view/Main/ClassifiedSettings
nor http://bigpig.org/twiki/bin/view/Main/WebGateways . All I keep
getting is a message like "Attention: Access check on 
Main.ClassifiedSettings failed. Action "VIEW": access not allowed on 
topic. "

For the most relevant settings like the introducer FURL, Shawn has 
already sent me the relevant information by private mail, so I guess 
there's no urgent need for me to access the pages. Still, I think that 
there's a misconfiguration somewhere, so someone should probably look 
into it.

(This may be related to the second question as well, maybe I'm simply 
lacking some privileges... can I get them please? *g*)



B -- tahoe-lafs itself (and VG2 configuration)
==============================================

I have been trying out various settings locally, which led me to some 
further questions. I have already successfully installed and configured 
a node using the VG2 settings, but for the time being, that
configuration is still (deliberately) left as a node which does *not*
provide storage capacity yet -- it will very soon, once I'm sure of all 
the settings. Till then, some more questions ;-)

b1) What is the stats_gatherer used for? I have no objection to
configuring my node to "report" there, but is there a way to also see that
information? I would think that such information would include things 
like "total storage space, individual nodes usage and availability" 
etc., or am I totally wrong here?

b2) What is the actual meaning of "shares.needed", "shares.happy", and 
"shares.total" ? I have been playing around with these using my (only) 
local node, and here are my observations so far:
- each node ( = connected tahoe-lafs instance) provides multiple shares 
for storing files (how many? where can that be configured?)
- shares.total is the total number of shares (over all connected nodes) 
that an upload will fill (regardless of the node where the file is 
uploaded -- in other words: some of them may be on the same host)
- shares.needed is the minimum number of shares a file needs to be 
"distributed" to. However, these shares may all be on the same node.
- shares.happy is the number of *different* nodes that must hold a copy 
of (part of) the file.

Can someone confirm that this is correct? At least this is what I 
understand, when using these settings:

shares.needed = 3
shares.happy = 2
shares.total = 5

I get this error message: "We were asked to place shares on at least 2 
server(s) such that any 3 of them have enough shares to recover the 
file. (placed all 5 shares, want to place shares on at least 2 servers 
such that any 3 of them have enough shares to recover the file, sent 2 
queries to 1 peers, 2 queries placed some shares, 0 placed none (...)"
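
To make my reading of shares.happy concrete, here is a minimal Python
sketch. It is only an assumption on my part: it simply counts the
distinct nodes holding shares, and the real check inside tahoe-lafs is
probably more involved. The function names and node names are made up
for illustration.

```python
# Simplified model (an assumption, not Tahoe-LAFS's real algorithm):
# "happiness" is approximated as the number of distinct nodes holding shares.

def distinct_servers(placement):
    """Count distinct nodes; placement maps share number -> node name."""
    return len(set(placement.values()))

def upload_is_happy(placement, shares_happy):
    """True if the shares landed on at least shares_happy distinct nodes."""
    return distinct_servers(placement) >= shares_happy

# One local node holding all 5 shares, as in my test above.
single_node = {share: "node-a" for share in range(5)}

# Hypothetical second node taking two of the five shares.
two_nodes = {0: "node-a", 1: "node-a", 2: "node-a", 3: "node-b", 4: "node-b"}

print(upload_is_happy(single_node, shares_happy=2))  # False
print(upload_is_happy(two_nodes, shares_happy=2))    # True
```

Under this model, my single local node can never satisfy
shares.happy = 2, which would explain the error above.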

b3) Does it make sense to use a "helper" service, and if so, is there 
any use in making its FURL publicly available? I guess it won't harm to 
leave it on locally, but could other folks in the grid also benefit 
somehow if they knew "my" helper? From what I understand, the purpose of 
the helper is to reduce upload delays by providing quick uploads and 
caching, and only then distributing data into the grid (as stated 
earlier, our host is connected pretty decently -- GBit Ethernet just 2 
hops from the backbone).




C -- Using Tahoe LAFS for Backups
=================================

I'm currently thinking about using duplicity for backups, as I have had
good experiences with it and think that because of its smart design, it
should be relatively "lightweight" in terms of additional traffic caused 
by incremental backups (i.e.: rsync would essentially need to download a 
file to check it for changes, which is expensive; duplicity keeps small 
metadata files for this purpose, and caches them locally as well).

c1) Any objections or bad experiences with duplicity on tahoe? Or any 
further experiences or hints that may be helpful?

c2) Assuming a total disaster where everything goes up in smoke locally, 
the only backup remaining would be the one in VG2. But it would be in a 
directory (Tahoe-URI) which is totally incomprehensible and impossible 
to remember. So as a safety net, would it be ok to post here and/or on
the bigpig homepage a file -- encrypted, of course :-) -- with the tiny
bits of vital information so that I could get them back from the
homepage, or from one of you guys, in case of disaster?

c3) And the final question: My understanding is that orphaned files are 
meant to be avoided by expiring files after 365 days. My worst-case 
scenario now goes like this: disaster strikes, the backup is required -- 
and some files have been purged on all nodes because they have been 
sitting around too long without being touched. I recall that there was 
some method to "touch" these files, but I can't remember or find it in 
the docs now. Can someone give any advice on how this is done in 
general, or possibly in particular with duplicity?


OK, this has grown far too long, so thanks again for reading. Any and 
all comments / suggestions are really appreciated!

Cheers from Basel
Chris

