[volunteergrid2-l] Lots of remarks and questions ;-)
Christoph Langguth
christoph at rosenkeller.org
Thu Dec 8 18:12:54 UTC 2011
Hi everyone,
First off, good news! The new disk was finally set up yesterday, so
I'll be joining very soon with an additional 1.5 TB :-) Still
test-driving with my local introducer, in order not to muck around too
much with the production infrastructure...
... And sorry -- this mail will probably get rather long and detailed,
but I really appreciate any feedback :-)
So, during the last few weeks I was fiddling around with the VG2 website
and with tahoe-lafs, and trying to figure out how to use it in the best
possible way. Since I've got quite a few things to say, I'll try to
structure them somehow by topic (but not necessarily by relevance). Here
we go :-)
A -- VG2 website
================
a1) Website security: After Jody added me as a user, I can now
successfully log in to bigpig.org and modify my settings. I have the
SSLPasswdWarning extension (
https://addons.mozilla.org/en-US/firefox/addon/sslpasswdwarning/ )
installed in Firefox, which alerted me that my credentials are being
sent over plain HTTP. I realize VG2 is not a high-security classified
thing, so it's not a huge issue. Still, I don't like unencrypted
connections for sensitive data, so how about setting up
https://bigpig.org/ for secure communications?
I have no idea about where it is hosted, what the plan is, or whether
HTTPS is possible at all in the current setup. Anyway, one thing I
keep hearing is that setting up encryption is complicated, or too
expensive. Neither is really true. For the former, I'll gladly
help out with some advice. For the latter: certified SSL certificates
are available completely free from https://www.startssl.com/ . I've been
using them for 3 years now and can only recommend them as the best SSL
CA I've ever seen. Reviews here:
http://www.sslshopper.com/startcom-certificate-authority-reviews.html .
OK, I'll stop spamming now (No, I'm not paid, just a really happy
customer -- seriously, consider them if you're into anything SSL).
So, to sum up: I'd like to see a secure login to bigpig.org. I'll
be glad to help out in case that's wanted.
a2) Installation instructions lead to 404: On the main page (
http://bigpig.org/twiki/bin/view/Main/WebHome ), the link to tahoe-lafs
installation instructions goes to
http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/quickstart.html .
This page doesn't exist. I'd guess that there have been some internal
changes on tahoe-lafs.org, which won't produce the HTML version anymore.
Climbing up and down the directory structure there, I found that
https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/quickstart.rst
is probably the correct URL. Since I'm not able to directly edit the
page (access denied), could someone please fix the link?
a3) Classified Settings inaccessible: even after logging in, I can
access neither http://bigpig.org/twiki/bin/view/Main/ClassifiedSettings
nor http://bigpig.org/twiki/bin/view/Main/WebGateways . All I keep
getting is a message like "Attention: Access check on
Main.ClassifiedSettings failed. Action "VIEW": access not allowed on
topic. "
For the most relevant settings like the introducer FURL, Shawn has
already sent me the relevant information by private mail, so I guess
there's no urgent need for me to access the pages. Still, I think that
there's a misconfiguration somewhere, so someone should probably look
into it.
(This may be related to a2 as well, maybe I'm simply
lacking some privileges... can I get them please? *g*)
B -- tahoe-lafs itself (and VG2 configuration)
==============================================
I have been trying out various settings locally, which led me to some
further questions. I have already successfully installed and configured
a node using the VG2 settings, but for the time being, that
configuration is still (knowingly) left as a node which does *not*
provide storage capacity yet -- it will very soon, once I'm sure of all
the settings. Till then, some more questions ;-)
b1) What is the stats_gatherer used for? I have no objection to
configuring my node to "report" there, but is there a way to also see that
information? I would think that such information would include things
like "total storage space, individual nodes usage and availability"
etc., or am I totally wrong here?
b2) What is the actual meaning of "shares.needed", "shares.happy", and
"shares.total" ? I have been playing around with these using my (only)
local node, and here are my observations so far:
- each node ( = connected tahoe-lafs instance) provides multiple shares
for storing files (how many? where can that be configured?)
- shares.total is the total number of shares (over all connected nodes)
that an upload will fill (regardless of the node where the file is
uploaded -- in other words: some of them may be on the same host)
- shares.needed is the minimum number of shares a file needs to be
"distributed" to. However, these shares may all be on the same node.
- shares.happy is the number of *different* nodes that must hold a copy
of (part of) the file.
Can someone confirm that this is correct? At least, this is what I
understand. When using these settings:
shares.needed = 3
shares.happy = 2
shares.total = 5
I get this error message: "We were asked to place shares on at least 2
server(s) such that any 3 of them have enough shares to recover the
file. (placed all 5 shares, want to place shares on at least 2 servers
such that any 3 of them have enough shares to recover the file, sent 2
queries to 1 peers, 2 queries placed some shares, 0 placed none (...)"
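To make my reading of b2 concrete, here is a minimal Python sketch of the check as I understand it. It is only my simplified interpretation, not how tahoe-lafs actually decides placement: the function name placement_ok and the server names are made up for illustration, and the only things taken from my setup are the three settings and the fact that they live in the [client] section of tahoe.cfg.

```python
import configparser

# The three settings quoted above, in the [client] section of tahoe.cfg.
SETTINGS = """
[client]
shares.needed = 3
shares.happy = 2
shares.total = 5
"""


def placement_ok(settings_text, shares_per_server):
    """Check a share placement against my simplified reading of the settings.

    shares_per_server maps a (hypothetical) server name to the number of
    shares that server accepted. This is an assumption-laden approximation,
    not the real tahoe-lafs algorithm.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(settings_text)
    client = cfg["client"]
    needed = client.getint("shares.needed")
    happy = client.getint("shares.happy")

    placed = sum(shares_per_server.values())
    # Count only servers that actually hold at least one share.
    distinct_servers = sum(1 for n in shares_per_server.values() if n > 0)

    # Enough shares to recover the file, spread over enough different servers.
    return placed >= needed and distinct_servers >= happy
```

With my single local node, placement_ok(SETTINGS, {"local": 5}) comes out False because only one server holds shares while shares.happy asks for two, which would match the error above. With two nodes, e.g. {"a": 3, "b": 2}, the same check passes.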
b3) Does it make sense to use a "helper" service, and if so, is there
any use in making its FURL publicly available? I guess it won't harm to
leave it on locally, but could other folks in the grid also benefit
somehow if they knew "my" helper? From what I understand, the purpose of
the helper is to reduce upload delays by providing quick uploads and
caching, and only then distributing data into the grid (as stated
earlier, our host is connected pretty decently -- GBit Ethernet just 2
hops from the backbone).
C -- Using Tahoe LAFS for Backups
=================================
I'm currently thinking about using duplicity for backups, as I have had good
experiences with it and think that because of its smart design, it
should be relatively "lightweight" in terms of additional traffic caused
by incremental backups (i.e.: rsync would essentially need to download a
file to check it for changes, which is expensive; duplicity keeps small
metadata files for this purpose, and caches them locally as well).
c1) Any objections or bad experiences with duplicity on tahoe? Or any
further experiences or hints that may be helpful?
c2) Assuming a total disaster where everything goes up in smoke locally,
the only backup remaining would be the one in VG2. But it would be in a
directory (Tahoe-URI) which is totally incomprehensible and impossible
to remember. So as a safety net, would it be ok to post here and/or on
the bigpig homepage a file -- encrypted, of course :-) -- with the tiny
bits of vital information so that I could get them back from the homepage, or
from one of you guys, in case of disaster?
c3) And the final question: My understanding is that orphaned files are
meant to be avoided by expiring files after 365 days. My worst-case
scenario now goes like this: disaster strikes, the backup is required --
and some files have been purged on all nodes because they have been
sitting around too long without being touched. I recall that there was
some method to "touch" these files, but I can't remember or find it in
the docs now. Can someone give any advice on how this is done in
general, or possibly in particular with duplicity?
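To put the worst case in numbers, here is a small Python sketch of the arithmetic I have in mind. It assumes the 365-day expiry as I understand it; the function names and the 30-day safety margin are my own invention, and it says nothing about how the "touching" itself is done.

```python
from datetime import date, timedelta

# Expiry period as I understand the policy (assumption, see above).
EXPIRY = timedelta(days=365)


def days_until_purge(last_touched, today):
    """Days left before a file last touched on `last_touched` may be purged.

    A negative result means the file is already past the expiry period.
    """
    return (last_touched + EXPIRY - today).days


def is_at_risk(last_touched, today, margin_days=30):
    """True if the file expires within `margin_days` (hypothetical margin)."""
    return days_until_purge(last_touched, today) <= margin_days
```

For example, a file last touched on 2011-01-01 would, as of today (2011-12-08), have 24 days left, so whatever does the touching would need to run well within that window.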
OK, this has grown far too long, so thanks again for reading. Any and
all comments / suggestions are really appreciated!
Cheers from Basel
Chris