[tahoe-dev] Potential use for personal backup
Zooko Wilcox-O'Hearn
zooko at zooko.com
Mon Jun 4 04:31:09 UTC 2012
On Sat, Jun 2, 2012 at 1:12 PM, Saint Germain <saintger at gmail.com> wrote:
> On Wed, 23 May 2012 00:39:23 +0200, Guido Witmond <guido at wtmnd.nl>
> wrote:
>
>> I can offer you a 25 GB storage node for testing for a short while, say a month. It's on my home-adsl with 10Mb/1Mb up- and download speeds. Don't expect miracles but it should give you an idea what to expect.
That's very cool that Guido Witmond offers free storage service to
random people from the mailing list so they can test. :-) I'm hereby
offering you free storage service too -- see below.
> One interesting point is that I started with a 1-out-of-2 scheme (the other storage node was local on my computer) and that Guido's storage node was unavailable. So when I copied a file on the "grid", it was pretty quick.
>
> So that leads to one interesting question. Next time I connect with Guido's node available, will it upload the files I have already uploaded to my local storage node?
As Terrell correctly answered, it will not automatically do this
repair/rebalance/fix-up behavior.
From one perspective, this is a feature rather than a bug. It means
the behavior of the system is simpler and easier to predict.
See the FAQ "Will this thing run only when I tell it to? Will it use
up a lot of my network bandwidth, CPU, or RAM?":
https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ#Q18_unobtrusive_software
Of course, from another perspective this is missing functionality.
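(If you do want to push shares onto a server that was offline at
upload time, you can trigger a check-and-repair pass by hand. A rough
sketch -- assuming you keep your backups under a "tahoe:" alias, which
is just an example name:

  tahoe deep-check --repair --add-lease tahoe:

That walks the directory tree, checks each file's shares, and
regenerates missing shares where it can.)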
May I make a suggestion? Start simple!
Set K=H=N=1 and use a single storage server.
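In tahoe.cfg terms that is roughly the following, in the [client]
section of your gateway's tahoe.cfg:

  [client]
  shares.needed = 1
  shares.happy = 1
  shares.total = 1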
Then do your experiments and measurements. For one thing, this will
avoid complexity and interesting failure modes, and for another thing
it will make the system more comparable to other open source backup
tools that you are evaluating like obnam ¹ and backshift ² and so on.
Those systems typically have a single server.
I think the erasure-coding and the way that Tahoe-LAFS automatically
uses "the current set of servers" (instead of requiring an
administrator to tweak some configuration whenever a server comes or
goes) are really cool features. But they should probably be regarded
as "advanced features" that are not necessary for a standard backup
operation, and which inevitably bring complications.
Now, David-Sarah Hopwood, Zancas Wilcox, and I have founded a startup
company to commercialize Tahoe-LAFS storage service. Our business
model is "ciphertext storage service": each customer pays us a metered
price ($1.00/GB/month) and we operate a Tahoe-LAFS storage server on
that customer's behalf. The storage server writes the ciphertext to
Amazon S3 instead of to a local disk. The name of our company is Least
Authority Enterprises (https://leastauthority.com).
I hope to make Least Authority Enterprises into "the simplest way to
get started using Tahoe-LAFS".
Least Authority Enterprises hereby offers you a free trial account.
We'll leave it running until the first month where we get a bill from
Amazon.com and we say "Holy crap! We're still paying for Saint
Germain's experiment!". Then we'll turn it off.
You *can* combine the Least Authority Enterprises (LAE) server with
Guido Witmond's server, in a 1-out-of-1 scenario (each file will be
randomly allocated to one server or the other) or in a 1-out-of-2
scheme (each file will be replicated to both servers). But like I say,
you should probably start simple and do your basic experiment before you
should probably start simple and do your basic experiment before you
try something like that.
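For completeness, the 1-out-of-2 replication setup would look roughly
like this in your gateway's tahoe.cfg -- with both servers connected,
each file gets one share on each of them:

  [client]
  shares.needed = 1
  shares.happy = 2
  shares.total = 2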
The LAE storage server will have a faster network connection than
Guido's home ADSL line, but the speed of the network probably doesn't
matter much.
The biggest performance win will be when you run "tahoe backup" and it
sees that the timestamps on a file haven't changed, so it skips over
the file.
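For example -- the directory and alias names here are made up, just to
illustrate:

  tahoe create-alias backups
  tahoe backup ~/Documents backups:
  # later runs skip files whose timestamps haven't changed:
  tahoe backup ~/Documents backups: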
The second-biggest performance win will be when you upload a file (by
"tahoe backup" or otherwise) and after transferring the file contents
to your Tahoe-LAFS gateway, and after the gateway encrypts and hashes
it, the gateway talks to the storage servers and finds out that a copy
of this identical file has already been uploaded, so it does not need
to transfer the contents to the storage server. (This is only a
significant win if the distance from your client to your gateway is
shorter than the distance from your gateway to your server. This is
another reason -- besides confidentiality and integrity -- why it is a
good idea to run the gateway locally on the same computer as the
client.)
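You can observe this deduplication with "tahoe put": uploading the
same immutable file twice from the same gateway (with the same
convergence secret) yields the same read-cap, and the second time the
gateway finds that the servers already hold the shares and skips
re-sending the ciphertext. A sketch, with a made-up filename:

  tahoe put big-image.iso
  tahoe put big-image.iso   # prints the same URI:CHK:... read-cap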
The biggest performance *lose* will be when you've made a small change
to a large file, such as a minor modification to a 2 GB virtual
machine image, and tahoe re-uploads the entire 2 GB file instead of
computing a delta. :-)
Regards,
Zooko
¹ http://liw.fi/obnam/
² http://stromberg.dnsalias.org/~dstromberg/backshift/