[tahoe-dev] newbie questions ++
warner-tahoe at allmydata.com
Wed Sep 24 11:35:58 PDT 2008
> Some more questions (we are paranoiac, and afraid of losing everything)
> - Is there a known "single point of failure" (initiator server... ?).
> If yes, is it possible to workaround it ?
Not for data reliability.
For data *availability*, however, it depends upon how you're accessing the
grid. The most obvious single-point-of-anything is the Introducer, which in
the current release is a singleton: one Introducer per grid (see ticket #295
for our plans to change that). For any given client, a connection to the
Introducer is necessary to get started: clients will only use servers that
they've learned from the Introducer. Once this initial batch of announcements
is processed, however, the client will continue to use those servers even if
the Introducer connection is lost. If your clients tend to stay up for long
periods of time, and you aren't adding servers on a regular basis, then you
probably wouldn't even notice the Introducer going away. The only impact is
that clients which come up while the Introducer is offline will not be able
to upload or download any files until it comes back up and they get their
initial batch of announcements.
(note that clients use a random-exponential-backoff scheme to limit their
connection rate, so if the introducer is offline for e.g. 10 minutes, the
client might not attempt to connect for roughly another 20 minutes. This
backoff is capped to make at least one attempt per hour).
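The reconnection schedule described above can be sketched roughly like this (the starting delay, jitter range, and names here are illustrative, not Tahoe's actual values; only the shape of the backoff and the one-hour cap come from the text):

```python
import random

# Illustrative random-exponential-backoff schedule: each failed connection
# attempt roughly doubles the delay (with random jitter), and the delay is
# capped so the client still makes at least one attempt per hour.
BASE_DELAY = 60        # seconds; hypothetical starting delay
MAX_DELAY = 3600       # cap: at least one attempt per hour

def next_delay(current_delay):
    # Double the delay, apply jitter, and clamp to the one-hour cap.
    jittered = current_delay * 2 * random.uniform(0.5, 1.5)
    return min(jittered, MAX_DELAY)

# Build a sample schedule of retry delays.
delay = BASE_DELAY
schedule = []
for _ in range(20):
    schedule.append(delay)
    delay = next_delay(delay)
    if delay >= MAX_DELAY:
        break
```

This is why an introducer outage of e.g. 10 minutes can go unnoticed for another 20: the client's next attempt may already be scheduled that far out.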
Data availability also depends upon your client being available. This is
obvious when you're running your own client, but there are use cases in which
you may not be. Many allmydata.com customers access their data through our
webapi servers (a bunch of regular client nodes on which we've enabled and
published the 'webapi' port): for these users, if our webapi servers go down,
they won't be able to get to their files. (This happened for about two hours
on Monday; see http://www.allmydata.com/blog/ and tickets #521 and #287 for
details.) For those customers, this was a loss of availability (although they
could have gotten at their data if they were running their own Tahoe client
nodes).
There are a couple of bugs that can cause a client node to get stuck when the
servers it is talking to get really confused (on Monday it was a hardware
problem that our software didn't handle very well; again see #521 and #287).
These can cause availability failures.
Oh, and some quick definitions:
reliability: the chance you can get your file back if you're willing to wait
an infinite amount of time
availability: the chance you can get your file back in a given finite amount
of time
Clearly, this is really a function of the amount of time you're willing to
wait. The mathematical analysis we've attempted on Tahoe's reliability is
centered around "p = F(t,wait)", where "p" is the probability that you can
recover the file, "t" is the amount of time since you uploaded it, and "wait"
is the amount of time you're willing to put into the download attempt. (and
actually the function is parameterized by k, N, the server
reliability/availability behavior, and the checker/repairer configuration. If
you know a math grad student who is looking for a thesis topic, I'll buy them
lunch in exchange for some analysis work :).
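A toy version of that analysis, for intuition only (this is not the full F(t,wait) above, it just collapses t, wait, and repair behavior into a single per-server availability number): with k-of-N erasure coding, the file is recoverable whenever at least k of the N shares sit on reachable servers, so under an independence assumption the recovery probability is a binomial tail.

```python
from math import comb

def recovery_probability(k, n, server_availability):
    """Toy model: probability that at least k of n shares are on servers
    that are up, assuming each server is independently available with the
    given probability. The real analysis would derive that number from t,
    the wait time, and the checker/repairer configuration."""
    s = server_availability
    return sum(comb(n, i) * s**i * (1 - s)**(n - i)
               for i in range(k, n + 1))

# With 3-of-10 encoding (Tahoe's default), even servers that are each only
# 90% available yield a very high recovery probability.
p = recovery_probability(3, 10, 0.9)
```

The independence assumption is the weak spot, of course: correlated failures (same rack, same power strip) are exactly what makes the real math hard.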
> - What kind of tests should we do to push the system to the limits, and
> check how robust it is ?
> We think of "normal" tests:
> * concurrent read/write of different files
> * use server with different available free space.
> * filling filesystems to 100%
> * Maybe we can try with a big number of machines (100~1000 mainly windows)
Those are good ones. We don't currently automatically handle disk-full
situations: you have to switch the node to "readonly" mode before it gets
full, to avoid problems in clients running earlier releases (they didn't
handle the error very well), and we haven't fully addressed the problem in
the current release either. Be aware that "readonly" mode still allows writes
of mutable files (but these are usually only used for directories, which tend
to be pretty small, so the data rate is small too).
Our procedure at allmydata.com is to flip the readonly switch when there is
10 or 20GB left (when the disks are about 98% full). Eventually we'll fix the
code and upgrade the clients who can't handle the disk-full errors. We also
plan to add code to switch the node to read-only when the remaining disk
space drops below a configurable amount, to make this procedure automatic
(not in the upcoming release, but probably in the next one).
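The planned automatic switch amounts to a periodic check like the following sketch (the function name, the reserve value, and the idea of polling `shutil.disk_usage` are all my illustration, not Tahoe's actual code or API):

```python
import shutil

# Hypothetical reserve threshold: flip the node to readonly when free
# space on the storage partition drops below this many bytes.
RESERVED_BYTES = 10 * 1024**3  # e.g. keep 10 GB free

def should_go_readonly(storage_dir, reserved=RESERVED_BYTES):
    # shutil.disk_usage reports (total, used, free) for the filesystem
    # containing storage_dir.
    usage = shutil.disk_usage(storage_dir)
    return usage.free < reserved
```

A storage node would run this check periodically and refuse new immutable-share leases once it returns True, which automates the manual 98%-full procedure described above.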
We haven't tried using a grid larger than about 50 machines before, so we'd
be very interested in your results with 100-1000 nodes. We've designed Tahoe
to work well with low to medium numbers of servers: 100 should be fine, 1000
might work, I wouldn't be surprised if problems crop up with 10k nodes, and I
would be surprised if it continued to work at all at 1M nodes. We aren't
using Chord or any of the clever O(ln(N)) schemes, so the scaling problems
that I anticipate are:
* we maintain connections to all servers, so memory usage in the TCP stack,
and bandwidth for keepalive messages to everybody
* the Introducer tells all subscribers about all servers (so N*M scaling)
* immutable download and checker/verifier ask all servers about each file
(we plan to fix this for download, #287 will include this fix)
* all file operations require hash-permuting the entire peer list (memory+CPU)
We have some thoughts about fixing these (ticket #235), but the most
immediate use cases (allmydata.com and friendnet) involve 10-100 servers
rather than thousands, so they're relatively low-priority.
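To put back-of-the-envelope numbers on those scaling concerns (this is a sketch of the counting argument, not a measurement):

```python
def scaling_estimate(servers, clients):
    # Full mesh: every node holds an open connection to every server.
    connections_per_node = servers
    # The Introducer sends every subscriber (servers and clients alike)
    # an announcement for every server, so its work grows as N*M.
    introducer_announcements = servers * (servers + clients)
    return connections_per_node, introducer_announcements

# Modest at allmydata.com scale, painful a couple of orders of
# magnitude later.
small = scaling_estimate(100, 1_000)
large = scaling_estimate(10_000, 100_000)
```

The per-node connection count and the Introducer's N*M announcement load are the terms that blow up first, which is why the clever O(ln(N)) overlay schemes exist; we just haven't needed one yet.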
> Btw , i found this link about GPU for AES encryption (0.1 ~ 8 Gbits/s)
> http://www.manavski.com/downloads/PID505889.pdf so if it is performance
> critical there is room for improvements :-)
Heh.. it would be wonderful if we could get the rest of the system to run
fast enough that 1Gbps of AES could make a noticeable speedup :-).