[tahoe-dev] Getting acquainted
Jeremy Fitzhardinge
jeremy at goop.org
Wed Mar 24 14:48:58 PDT 2010
On 03/24/2010 01:29 PM, Hamilton, Jessica wrote:
>
> I work at Massey University (New Zealand), and am planning to use our
> lab fleet to build a distributed storage grid. I imagine I could
> probably get about 100TB out of the system.
>
> We will have about 1,500 PCs, with somewhere between 400GB and 800GB
> of available disk space per machine to utilise for the storage grid.
>
That sounds like a fun setup to play with.
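
As a back-of-envelope check on the capacity figure, here is a minimal
sketch that assumes the 3-of-10 encoding mentioned later in this
message, so every stored byte costs 10/3 bytes of raw disk. The
function name and the decimal GB-to-TB conversion are illustrative
assumptions; the result ignores filesystem overhead, reserved space and
offline machines, so treat it as an upper bound rather than a forecast.

```python
def usable_capacity_tb(machines, gb_per_machine, k=3, n=10):
    """Upper bound on usable capacity (TB) under k-of-n erasure coding.

    Hypothetical helper: raw space is divided by the expansion
    factor n/k, and nothing else is accounted for.
    """
    raw_tb = machines * gb_per_machine / 1000.0  # decimal GB -> TB
    return raw_tb * k / n


# Figures from the question: 1,500 PCs with 400GB to 800GB each.
low = usable_capacity_tb(1500, 400)
high = usable_capacity_tb(1500, 800)
print(f"usable upper bound: {low:.0f} TB to {high:.0f} TB")
```

Under those assumptions the 100TB estimate in the question sits
comfortably below the bound.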
>
> I have read about Chord/DHash which uses DHTs; but it looks like
> Tahoe-LAFS isn’t a DHT, and am concerned about nodes leaving/joining
> often. Given that you only need 3 nodes out of 10 to reproduce
> content, I imagine hardware failures wouldn’t be much of an issue.
>
Tahoe isn't particularly good with nodes leaving and joining at a high
rate. It assumes that you have a relatively stable pool of storage nodes
which are all connected together (so the grid is fully connected). When
you upload a file, the uploading node chooses a set of nodes out of the
available connected set and stores to them. A new node joining the grid
needs to get in contact with an introducer machine, which is a single
point of failure, but once introduced the nodes can function without it
(hm, I guess they get told by the introducer when a new node joins).
There are some tickets filed (#295, and #444 for scaling to very large
grids) concerning adding a decentralized introduction protocol, and it
seems likely to me that implementing it would involve some form of DHT.
Whether or not a DHT has a further role in Tahoe's functioning, such as
in node selection, is a point of debate (I think the current state is
"doesn't look like it would work well").
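
To put a rough number on the "3 nodes out of 10" intuition from the
question, the sketch below computes the chance that at least k of a
file's n shares are reachable at a given moment. It assumes each share
lives on a distinct node and that nodes are up independently with the
same probability; both are simplifying assumptions, not a description
of how Tahoe behaves. It also says nothing about nodes that leave for
good, which is the churn problem described above.

```python
from math import comb


def recoverable_probability(p_up, k=3, n=10):
    """P(at least k of n share-holding nodes are up).

    Assumes independent nodes, each up with probability p_up
    (a simple binomial model, chosen here for illustration).
    """
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


for p in (0.5, 0.7, 0.9):
    print(f"node uptime {p:.0%}: "
          f"file recoverable {recoverable_probability(p):.4f}")
```

Even with lab machines that are only up half the time, this model
leaves a file readable in most cases; the harder question is what
happens to shares on machines that are reimaged or retired.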
> Also, is there any existing work that can provide an NFS/SMB interface
> to Tahoe-LAFS?
>
Tahoe's mutable files can only be completely rewritten, not
incrementally updated in a block-like way; they're also pretty expensive
in computation and memory to deal with. So at the moment there's a
pretty deep mismatch with SMB/NFS, which would probably have to be dealt
with somehow in the NFS/SMB server (a local cache of modified files, for
example). A read-only implementation should be much easier, and should
just come down to how you map Tahoe's metadata to your target protocol.
As is the case with a lot of Tahoe, there are a number of proposals for
more efficient mutable files (such as Tickets #217 and #393), but I
don't know how close any of them are to implementation.
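
The "local cache of modified files" idea can be sketched as a
write-back cache: block-style writes are absorbed into a local buffer
and the whole file is uploaded once on flush. Everything here is
hypothetical; WriteBackCache and the fetch/store callables are names
invented for illustration, not part of Tahoe or of any NFS/SMB server,
and a real gateway would also need eviction, locking and error
handling.

```python
class WriteBackCache:
    """Buffers partial writes locally; uploads whole files on flush."""

    def __init__(self, fetch, store):
        self._fetch = fetch    # hypothetical: name -> bytes (whole file)
        self._store = store    # hypothetical: (name, bytes) -> None
        self._buffers = {}     # name -> bytearray of current contents
        self._dirty = set()    # names modified since the last flush

    def _load(self, name):
        # Download the whole file the first time it is touched.
        if name not in self._buffers:
            self._buffers[name] = bytearray(self._fetch(name))
        return self._buffers[name]

    def read(self, name, offset, length):
        return bytes(self._load(name)[offset:offset + length])

    def write(self, name, offset, data):
        buf = self._load(name)
        end = offset + len(data)
        if end > len(buf):
            # Zero-fill so writes past the end behave like a sparse file.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        self._dirty.add(name)

    def flush(self, name):
        # One complete rewrite, however many block writes preceded it.
        if name in self._dirty:
            self._store(name, bytes(self._buffers[name]))
            self._dirty.discard(name)


# Usage with an in-memory dict standing in for the grid.
backend = {"f": b"hello world"}
uploads = []


def store(name, data):
    backend[name] = data
    uploads.append(name)


cache = WriteBackCache(lambda name: backend[name], store)
cache.write("f", 0, b"J")
cache.write("f", 6, b"W")
cache.flush("f")
print(backend["f"], len(uploads))
```

Two small writes cost one upload here, which is the point of putting
the cache in front of whole-file mutable updates.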
> I’m sure much of this is in Trac, there’s just a **lot** of
> reading/research/testing to do… :P
>
There's a lot of stuff in there. Don't overlook the tickets; much of the
interesting discussion is in there.
J