[tahoe-dev] Getting acquainted

Wed Mar 24 14:48:58 PDT 2010

On 03/24/2010 01:29 PM, Hamilton, Jessica wrote:
>
> I work at Massey University (New Zealand), and am planning to use our 
> lab fleet to build a distributed storage grid. I imagine I could 
> probably get about 100TB out of the system.
>
> We will have about 1,500 PCs, with somewhere between 400GB and 800GB 
> of available disk space per machine to utilise for the storage grid.
>

That sounds like a fun setup to play with.
>
> I have read about Chord/DHash which uses DHTs; but it looks like 
> Tahoe-LAFS isn’t a DHT, and am concerned about nodes leaving/joining 
> often. Given that you only need 3 nodes out of 10 to reproduce 
> content, I imagine hardware failures wouldn’t be much of an issue.
>

Tahoe isn't particularly good with nodes leaving and joining at a high 
rate. It assumes a relatively stable pool of storage nodes that are all 
connected to one another (so the grid is fully connected). When you 
upload a file, the uploading node chooses a set of nodes from the 
currently connected set and stores the shares on them. A new node 
joining the grid needs to contact an introducer machine, which is a 
single point of failure, but once introduced the nodes can function 
without it (hm, though I guess they rely on the introducer to tell them 
when a new node joins).
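
To put rough numbers on the 3-of-10 question, here is a back-of-envelope 
sketch. It assumes each of a file's 10 shares lands on a different node, 
that nodes are up independently with the same probability, and it 
ignores all per-share and per-node overhead. The function names are mine 
for illustration, not anything in Tahoe.

```python
from math import comb


def file_availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n shares are reachable.

    Assumes one share per node and that each node is up independently
    with probability p (a simplification, see the note below).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def usable_capacity_tb(nodes: int, gb_per_node: float, k: int, n: int) -> float:
    """Raw capacity divided by the n/k expansion factor, in TB (1 TB = 1000 GB)."""
    return nodes * gb_per_node / 1000 * k / n


if __name__ == "__main__":
    # 1,500 PCs at 400 GB and 800 GB each, with 3-of-10 encoding.
    print(usable_capacity_tb(1500, 400, 3, 10))  # 180.0
    print(usable_capacity_tb(1500, 800, 3, 10))  # 360.0
    # Chance a file is readable if each node is up only half the time.
    print(round(file_availability(3, 10, 0.5), 3))  # 0.945
```

The independence assumption is the weak spot for a lab fleet: if whole 
rooms of PCs are switched off together, the real availability will be 
worse than this suggests.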

There are some tickets filed (#295, and #444 for scaling to very large 
grids) concerning adding a decentralized introduction protocol, and it 
seems likely to me that implementing it would involve some form of DHT. 
Whether or not a DHT has a further role in Tahoe's functioning, such as 
in node selection, is a point of debate (I think the current state is 
"doesn't look like it would work well").

> Also, is there any existing work that can provide an NFS/SMB interface 
> to Tahoe-LAFS?
>

Tahoe's mutable files can only be completely rewritten, not 
incrementally updated in a block-like way; they're also pretty expensive 
in computation and memory to deal with. So at the moment there's a 
pretty deep mismatch with SMB/NFS, which would probably have to be dealt 
with somehow in the NFS/SMB server (a local cache of modified files, for 
example). A read-only implementation should be much easier, and should 
mostly come down to how you map Tahoe's metadata onto your target 
protocol.
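
As a rough illustration of the "local cache of modified files" idea, 
here is a minimal sketch of a write-back buffer that accepts block-style 
writes and only pushes the complete file when flushed. The `download` 
and `upload` callables are hypothetical stand-ins for however the 
NFS/SMB server would talk to Tahoe; they are not Tahoe APIs, and a real 
server would also need locking, eviction and error handling.

```python
from typing import Callable, Dict


class WholeFileWriteCache:
    """Buffers offset-based writes locally; uploads whole files on flush.

    `download` and `upload` are hypothetical callables supplied by the
    caller, standing in for the real transfer mechanism.
    """

    def __init__(
        self,
        download: Callable[[str], bytes],
        upload: Callable[[str, bytes], None],
    ) -> None:
        self._download = download
        self._upload = upload
        self._dirty: Dict[str, bytearray] = {}

    def write(self, path: str, offset: int, data: bytes) -> None:
        buf = self._dirty.get(path)
        if buf is None:
            # First write to this file: fetch the whole current contents.
            buf = bytearray(self._download(path))
            self._dirty[path] = buf
        end = offset + len(data)
        if end > len(buf):
            # Writing past the end: pad with zero bytes, as a sparse write would.
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data

    def read(self, path: str, offset: int, length: int) -> bytes:
        buf = self._dirty.get(path)
        if buf is None:
            return self._download(path)[offset:offset + length]
        return bytes(buf[offset:offset + length])

    def flush(self, path: str) -> None:
        # One complete rewrite per flush, however many block writes came before.
        buf = self._dirty.pop(path, None)
        if buf is not None:
            self._upload(path, bytes(buf))
```

For example, with a plain dict standing in for the grid, 
`WholeFileWriteCache(store.__getitem__, store.__setitem__)` lets you 
issue several small writes and see a single whole-file store on 
`flush()`, e.g. when the client closes the file.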

As with a lot of things in Tahoe, there are a number of proposals for 
more efficient mutable files (such as tickets #217 and #393), but I 
don't know how close any of them are to implementation.

> I’m sure much of this is in Trac, there’s just a **lot** of 
> reading/research/testing to do… :P
>

There's a lot of stuff in there. Don't overlook the tickets; much of the 
interesting discussion is in there.

J