[tahoe-dev] Alternative deployment model - thoughts?

Sat Feb 12 20:19:17 PST 2011

On 1/25/11 12:53 AM, Ravi Pinjala wrote:

> I've been thinking a bit lately about alternate ways to deploy Tahoe
> (partly since the model used by Allmydata seems not to have worked
> out). I was wondering what everybody else here thought about this.

Good thoughts!

> Users would see one reliable, high-capacity storage server, instead of
> many small unreliable ones. The biggest advantage here is performance,
> since the user uploads data to a given cluster once, and any further
> replication happens inside the cluster.

Yeah, this is sort of like treating a copy in S3 as being "rather
reliable" (knowing that AWS is doing internal replication to make it
better than a single disk, but perhaps still too centralized for your
tastes) and putting other copies elsewhere to get enough diversity to
make you comfortable.

> * Every user should have their own introducer.

> , and the storage service would join the specified storage clusters to
> that introducer. Then, when the user connects to their introducer,
> they'll see only servers which they've authorized as part of their
> storage grid, and they can use multiple services (and switch between
> them) seamlessly.

I think those two would end up behaving similarly to the way we're
currently looking at Accounting. Basically, the client hears about lots
of storage servers, but only actually pays attention to the ones that
they've got some sort of relationship with: the ones where they're both
willing and allowed to upload shares. We want the Introducer to have a
smaller role (really it's just facilitating a broadcast channel), and
for something else (pubkeys, transitive invitations, etc) to dictate
which servers you actually talk to.
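
To make that concrete, here is a rough sketch of the filtering step. It is
not Tahoe's actual code: the Announcement shape, the select_servers name,
and the two id sets are all hypothetical stand-ins for whatever Accounting
ends up using (pubkeys, invitations, etc).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    """A storage server heard about via the Introducer (hypothetical shape)."""
    server_id: str  # stand-in for a server pubkey or similar identifier
    location: str   # stand-in for however the server is reached


def select_servers(announcements, willing, allowed):
    """Keep only the servers we are both willing and allowed to upload to.

    willing: set of server ids this client has chosen to use
    allowed: set of server ids that have granted this client upload rights
    """
    usable = willing & allowed
    return [a for a in announcements if a.server_id in usable]


# The Introducer broadcasts everything; the client narrows it down locally.
heard = [
    Announcement("srv-a", "location-a"),
    Announcement("srv-b", "location-b"),
    Announcement("srv-c", "location-c"),
]
chosen = select_servers(heard, willing={"srv-a", "srv-b"},
                        allowed={"srv-b", "srv-c"})
```

Here only srv-b survives, since it is the only server in both sets. The
Introducer never has to know about either set.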

> * An enterprising storage provider could expand by allowing other
> people to offer storage through them, and run their own storage
> clusters.

Yeah. It's like the provider is subcontracting out some of their load to
other providers. Perfectly reasonable if they can continue to meet their
reliability/performance targets.

> People would still be able to use the p2p model, they'd just have some
> very large and reliable peers as an option. I'm really interested to
> hear what people think of this - good idea? terrible idea? old idea
> that I'm accidentally rehashing?

They're interesting ideas. I'd like to see Tahoe get to a place where
it's equally easy to roll out a friendnet, use commercial storage
providers, and provide commercial storage service. We should be able to
treat a commercial storage provider in the same way we treat a random
peer (except maybe we treat the professional host as more reliable).

The Accounting work I've been doing around "Invitations" is related, in
particular the "rent-a-friend" idea where you can get points (which let
you upload your own data) in exchange for either providing storage space
yourself, or for hiring some professional provider to contribute space
on your behalf.
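
As a toy illustration of the points idea (purely a sketch: the PointsLedger
class, its method names, and the one-point-per-byte exchange rate are all
made up for this example, not part of any Accounting design):

```python
class PointsLedger:
    """Tracks points earned by contributing space and spent on uploads.

    Assumes one point per byte, a made-up rate for illustration only.
    """

    def __init__(self):
        self.earned = 0
        self.spent = 0

    @property
    def balance(self):
        return self.earned - self.spent

    def credit(self, nbytes):
        """Record space contributed, whether by you or by a hired provider."""
        self.earned += nbytes

    def can_upload(self, nbytes):
        return nbytes <= self.balance

    def upload(self, nbytes):
        """Spend points on an upload, refusing if the balance is too low."""
        if not self.can_upload(nbytes):
            raise ValueError("not enough points for this upload")
        self.spent += nbytes


ledger = PointsLedger()
ledger.credit(100)  # e.g. a professional provider contributes on your behalf
ledger.upload(60)   # uploading your own data draws the balance down
```

The ledger does not care who supplied the space, which is the point of
"rent-a-friend": your own disk and a hired provider earn credit the same way.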

> Tahoe's network protocols, since they're both undocumented and
> impossible to implement in anything other than Python+Twisted (unless
> there are implementations of Foolscap for other languages that I'm not
> aware of). That's unlikely to change in the short term,

Yeah, that's a continual sticking point. We *are* working to change it,
but it's slow going (my "#466 state-of-the-patch" message from earlier
today is tackling one aspect of that effort). Foolscap itself is
reasonably well documented[1], as are Tahoe's share formats[2] (more or
less). But Foolscap is complex enough that we're likely to finish moving
Tahoe to HTTP before anybody gets around to porting Foolscap to the
language of your choice.

cheers,
 -Brian

[1]: https://github.com/warner/foolscap/tree/master/doc/specifications
[2]: http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/specifications