[tahoe-dev] [tahoe-lafs] #999: support multiple storage backends, including amazon s3

tahoe-lafs trac at allmydata.org
Wed Mar 31 10:17:57 PDT 2010


#999: support multiple storage backends, including amazon s3
-----------------------------+----------------------------------------------
 Reporter:  zooko            |           Owner:           
     Type:  enhancement      |          Status:  new      
 Priority:  major            |       Milestone:  undecided
Component:  code-storage     |         Version:  1.6.0    
 Keywords:  gsoc backend s3  |   Launchpad_bug:           
-----------------------------+----------------------------------------------

Old description:

> (originally I incorrectly posted this to #917)
>
> The way to do it is to make a variant of
> [source:src/allmydata/storage/server.py] which doesn't read from local
> disk in its [source:src/allmydata/storage/server.py@4164#L359
> _iter_share_files()], but instead reads the files using its backend
> protocol, e.g. from its S3 bucket (it is an S3 client and a Tahoe-LAFS
> storage server). Likewise, make variants of
> [source:src/allmydata/storage/shares.py@3762 storage/shares.py],
> [source:src/allmydata/storage/immutable.py@3871#L39
> storage/immutable.py], and
> [source:src/allmydata/storage/mutable.py@3815#L34 storage/mutable.py]
> which write their data out using the backend protocol instead of to
> the local filesystem.
>
> Probably one should first start by abstracting out just the "does this go
> to local disk, S3, Rackspace Cloudfiles, etc" part from all the other
> functionality in those four files...  :-)

New description:

 The focus of this ticket is (now) adapting the existing codebase to use
 multiple backends, rather than supporting any particular backend.

 We already have one backend, the filesystem backend, which I think
 should be a plugin in the same sense that the others will be plugins
 (i.e. other code in tahoe-lafs should be able to interact with a
 filesystem plugin without caring much about how or where it stores its
 files; otherwise the design doesn't seem very extensible). If you accept
 this, then we need to figure out what a backend plugin should look like.

 There is backend-independent logic in the current server implementation
 that we wouldn't want to duplicate in every other backend implementation.
 To address this, we could start by refactoring the existing code that
 reads or writes shares on disk so that it uses a local backend
 implementation supporting an IStorageProvider interface (probably a
 fairly simple filesystem-ish API).
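 A minimal sketch of what such an interface might look like, assuming a plain Python `abc` base class (Tahoe-LAFS itself declares its interfaces with zope.interface). The names `IStorageProvider` and `DiskStorageProvider`, and the four methods `put`/`get`/`list_keys`/`delete`, are hypothetical; keys are assumed to be trusted, slash-separated relative paths.

```python
import abc
from pathlib import Path
from typing import Iterator


class IStorageProvider(abc.ABC):
    """Hypothetical minimal, filesystem-ish storage API (not part of Tahoe-LAFS)."""

    @abc.abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under key, replacing any existing value."""

    @abc.abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under key; raise KeyError if absent."""

    @abc.abstractmethod
    def list_keys(self, prefix: str) -> Iterator[str]:
        """Yield the keys that start with prefix, in sorted order."""

    @abc.abstractmethod
    def delete(self, key: str) -> None:
        """Remove key; raise KeyError if absent."""


class DiskStorageProvider(IStorageProvider):
    """Local-disk backend: each key maps to one file beneath a base directory."""

    def __init__(self, basedir: str) -> None:
        self._base = Path(basedir)

    def _path(self, key: str) -> Path:
        # Assumes keys are trusted relative paths (no "..", no absolute paths).
        return self._base / key

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            raise KeyError(key) from None

    def list_keys(self, prefix: str) -> Iterator[str]:
        if not self._base.exists():
            return
        # Convert every file under the base directory back into a key.
        keys = sorted(
            p.relative_to(self._base).as_posix()
            for p in self._base.rglob("*")
            if p.is_file()
        )
        for key in keys:
            if key.startswith(prefix):
                yield key

    def delete(self, key: str) -> None:
        try:
            self._path(key).unlink()
        except FileNotFoundError:
            raise KeyError(key) from None
```

 Because server code would only see `IStorageProvider`, an S3 or Cloud Files provider would implement the same four calls against its own API, and the disk provider becomes one plugin among several.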

 (This involves changing the code in
 [source:src/allmydata/storage/server.py] that reads from local disk in its
 [source:src/allmydata/storage/server.py@4164#L359 _iter_share_files()]
 method, and also changing [source:src/allmydata/storage/shares.py@3762
 storage/shares.py], [source:src/allmydata/storage/immutable.py@3871#L39
 storage/immutable.py], and
 [source:src/allmydata/storage/mutable.py@3815#L34 storage/mutable.py] that
 write shares to local disk.)
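 As a rough illustration of this refactoring, here is a stand-in for the directory scan behind `_iter_share_files()`, rewritten to go through a provider instead of the local disk. All names are hypothetical (`MemoryStorageProvider`, `StorageServerSketch`, `iter_shares`), the `shares/<prefix>/<storage index>/<shnum>` key layout is an assumption, and for simplicity it yields raw share bytes.

```python
from typing import Dict, Iterator, Tuple


class MemoryStorageProvider:
    """Hypothetical dict-backed provider offering the calls a disk provider would."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]  # raises KeyError if the key is absent

    def list_keys(self, prefix: str) -> Iterator[str]:
        return iter(sorted(k for k in self._data if k.startswith(prefix)))


def share_prefix(storage_index: str) -> str:
    # Assumed key layout, modelled on a shares/<2-char prefix>/<SI>/<shnum> tree.
    return "shares/%s/%s/" % (storage_index[:2], storage_index)


class StorageServerSketch:
    """Backend-independent server logic: it never calls os.listdir() or open()."""

    def __init__(self, provider) -> None:
        self.provider = provider

    def iter_shares(self, storage_index: str) -> Iterator[Tuple[int, bytes]]:
        """Yield (share number, share data) for every share of storage_index."""
        prefix = share_prefix(storage_index)
        for key in self.provider.list_keys(prefix):
            name = key[len(prefix):]
            if not name.isdigit():
                continue  # skip entries that are not share-number leaves
            yield int(name), self.provider.get(key)
```

 Swapping in a disk- or S3-backed provider with the same `list_keys`/`get` calls would leave `iter_shares` unchanged.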

 At this point all the existing tests should still pass, since we haven't
 actually changed the behaviour.
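 One way to keep that guarantee as new backends arrive, sketched here with hypothetical `MemoryProvider` and `DiskProvider` classes exposing `put`, `get`, and `list_keys`: write the backend-independent expectations once as a mixin, then bind it to one `unittest.TestCase` per provider.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class MemoryProvider:
    """Hypothetical dict-backed provider."""

    def __init__(self):
        self._data = {}

    def put(self, key, data):
        self._data[key] = data

    def get(self, key):
        return self._data[key]  # KeyError if missing

    def list_keys(self, prefix):
        return sorted(k for k in self._data if k.startswith(prefix))


class DiskProvider:
    """Hypothetical local-disk provider: one file per key under basedir."""

    def __init__(self, basedir):
        self._base = Path(basedir)

    def put(self, key, data):
        path = self._base / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        try:
            return (self._base / key).read_bytes()
        except FileNotFoundError:
            raise KeyError(key) from None

    def list_keys(self, prefix):
        keys = (p.relative_to(self._base).as_posix()
                for p in self._base.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))


class ProviderContract:
    """Backend-independent expectations; not a TestCase itself, so it is not run alone."""

    def make_provider(self):
        raise NotImplementedError

    def test_round_trip(self):
        provider = self.make_provider()
        provider.put("shares/ab/abcd/0", b"share 0")
        self.assertEqual(provider.get("shares/ab/abcd/0"), b"share 0")

    def test_listing_and_missing_key(self):
        provider = self.make_provider()
        provider.put("shares/ab/abcd/3", b"x")
        provider.put("shares/ab/abcd/0", b"y")
        self.assertEqual(provider.list_keys("shares/ab/abcd/"),
                         ["shares/ab/abcd/0", "shares/ab/abcd/3"])
        with self.assertRaises(KeyError):
            provider.get("shares/zz/zzzz/0")


class MemoryProviderTests(ProviderContract, unittest.TestCase):
    def make_provider(self):
        return MemoryProvider()


class DiskProviderTests(ProviderContract, unittest.TestCase):
    def make_provider(self):
        basedir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, basedir)  # remove the temp dir after each test
        return DiskProvider(basedir)
```

 Adding a backend then means adding one small `TestCase` subclass rather than copying the tests.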

 Then we have to add the ability to configure new storage providers. This
 involves figuring out how to map user configuration choices to what
 actually happens when a node is started, and how the credentials needed to
 log into a particular storage backend should be specified. The skeletal
 RIStorageServer would instantiate its IStorageProvider based on what the
 user configured, and use it to write/read data, get statistics, and so on.
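 A minimal sketch of mapping configuration to a provider, assuming an INI-style node config parsed with Python's `configparser`. The `backend`, `storage_dir`, `s3.bucket`, `s3.access_key_id_file`, and `s3.secret_key_file` options are hypothetical, not existing tahoe.cfg settings, and `S3Backend` is a placeholder that makes no network calls. Credentials are read from separate files under the node directory rather than stored inline.

```python
import configparser
from pathlib import Path
from typing import Callable, Dict


class DiskBackend:
    """Placeholder for the local-disk provider."""

    def __init__(self, basedir: Path) -> None:
        self.basedir = basedir


class S3Backend:
    """Placeholder only: records its settings and performs no network I/O."""

    def __init__(self, bucket: str, access_key_id: str, secret_key: str) -> None:
        self.bucket = bucket
        self.access_key_id = access_key_id
        self.secret_key = secret_key


def _read_secret(nodedir: Path, relpath: str) -> str:
    # Secrets live in their own files so the config itself holds no credentials.
    return (nodedir / relpath).read_text().strip()


def _make_disk(nodedir: Path, section: configparser.SectionProxy) -> DiskBackend:
    return DiskBackend(nodedir / section.get("storage_dir", fallback="storage"))


def _make_s3(nodedir: Path, section: configparser.SectionProxy) -> S3Backend:
    # Hypothetical option names; not real tahoe.cfg settings.
    return S3Backend(
        bucket=section["s3.bucket"],
        access_key_id=_read_secret(nodedir, section["s3.access_key_id_file"]),
        secret_key=_read_secret(nodedir, section["s3.secret_key_file"]),
    )


# Registry mapping the configured backend name to a factory.
BACKENDS: Dict[str, Callable[[Path, configparser.SectionProxy], object]] = {
    "disk": _make_disk,
    "s3": _make_s3,
}


def make_storage_backend(nodedir: Path, config_text: str) -> object:
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("storage"):
        parser.add_section("storage")
    section = parser["storage"]
    name = section.get("backend", fallback="disk")  # default keeps today's behaviour
    try:
        factory = BACKENDS[name]
    except KeyError:
        raise ValueError("unknown storage backend %r" % (name,)) from None
    return factory(nodedir, section)
```

 Defaulting `backend` to `disk` keeps existing nodes working unchanged, and keeping secrets out of the config file means the config can be shared or logged without leaking credentials.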

 Naturally, all of this would require a decent amount of documentation and
 testing, too.

 Once we have all of this worked out, the rest of this project (probably to
 be handled in other tickets) would be identifying what other backends we'd
 want in tahoe-lafs, then documenting, implementing, and testing them. We
 already have Amazon S3 and Rackspace as targets; users of tahoe-lafs
 will probably have their own suggestions, and further research will likely
 turn up more backends.

--

Comment(by davidsarah):

 Update description to reflect kevan's suggested approach.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/999#comment:5>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

