[tahoe-dev] [tahoe-lafs] #999: support multiple storage backends, including amazon s3
tahoe-lafs
trac at allmydata.org
Wed Mar 31 10:17:57 PDT 2010
#999: support multiple storage backends, including amazon s3
-----------------------------+----------------------------------------------
Reporter: zooko | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-storage | Version: 1.6.0
Keywords: gsoc backend s3 | Launchpad_bug:
-----------------------------+----------------------------------------------
Old description:
> (originally I incorrectly posted this to #917)
>
> The way to do it is to make a variant of
> [source:src/allmydata/storage/server.py] which doesn't read from local
> disk in its [source:src/allmydata/storage/server.py@4164#L359
> _iter_share_files()], but instead reads the files using its backend
> protocol, e.g. from its S3 bucket (it is an S3 client and a Tahoe-LAFS
> storage server). Likewise, variants of
> [source:src/allmydata/storage/shares.py@3762 storage/shares.py],
> [source:src/allmydata/storage/immutable.py@3871#L39
> storage/immutable.py], and
> [source:src/allmydata/storage/mutable.py@3815#L34 storage/mutable.py]
> would write their data out using the backend protocol, instead of to
> the local filesystem.
>
> Probably one should first start by abstracting out just the "does this go
> to local disk, S3, Rackspace Cloudfiles, etc" part from all the other
> functionality in those four files... :-)
New description:
The focus of this ticket is (now) adapting the existing codebase to use
multiple backends, rather than supporting any particular backend.
We already have one backend -- the filesystem backend -- which I think
should be a plugin in the same sense that the others will be plugins
(i.e. other code in tahoe-lafs can interact with a filesystem plugin
without caring very much about how or where it is storing its files --
otherwise it doesn't seem very extensible). If you accept this, then we'd
need to figure out what a backend plugin should look like.
There is backend-independent logic in the current server implementation
that we wouldn't want to duplicate in every other backend implementation.
To address this, we could start by refactoring the existing code that
reads or writes shares on disk, to use a local backend implementation
supporting an IStorageProvider interface (probably a fairly simplistic
filesystem-ish API).
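As a rough illustration, such an interface might look like the sketch below. All names here are hypothetical (invented for this ticket, not taken from the codebase), and plain `abc` is used to keep the example self-contained, though the real code would presumably declare this with zope.interface like the other Tahoe-LAFS interfaces:

```python
# Hypothetical sketch of an IStorageProvider interface: a simplistic
# filesystem-ish API that each backend (disk, S3, ...) would implement.
# Method names and signatures are invented for illustration.
from abc import ABC, abstractmethod


class IStorageProvider(ABC):
    @abstractmethod
    def get_share(self, storage_index, shnum):
        """Return the bytes of one share, or raise KeyError if absent."""

    @abstractmethod
    def put_share(self, storage_index, shnum, data):
        """Write (or overwrite) one share."""

    @abstractmethod
    def iter_shares(self, storage_index):
        """Yield (shnum, data) pairs for every share of a storage index."""


class MemoryStorageProvider(IStorageProvider):
    """Trivial in-memory implementation, handy for tests."""

    def __init__(self):
        self._shares = {}  # (storage_index, shnum) -> bytes

    def get_share(self, storage_index, shnum):
        return self._shares[(storage_index, shnum)]

    def put_share(self, storage_index, shnum, data):
        self._shares[(storage_index, shnum)] = data

    def iter_shares(self, storage_index):
        for (si, shnum), data in sorted(self._shares.items()):
            if si == storage_index:
                yield shnum, data
```

The point is that the server-side share-accounting logic only ever talks to this interface, so swapping the in-memory provider for a disk or S3 one doesn't touch it.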
(This involves changing the code in
[source:src/allmydata/storage/server.py] that reads from local disk in its
[source:src/allmydata/storage/server.py@4164#L359 _iter_share_files()]
method, and also changing [source:src/allmydata/storage/shares.py@3762
storage/shares.py], [source:src/allmydata/storage/immutable.py@3871#L39
storage/immutable.py], and
[source:src/allmydata/storage/mutable.py@3815#L34 storage/mutable.py],
which write shares to local disk.)
At this point all the existing tests should still pass, since we haven't
actually changed the behaviour.
Then we have to add the ability to configure new storage providers. This
involves figuring out how to map user configuration choices to what
actually happens when a node is started, and how the credentials needed to
log into a particular storage backend should be specified. The skeletal
RIStorageServer would instantiate its IStorageProvider based on what the
user configured, and use it to write/read data, get statistics, and so on.
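That mapping from configuration to provider could be as simple as a factory function. A sketch, where the config keys, the `DiskProvider`/`S3Provider` stubs, and the factory itself are all invented for illustration:

```python
# Hypothetical factory mapping user configuration to a provider.
# In real code the config would come from tahoe.cfg and the providers
# would be full IStorageProvider implementations, not stubs.

class DiskProvider:
    """Stub for the local-filesystem backend."""
    def __init__(self, sharedir):
        self.sharedir = sharedir


class S3Provider:
    """Stub for a remote S3 backend; real code would use an S3 client."""
    def __init__(self, bucket, access_key_id, secret_key):
        self.bucket = bucket
        self.access_key_id = access_key_id
        self.secret_key = secret_key


def create_storage_provider(config):
    """Instantiate the provider named by the 'backend' config key."""
    backend = config.get("backend", "disk")
    if backend == "disk":
        return DiskProvider(config["sharedir"])
    if backend == "s3":
        # credentials for the remote service live in the same config section
        return S3Provider(bucket=config["bucket"],
                          access_key_id=config["access_key_id"],
                          secret_key=config["secret_key"])
    raise ValueError("unknown storage backend: %r" % (backend,))
```

Rejecting unknown backend names at startup, rather than at first use, keeps misconfiguration failures loud and early.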
Naturally, all of this would require a decent amount of documentation and
testing, too.
Once we have all of this worked out, the rest of this project (probably to
be handled in other tickets) would be identifying what other backends we'd
want in tahoe-lafs, then documenting, implementing, and testing them. We
already have Amazon S3 and Rackspace as targets -- users of tahoe-lafs
will probably have their own suggestions, and further research will turn
up more candidates.
--
Comment(by davidsarah):
Update description to reflect kevan's suggested approach.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/999#comment:5>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid