[tahoe-dev] proposal for an HTTP-based storage protocol

Ravi Pinjala ravi at p-static.net
Sun Sep 26 05:35:01 UTC 2010


There have been some noises on this list about replacing the
foolscap-based storage protocol with something HTTP-based and easier
to work with. I'd like to throw in my own work on an extensible
HTTP-based storage protocol as a starting point.

I've been working on something for a little while now that I've been
calling webfs [1][2]. It started out as an attempt to implement cloud
storage that could be hosted on my dreamhost account, but (as often
happens XD) the protocol is a lot more general than that. The core
idea is that there's a single dataset, which is exposed through
multiple interfaces. The interfaces are listed in a discovery
document, along with short names which are used in the URL to access
the data through the given interface.

The goal here is managed extensibility - if you want to add a new
module later on, you can do so by adding a new interface to the data
at a completely new path, so compatibility is maintained.

An example, before I get too far ahead of myself:

* discovery document URL: http://server.address/

* discovery document contents:
<webfs>
	<module path="data" interface="http://p-static.net/webfs/data/1.0">
		<feature name="max-directory-depth" value="0" />
		<feature name="max-data-size" value="1048576" />
	</module>
	<module path="metadata" interface="http://p-static.net/webfs/metadata/1.0" />
</webfs>

* "data" interface URL: http://server.address/data/

* URL of a document stored on the server: http://server.address/data/foo/bar

* URL of the metadata for said document: http://server.address/metadata/foo/bar

* Example of direct access to a metadata key:
http://server.address/metadata/foo/bar?mtime

You can see that, because the interface is added at the beginning of
the path portion of the URL, it's simple and unambiguous to extend
this with arbitrary new functionality. (I've actually got an instance
of this online right now, but I don't want to risk it getting
bombarded. Email me off-list if you want to mess around with it, but
don't want to run your own copy for some reason.)
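
As a rough sketch of how a client might use the discovery document above: parse it, look up a module by its interface URI, and build resource URLs from the module's short path. The function names (parse_discovery, resource_url) are made up for this illustration and are not part of webfs; the XML and URLs are the ones from the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, quote

DATA_IFACE = "http://p-static.net/webfs/data/1.0"
METADATA_IFACE = "http://p-static.net/webfs/metadata/1.0"

# The discovery document from the example above.
DISCOVERY_XML = """<webfs>
	<module path="data" interface="http://p-static.net/webfs/data/1.0">
		<feature name="max-directory-depth" value="0" />
		<feature name="max-data-size" value="1048576" />
	</module>
	<module path="metadata" interface="http://p-static.net/webfs/metadata/1.0" />
</webfs>"""


def parse_discovery(xml_text):
    """Map each interface URI to (short path, {feature name: value})."""
    modules = {}
    for module in ET.fromstring(xml_text).findall("module"):
        features = {f.get("name"): f.get("value")
                    for f in module.findall("feature")}
        modules[module.get("interface")] = (module.get("path"), features)
    return modules


def resource_url(base, modules, interface, resource_path, key=None):
    """Build the URL of a resource as seen through one interface.

    The "?key" form for a single metadata key follows the example above;
    everything else about this helper is an assumption.
    """
    path, _features = modules[interface]
    url = urljoin(base, path + "/" + quote(resource_path.lstrip("/")))
    if key is not None:
        url += "?" + quote(key)
    return url


modules = parse_discovery(DISCOVERY_XML)
print(resource_url("http://server.address/", modules, DATA_IFACE, "foo/bar"))
print(resource_url("http://server.address/", modules, METADATA_IFACE,
                   "foo/bar", key="mtime"))
```

Keying the lookup on the interface URI rather than the short path means a client keeps working if a server mounts the same interface under a different path.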


The modules I've implemented so far are a RESTful data module
(GET/PUT/DELETE on a path do exactly what you'd expect) and a metadata
module (lets you associate arbitrary key-value metadata with a file,
also uses GET/PUT/DELETE in an intuitive way). If my understanding of
how a storage node works is correct, this is enough to implement a
storage node. These interfaces can be implemented in a few hundred
lines of python, and can run as a CGI script (so that any webhost
could become a storage node, if we could figure out a clean way to
tell the introducer about it).
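
To make the GET/PUT/DELETE semantics concrete, here is a minimal in-memory sketch of the two modules. It is not the webfs code: the class name, the status codes, and the choice to drop a file's metadata when the file is deleted are all assumptions made for illustration, and only per-key metadata access (the "?mtime" form) is modelled.

```python
class WebfsSketch:
    """In-memory stand-in for the data and metadata modules (hypothetical)."""

    def __init__(self):
        self.data = {}      # resource path -> bytes
        self.metadata = {}  # resource path -> {key: bytes}

    def handle(self, method, url_path, query="", body=b""):
        """Dispatch on the first path segment, i.e. the module's short name."""
        module, _, resource = url_path.lstrip("/").partition("/")
        if module == "data":
            return self._data(method, resource, body)
        if module == "metadata":
            return self._metadata(method, resource, query, body)
        return 404, b""

    def _data(self, method, resource, body):
        if method == "PUT":
            created = resource not in self.data
            self.data[resource] = body
            return (201 if created else 204), b""
        if resource not in self.data:
            return 404, b""
        if method == "GET":
            return 200, self.data[resource]
        if method == "DELETE":
            del self.data[resource]
            # Assumption: metadata does not outlive the file it describes.
            self.metadata.pop(resource, None)
            return 204, b""
        return 405, b""

    def _metadata(self, method, resource, key, body):
        # Assumption: metadata can only be attached to an existing file,
        # and this sketch only handles access to a single named key.
        if resource not in self.data or not key:
            return 404, b""
        entries = self.metadata.setdefault(resource, {})
        if method == "PUT":
            entries[key] = body
            return 204, b""
        if key not in entries:
            return 404, b""
        if method == "GET":
            return 200, entries[key]
        if method == "DELETE":
            del entries[key]
            return 204, b""
        return 405, b""


fs = WebfsSketch()
fs.handle("PUT", "/data/foo/bar", body=b"raw share bytes")
fs.handle("PUT", "/metadata/foo/bar", "mtime", b"1285479301")
print(fs.handle("GET", "/data/foo/bar"))
print(fs.handle("GET", "/metadata/foo/bar", "mtime"))
```

A real CGI version would replace the two dicts with files on disk and map handle() onto the request method and PATH_INFO, but the dispatch on the leading path segment would look the same.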

(I've also got the beginnings of a protocol test suite [3], so even
though implementing this in Tahoe would probably be a new
implementation, it wouldn't be starting from scratch.)

Questions/feedback? I'm especially interested in knowing if my
thinking about storage nodes is correct - is access to raw share data
+ key-value metadata enough to implement a storage node?

[1] http://bitbucket.org/pstatic/webfs
[2] I think somebody sent a mail to this list a little while ago about
a similar idea for a web-based filesystem? I don't have the email
anymore, but whoever it was, we should talk. :D
[3] http://bitbucket.org/pstatic/webfs/src/tip/test/

--Ravi

