[tahoe-dev] proposal for an HTTP-based storage protocol

Kevin Reid kpreid at switchb.org
Sun Sep 26 11:51:09 UTC 2010


On Sep 26, 2010, at 1:35, Ravi Pinjala wrote:

> There have been some noises on this list about replacing the
> foolscap-based storage protocol with something HTTP-based and easier
> to work with. I'd like to throw in my own work on an extensible
> HTTP-based storage protocol as a starting point....
[...]

I'd like to offer some criticism of this protocol from a
web-architecture/REST perspective.

> * discovery document URL: http://server.address/

Let this be an arbitrary URL, not required to be the server root.

> * discovery document contents:
> <webfs>
> 	<module path="data" interface="http://p-static.net/webfs/data/1.0">
> 		<feature name="max-directory-depth" value="0" />
> 		<feature name="max-data-size" value="1048576" />
> 	</module>
> 	<module path="metadata" interface="http://p-static.net/webfs/metadata/1.0" />
> </webfs>

Place these elements in an XML namespace.

Perhaps even let the XML namespace serve for interface and feature  
identification:

<webfs xmlns="http://p-static.net/webfs/1.0">
	<module xmlns="http://p-static.net/webfs/data/1.0"
	        path="data">
		<max-directory-depth>0</max-directory-depth>
		<max-data-size>1048576</max-data-size>
	</module>
	<module xmlns="http://p-static.net/webfs/metadata/1.0"
	        path="metadata" />
</webfs>
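
As a rough sketch of what a client would do with such a document,
here is one way to read it with Python's standard ElementTree. The
function name list_modules is made up for illustration, and the rule
that a feature element lives in the same namespace as its module is my
assumption about how the scheme would work, not something the proposal
specifies.

```python
import xml.etree.ElementTree as ET

DISCOVERY_XML = """\
<webfs xmlns="http://p-static.net/webfs/1.0">
  <module xmlns="http://p-static.net/webfs/data/1.0" path="data">
    <max-directory-depth>0</max-directory-depth>
    <max-data-size>1048576</max-data-size>
  </module>
  <module xmlns="http://p-static.net/webfs/metadata/1.0" path="metadata" />
</webfs>
"""


def split_tag(tag):
    """Split ElementTree's "{namespace}local" form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return None, tag


def list_modules(xml_text):
    """Return {interface namespace: (path, {feature name: value})}."""
    root = ET.fromstring(xml_text)
    modules = {}
    for child in root:
        namespace, local = split_tag(child.tag)
        if namespace is None or local != "module":
            continue
        features = {}
        for feature in child:
            feature_ns, feature_name = split_tag(feature.tag)
            # Assumption: features share the namespace of their module.
            if feature_ns == namespace:
                features[feature_name] = feature.text
        modules[namespace] = (child.get("path"), features)
    return modules
```

A client that does not recognize a module's namespace can skip it
without understanding anything else about the element.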

> * "data" interface URL: http://server.address/data/

This URL should be constructed by resolving path="data" as a
relative URL against the discovery document URL. Given that, use
xlink:href= instead of path= as the attribute.

The goal of all these changes is to make the XML contain more  
semantics that are already understood by general XML/web tools,  
reducing the amount of application-specific interpretation logic  
needed (thus reducing the chances that someone will casually implement  
the interpretation incorrectly).
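
To illustrate how little client logic that leaves, here is a minimal
sketch using only the Python standard library. The example.org URL,
the trailing slashes in the href values, and the function name
module_urls are my assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree's spelling of the xlink:href attribute name.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

EXAMPLE = """\
<webfs xmlns="http://p-static.net/webfs/1.0"
       xmlns:xlink="http://www.w3.org/1999/xlink">
  <module xmlns="http://p-static.net/webfs/data/1.0" xlink:href="data/" />
  <module xmlns="http://p-static.net/webfs/metadata/1.0" xlink:href="metadata/" />
</webfs>
"""


def module_urls(discovery_url, xml_text):
    """Resolve every xlink:href in the document against discovery_url."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        href = element.get(XLINK_HREF)
        if href is not None:
            # Standard relative-reference resolution, nothing webfs-specific.
            urls.append(urljoin(discovery_url, href))
    return urls


if __name__ == "__main__":
    for url in module_urls("http://example.org/storage/discovery.xml", EXAMPLE):
        print(url)
```

Note that with standard resolution the trailing slash matters: a
document path such as foo/bar resolved against http://server.address/data
(no slash) yields http://server.address/foo/bar, not a URL under data/.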

> * URL of a document stored on the server: http://server.address/data/foo/bar
>
> * URL of the metadata for said document: http://server.address/metadata/foo/bar
>
> * Example of direct access to a metadata key:
> http://server.address/metadata/foo/bar?mtime

It should be explicitly part of the definition of the data and  
metadata modules that they define these path patterns (underneath the  
path= URL).
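
Spelled out, the patterns in the quoted examples might look like the
following Python sketch. The function names are hypothetical, and the
percent-encoding choices are my guess; the proposal does not say how
unusual characters in paths or keys should be handled.

```python
from urllib.parse import quote, urljoin


def document_url(module_base, doc_path):
    """URL of a document (or its metadata) underneath a module's base URL.

    module_base is expected to end in "/", e.g. "http://server.address/data/".
    """
    # Strip a leading slash so the path stays underneath the module base.
    return urljoin(module_base, quote(doc_path.lstrip("/")))


def metadata_key_url(metadata_base, doc_path, key):
    """URL for direct access to a single metadata key, as in "...?mtime"."""
    return document_url(metadata_base, doc_path) + "?" + quote(key, safe="")


if __name__ == "__main__":
    print(document_url("http://server.address/data/", "foo/bar"))
    print(metadata_key_url("http://server.address/metadata/", "foo/bar", "mtime"))
```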

> The modules I've implemented so far are a RESTful data module
> (GET/PUT/DELETE on a path do exactly what you'd expect) and metadata
> module (lets you associate arbitrary key-value metadata with a file,
> also uses GET/PUT/DELETE in an intuitive way). If my understanding of
> how a storage node works is correct, this is enough to implement a
> storage node.

What it doesn't have that a storage node should have is verification
of uploads: it should check that the name of an uploaded share is
the appropriate function of its contents (I don't know offhand what
that function actually is), so that clients can't upload obviously
bogus shares.
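
The shape of such a check, as a Python sketch only: the SHA-256 hex
digest used here is a stand-in I picked for illustration, NOT the
function Tahoe actually uses, and both function names are hypothetical.

```python
import hashlib


def expected_name(contents: bytes) -> str:
    """Placeholder naming function: hex SHA-256 of the share contents.

    ASSUMPTION: a stand-in for illustration, not Tahoe's real scheme.
    """
    return hashlib.sha256(contents).hexdigest()


def accept_upload(name: str, contents: bytes) -> bool:
    """Accept a PUT only if the name matches what the contents imply."""
    return name == expected_name(contents)


if __name__ == "__main__":
    share = b"share bytes"
    print(accept_upload(expected_name(share), share))  # name matches contents
    print(accept_upload("foo", share))                 # bogus name
```

A plain GET/PUT/DELETE store has no place to hook in a check like
this, which is the gap I mean.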

IIRC, this is one of the reasons we haven't just implemented 'WebDAV
server as storage node', even though WebDAV does have the
GET/PUT/DELETE and arbitrary-metadata functionality.

(Ah, that raises another question: What are the advantages of your  
protocol over WebDAV? I've implemented WebDAV, and while it does have  
a certain amount of architecture bloat, it doesn't seem -- at the  
moment -- worth using a different protocol just for that.)

-- 
Kevin Reid                                  <http://switchb.org/kpreid/>




