[tahoe-dev] up with filesystems! up with the web!

David-Sarah Hopwood david-sarah at jacaranda.org
Thu Dec 31 13:32:26 PST 2009


Chimpy McSimian IV, Esq. wrote:
> I don't understand the problem. To me, filesystems and the web are
> fundamentally different. While they are both built using, in part, the
> same data structure (directed, cyclic-by-various-means graphs), the
> similarities seem to end there.

The comparison below, after correcting some details, does not support the
conclusion that the similarities end there.

> Web:
> 
> * many object formats

  * absolute URIs identify vertices
  * object names (URI and IRI references) refer to vertices

> * vertices embedded in files

Correction:
  * object names embedded in files

> * vertices encoded in a single document format; i.e. non-HTML object
>   must be leaf nodes

That isn't accurate. Other document formats can encode URI/IRI references.
A particular web client can extract the references from formats that it
'understands' (possibly with the help of plug-ins etc.), and can't extract
them from formats it doesn't understand. HTML has no distinguished status
other than as a format that is widely understood.
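
To make that concrete, here is a minimal Python sketch of a client that
extracts references only from the formats it has an extractor for. The
EXTRACTORS registry and the function names are hypothetical, chosen for
illustration; they are not taken from any particular web client.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href/src attribute values from HTML start tags."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value is not None:
                self.refs.append(value)


def extract_html(body):
    parser = LinkExtractor()
    parser.feed(body)
    parser.close()
    return parser.refs


def extract_uri_list(body):
    # text/uri-list (RFC 2483): one URI per line, '#' lines are comments.
    refs = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            refs.append(line)
    return refs


# Hypothetical registry: media type -> extractor this client "understands".
EXTRACTORS = {
    "text/html": extract_html,
    "text/uri-list": extract_uri_list,
}


def extract_references(media_type, body):
    extractor = EXTRACTORS.get(media_type)
    if extractor is None:
        # Format not understood: the object merely *looks* like a leaf
        # to this client, whatever references it actually contains.
        return None
    return extractor(body)
```

HTML is just one entry in the registry here; adding another format means
adding another extractor, not changing anything about the web itself.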

> Filesystem:
> 
> * many object formats
> * object names are vertices

Corrections:
  * inode numbers identify vertices (in a given filesystem)
  * object names (paths) refer to vertices

This is similar to the web, but with inode numbers in place of absolute
URIs, and paths in place of URI/IRI references. An important difference
is that inode numbers are only meaningful relative to a filesystem (and
paths are dependent on the configuration of mount points), but that's
not a difference in the semantics within a filesystem.
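
As an illustration, the following Python sketch shows two paths referring
to the same vertex. It assumes a POSIX-like filesystem that supports hard
links; same_vertex and demo are names invented for this example.

```python
import os
import tempfile


def same_vertex(path_a, path_b):
    """True if both paths refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    # An inode number only identifies a vertex within one filesystem,
    # so the device number has to be compared as well.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        alias = os.path.join(d, "alias")
        other = os.path.join(d, "other")
        with open(original, "w") as f:
            f.write("data")
        with open(other, "w") as f:
            f.write("data")
        # A hard link: a second name for the same vertex.
        os.link(original, alias)
        return same_vertex(original, alias), same_vertex(original, other)
```

Here "original" and "alias" are different names for one vertex, while
"other" has identical contents but is a different vertex.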

> * vertices not encoded in documents; or, not particularly powerfully

That isn't accurate. Many document formats can encode paths.
A particular application can extract the references from formats that
it 'understands' (possibly with the help of plug-ins etc.), and can't
extract them from formats it doesn't understand.

Of course, the majority of files in most filesystems are not in a single
format that allows the paths they contain to be identified (although it's
quite possible to have a filesystem where the distribution of file formats
is the same as on the web). This is a deficiency of filesystems; it
would be extremely useful to be able to identify the paths embedded in
files. Actually, it's also a deficiency of the web that there is no
generic way to extract the references from new formats.

Note that HTTP servers, and servers for other web protocols, do not
rely in any way on it being possible to extract all the links within
web documents. It's only the higher-level functionality of web clients
and crawlers that relies on that.

In Tahoe, object names are representations of capabilities, rather than
paths. If <http://allmydata.org/trac/tahoe/ticket/432> is fixed, then
they will be URIs, so a Tahoe filesystem will then be a part of the web.
That doesn't mean that it isn't still a filesystem.

> Also, I think users *do* understand filesystems pretty well.

They understand some subset of filesystem semantics, but most users
don't understand the edge cases. Also, many programmers don't understand
them, which leads them to write code that is insecure when these cases
occur (e.g. symlink race conditions).
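
For example, code that checks whether a path is a symlink and then opens
it in a separate step can be raced. A minimal Python sketch of one common
mitigation follows, assuming a POSIX system where os.O_NOFOLLOW is
available; the function names are made up for this example. Note that
O_NOFOLLOW only covers the final path component, so this is not a
complete defence.

```python
import errno
import os
import stat
import tempfile


def read_regular_file_nofollow(path):
    # O_NOFOLLOW makes the open itself fail if the final component is a
    # symlink, so there is no gap between a check and the open.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # Inspect the object we actually opened, not the path again.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(errno.EINVAL, "not a regular file", path)
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        with open(target, "wb") as f:
            f.write(b"secret")
        os.symlink(target, link)
        direct = read_regular_file_nofollow(target)
        try:
            read_regular_file_nofollow(link)
            refused = False
        except OSError:
            refused = True
        return direct, refused
```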

> If anything, it sounds like you should stick with exposing a tree
> structure to users. Arguably, you could leave out any capability for a
> many-to-one name --> object relationship (i.e. no symlinks or multiple
> hard links), and disappoint only a few nerds while avoiding confusing
> the majority of users.

This would be incompatible with supporting fine-grained sharing: if
I send you a capability for a directory that I've created, you would
not be able to link it into another directory that you created. This
would be a severe regression in functionality that is relied on by
many Tahoe users, not just "a few nerds". In any case, it's not
even implementable -- there is no way of knowing, when a capability
is linked into a directory, whether it has already been linked elsewhere.
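
A toy model in Python may make the last point clearer. This is not
Tahoe's API: the Directory class, the opaque capability string, and
parents_visible_to are all hypothetical. The point is only that linking
is an operation on the parent directory, and nothing is recorded on the
object being linked.

```python
class Directory:
    """Toy directory: maps child names to opaque capability strings."""

    def __init__(self):
        self.children = {}

    def link(self, name, cap):
        # Linking only touches this directory; the linked object keeps
        # no record of which directories refer to it.
        self.children[name] = cap


def parents_visible_to(cap, known_directories):
    """Parents of cap among the directories one party can read."""
    return [d for d in known_directories if cap in d.children.values()]


def demo():
    shared_cap = "cap-for-shared-dir"  # opaque placeholder, not a real cap

    alice_root = Directory()
    alice_root.link("project", shared_cap)

    # Bob receives the capability and links it into his own directory.
    bob_root = Directory()
    bob_root.link("from-alice", shared_cap)

    # Alice can only search the directories she holds capabilities for.
    seen_by_alice = parents_visible_to(shared_cap, [alice_root])
    actual = parents_visible_to(shared_cap, [alice_root, bob_root])
    return len(seen_by_alice), len(actual)
```

Alice sees one parent where there are really two, and no party is in a
position to enumerate every directory that might hold the capability.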

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
