[tahoe-dev] down with filesystems! up with the web! -- Re: [tahoe-lafs] #776: users are confused by "tahoe rm"
Zooko Wilcox-O'Hearn
zooko at zooko.com
Mon Dec 28 11:59:38 PST 2009
On Sunday, 2009-12-27, at 19:19 , Shawn Willden wrote:
> Indeed, the files are just as much deleted as they are in any Unix
> file system. The only difference is that in a Tahoe grid garbage
> collection is much slower (really slow if the storage nodes have GC
> turned off).
It's true that this same issue is present in any unix file system,
but the speed of garbage collection is not the only difference. An
important difference is that every unix filesystem disallows hard
links to directories. (An exception that proves this rule is that
Apple recently extended HFS to allow hardlinks to directories, but
only with some specific limitations intended to prevent cycles, and
only to support Time Machine backups.) Also non-unix filesystems
such as Windows and pre-unix Mac disallow hardlinks to directories,
and even hardlinks to files. This makes me suspicious that the
designers of those systems had good reasons for this, and the fact
that Tahoe-LAFS gaily allows hardlinks to any object is probably an
example of fools rushing in where angels fear to tread. That is:
users are inherently confused by a "path-based filesystem"
abstraction or a "folders-and-documents" abstraction built on top of
an arbitrary directed graph. The most successful filesystem products
try to hide the arbitrary graph layer as much as possible, whereas
Tahoe-LAFS tries to expose it as much as possible.
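To make the graph point concrete, here is a minimal sketch in Python. It is a toy model and not the Tahoe-LAFS API: directories are plain dicts, files are strings, and the names unlink and reachable are made up for illustration. It shows that removing a link only removes one edge, and that an object becomes garbage only when no path from the root reaches it.

```python
# Toy model only, not the Tahoe-LAFS API: a directory is a dict that
# maps child names to objects, and a file is just a string.

def unlink(directory, name):
    """Remove one edge (name -> object); the object itself is untouched."""
    del directory[name]

def reachable(root):
    """Return the ids of every object reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already visited (converging link or cycle)
        seen.add(id(node))
        if isinstance(node, dict):
            stack.extend(node.values())
    return seen

report = "quarterly report contents"
docs = {"report.txt": report}
backup = {"report-copy.txt": report}  # second link to the same object
root = {"docs": docs, "backup": backup}

unlink(docs, "report.txt")
still_linked = id(report) in reachable(root)     # backup still links it

unlink(backup, "report-copy.txt")
now_garbage = id(report) not in reachable(root)  # no links remain
```

After the first unlink the file is still reachable through the other directory, which is exactly the situation that surprises someone who expected "rm" to delete the file.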
Further cause for concern: many Unix users, even "power users", try
to avoid the use of hardlinks whenever possible, considering them a
confusing and error-prone feature.
Pretty gloomy picture. But there is hope: The Web!
Suppose that users, instead of thinking of their Tahoe-LAFS-hosted
files and directories as part of a "folders-and-documents"
abstraction or a unixy path-based "filesystem", thought of them as a
collection of web pages that could have hyperlinks to one another.
Then there is no
more "impedance mismatch" between the abstraction in the user's head
and the underlying graph structure. No user is ever surprised that
multiple web pages can point to the same web page, or that following
a series of hyperlinks can take you in a circle. No software
intended for the Web assumes that the set of web pages that it will
visit forms a perfectly hierarchical tree structure without cycles or
converging links.
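As a rough illustration of what "not assuming a tree" means in practice, here is a small Python sketch of a crawler-style traversal. The page names and the links table are invented for the example. The only trick is the set of already-seen pages, which lets the walk terminate even when links converge or loop back.

```python
from collections import deque

def crawl(start, links):
    """Visit every page reachable from start exactly once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip converging links and cycles
                seen.add(target)
                queue.append(target)
    return visited

# Hypothetical link structure: "a" and "b" both point to "shared",
# and "shared" points back to "home", forming a cycle.
links = {
    "home": ["a", "b"],
    "a": ["shared"],
    "b": ["shared"],
    "shared": ["home"],
}
order = crawl("home", links)
```

A naive recursive walk that assumed a tree would visit "shared" twice and then loop forever through "home".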
Basically, the Web has proven to be both a more powerful and a more
user-friendly abstraction for managing collections of documents than
the old path-based filesystem abstraction or the old folders-and-
documents abstraction.
Regards,
Zooko
P.S. My brother Nejucomo says that it should be named "tahoe unlink"
instead of "tahoe rm". I think that that would be a good usability
improvement.
P.P.S. My wife Amber says that the only reason people limited
filesystems to a tree structure is that many important algorithms
that you might want to use on your filesystem would be inefficient on
non-tree structures, but now that the Web has formed itself as a non-
tree structure we have been forced to develop heuristics and work
around such inefficiencies anyway.
P.P.P.S. See Mark Bernstein's blog entry on how Engelbart's vision
for what is now The Web had a hierarchical principle and Nelson's had
that "everything is deeply intertwingled":
http://www.markbernstein.org/Feb0301/Engelbart.html