[tahoe-dev] notes about the pycon paper
zooko
zooko at zooko.com
Fri Mar 14 03:34:58 PDT 2008
Dear Brian:
http://allmydata.org/~warner/tahoe.html
Way to go on the pycon paper! Here are a few notes. They may sound
negative, but only because I'm not taking the time to mention all the
positive things -- all of the paper that I've read so far is chock
full of good stuff. Here are a few minor complaints.
* zfec isn't just a Python wrapper around Rizzo's fec library -- I
also changed the fec implementation itself in C.
* Some people will probably assume that the word "DHT" implies
scalable algorithms. They may subsequently be disappointed if they
learn that Tahoe doesn't have a scalable DHT.
* Something about the term "Virtual Drive" bugs me, but I can't
quite put my finger on it. I conceive of a "drive" as being a
container holding a monolithic bundle of data which is accessed
through a single mount point (which is a "drive letter" on Windows).
This just doesn't fit with my conception of the Tahoe decentralized
filesystem. I call the middle layer "the decentralized filesystem
layer".
Hm, except that I see that my terminology of "decentralized
filesystem layer" is broader -- I sometimes think of the mutable and
immutable slots as being part of the "filesystem", but you
distinguish those from the "virtual drive layer" and call them the
"DHT layer". I guess the part that you call the "virtual drive
layer" is what I call "directories". :-)
* "Each client stores a specific 'root directory'". This is not
strictly true -- the "virtual drive" (or "directories") layer has no
conception of root directories; root directories are a concept of the
application layer, and not all apps have them (WUIs don't, and CLIs
don't unless manually set up to do so).
I think this is actually related to your and my difference of
terminology about "the vdrive layer". You define a "vdrive" (in the
pycon paper) as the transitive closure of the filesystem which is
reachable from a given directory (called a "root directory"). I
think this notion is appealing but misleading -- thinking of the
transitive closure of directories and files from a certain root
directory as being in a single container is likely to confuse you,
because other people can have some of those same directories and
files in their containers. Furthermore, you don't have only one root
directory -- you sometimes want to consider the transitive closure of
directories and files reachable from other starting directories, so
you can have some of those same directories and files in
other containers yourself!
Most people would be confused if there were two different
"drives" (i.e. two different drive letters on their Windows machine)
which had some of the same files and directories inside them. (Note:
I mean the same files and directories -- not identical copies, and
not shortcuts or symlinks.) People think of "drives" as being
separate, monolithic and tree-structured (plus shortcuts/symlinks).
Tahoe is not like that. There are no "drives" in Tahoe.
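To make the overlap concrete, here is a minimal sketch in Python. It is
not Tahoe code: the GRAPH dictionary, the node names, and the reachable()
helper are all made up for illustration. It computes the transitive
closure from two different starting directories and shows that the two
"containers" share the very same nodes.

```python
from collections import deque

# Hypothetical directory graph (an assumption for illustration, not Tahoe's
# data model). Each directory maps child names to node ids; ids that do not
# appear as keys are files.
GRAPH = {
    "alice_root": {"photos": "photos_dir", "notes.txt": "file_notes"},
    "bob_root": {"shared": "photos_dir", "todo.txt": "file_todo"},
    "photos_dir": {"cat.jpg": "file_cat"},
}


def reachable(start):
    """Return the set of node ids reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Files have no entry in GRAPH, so they contribute no children.
        for child in GRAPH.get(node, {}).values():
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


alice = reachable("alice_root")
bob = reachable("bob_root")

# The same directory and the same file (not copies) sit in both closures.
overlap = alice & bob
print(sorted(overlap))
```

Starting from "photos_dir" instead gives a third closure that is a subset
of both, which is the sense in which one person can hold the same nodes in
several "containers" at once.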
Obviously if you forbid all sharing and you never mount any
starting directory but a distinguished one, then you can think of the
transitive closure of files and directories as being a "virtual
drive", but this is not a layer of Tahoe -- this is one possible
application of Tahoe, and not a very good one.
Actually we have a design for an application which allows sharing
but which deliberately breaks shared links to mutable objects by
performing deep copies whenever the user drags a node out of a
friend's drive. I'm not sure, but I think that in that hypothetical
application there would be a virtual drive. But again, in
Tahoe itself there is no virtual drive layer.
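As a rough illustration of the difference between keeping a shared link
and performing a deep copy, here is a small Python sketch. Plain nested
dicts stand in for mutable directories; the names friend_dir, my_linked
and my_copied are hypothetical, and nothing here reflects how such an
application would actually be built on Tahoe.

```python
import copy

# A friend's mutable directory, modeled as a nested dict (an assumption
# made purely for illustration).
friend_dir = {"album": {"cat.jpg": "v1"}}

# Shared link: my directory points at the very same mutable object.
my_linked = {"from_friend": friend_dir["album"]}

# Deep copy: dragging the node out makes an independent copy of everything
# reachable from it, so the link to the friend's object is broken.
my_copied = {"from_friend": copy.deepcopy(friend_dir["album"])}

# The friend later changes the contents of the album.
friend_dir["album"]["cat.jpg"] = "v2"

print(my_linked["from_friend"]["cat.jpg"])  # follows the friend's change
print(my_copied["from_friend"]["cat.jpg"])  # unaffected by it
```

With only deep copies, no node is ever held by two containers, which is
what would let each transitive closure behave like a separate drive.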
Regards,
Zooko