[tahoe-dev] Tahoe performance

Brian Warner warner-tahoe at allmydata.com
Wed Feb 18 19:44:51 PST 2009


On Wed, 18 Feb 2009 19:12:26 -0700
Shawn Willden <shawn-tahoe at willden.org> wrote:

> The man page says it goes into whole-file mode when both paths are
> local.

Ah, good, so it's merely a matter of getting the FUSE bindings to do the
right thing with metadata, rather than working magic within Tahoe itself.

> Hmm.  It wouldn't be that difficult to add a new sort of dirnode
> which is a revlog-ish index referencing all of the revisions (some
> full, some deltas) for a particular file.  With that in place,
> including a URI syntax for referencing a particular revision, and
> core Tahoe support for extracting the requested revision, it would be
> easy to create a navigable dirnode tree.

True, but the reason our MDMF plans are different (storing the full/deltas in
the file object, rather than the directory) is to allow someone to share a
single mutable file, without sharing the directory that contains it.

We've also considered versioned directories, which could just be a versioned
list of immutable filecaps. This would be an alternate presentation of a
backup system (sort of like the way I heard that ClearCase lets you pivot
into individual files to see their histories: you can do "ls main.c@@" and
see main.c@@1, main.c@@2, and other versions, then you can "diff main.c@@2
main.c@@HEAD" instead of needing a special tool like "cvs diff").
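
As a rough illustration of the "versioned list of immutable filecaps" idea,
here is a minimal sketch. Everything in it is hypothetical: the class, the
method names, and the placeholder cap strings are made up for the example and
are not part of Tahoe. Each committed snapshot maps child names to immutable
filecaps, and pivoting into one file just walks the snapshots and collects
the caps that file has pointed to over time.

```python
class VersionedDirectory:
    """Hypothetical sketch: a directory stored as a list of snapshots."""

    def __init__(self):
        # Each snapshot is a dict mapping child name -> immutable filecap.
        self.snapshots = []

    def commit(self, entries):
        """Record a new version of the whole directory."""
        self.snapshots.append(dict(entries))

    def history(self, name):
        """Pivot into one file: return [(version, filecap), ...] with one
        entry each time the filecap stored under `name` changes."""
        versions = []
        last_cap = None
        for entries in self.snapshots:
            cap = entries.get(name)
            if cap is not None and cap != last_cap:
                versions.append((len(versions) + 1, cap))
            last_cap = cap
        return versions


d = VersionedDirectory()
d.commit({"main.c": "cap-a"})                      # placeholder caps
d.commit({"main.c": "cap-a", "util.c": "cap-u"})   # main.c unchanged
d.commit({"main.c": "cap-b", "util.c": "cap-u"})   # main.c edited
for version, cap in d.history("main.c"):
    print("main.c@@%d -> %s" % (version, cap))
```

The per-file view is derived entirely from the directory snapshots here, so
nothing about a file's history is stored in the file object itself.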

On the other hand, versioned directories as a backup tool would make some
operations, like directory moves, a bit more confusing. If I ask "what was
in my /home directory last Monday?", but I swapped /home and /home.B on
Tuesday, which directory am I talking about?

Of course, a non-versioned directory that references versioned mutable files
is another form of that, but since directories are added and moved and
removed over time, it still might not be as good a backup container as the
timestamped-tree-of-immutable-objects that "tahoe backup" and Time Machine
use.

Also, one design for a versioned file (mutable or immutable) would be to have
a "master file" which contains a list of immutable filecaps, some of which
point to full versions and others to deltas, and the master would
have instructions on how to reassemble the pieces (the simplest form wouldn't
even use deltas, just full versions). But we don't plan to do it that way.
The reason we're thinking of revlog-ish for LDMF instead is the segmentation
problem: the reliability/availability drops sharply when the overall file is
composed of many separate immutable files, since you need them all to be
recoverable to recover the aggregate. Tahoe immutable files are segmented
before FEC (and each share gets one block from all segments) specifically to
avoid this problem.
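
A back-of-the-envelope sketch of that reliability argument, under loudly
stated assumptions: every share survives independently with the same
probability, and each separate immutable file fails independently of the
others. The function names and the numbers (3-of-10 encoding, a deliberately
pessimistic 50% per-share survival) are picked for illustration only. A file
is recoverable if at least k of its n shares survive; an aggregate made of
many separate files needs every one of them, so its probability is the
per-file probability raised to the number of pieces.

```python
from math import comb


def file_recovery_probability(k, n, p_share):
    """Probability that at least k of n shares survive, assuming each share
    survives independently with probability p_share."""
    return sum(
        comb(n, i) * p_share ** i * (1 - p_share) ** (n - i)
        for i in range(k, n + 1)
    )


def aggregate_recovery_probability(p_file, pieces):
    """An aggregate of `pieces` independently stored files is recoverable
    only if every single piece is recoverable."""
    return p_file ** pieces


# Illustrative parameters, not measurements.
p_one = file_recovery_probability(k=3, n=10, p_share=0.5)
for pieces in (1, 10, 100):
    print(pieces, aggregate_recovery_probability(p_one, pieces))
```

A single segmented file corresponds to the pieces=1 row no matter how many
segments it has: since each share holds one block from every segment, any k
surviving shares are enough to recover all of the segments at once.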

> I started down that path (the revlog-ish index) at one point, and
> then veered off a different direction.  Now I'm not sure why I
> did :-)  I need to go back and look at my notes to see if the
> decision was for some compelling reason. If not, it may be worth
> veering back.

I'd guess that you weren't including fine-grained sharing in the use case.
That's driven a lot of our design in different directions than we were
originally expecting. That, or you wisely decided to tackle a solvable
problem instead of something large and insane :-).

cheers,
 -Brian

