[tahoe-dev] Keeping local file system and Tahoe store in sync

Brian Warner warner-tahoe at allmydata.com
Mon Feb 2 19:00:40 PST 2009


On Tue, 3 Feb 2009 12:20:21 +1100
Andrej Falout <andrej at falout.org> wrote:

> It appears that 'standard' command "tahoe cp -r <source> <dest>" performs a
> straight copy of files, without removing files in <dest> deleted from the
> <source>.

Yes, that's right. "tahoe cp" will never delete anything, so the target
directory will end up being a union of the source directory and the previous
contents of the target directory (with errors if a file would replace a
directory, or vice versa).
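
To illustrate the "union" behaviour, here is a minimal Python sketch. It is
not the actual "tahoe cp" code: the dict-based directory model and the
copy_union() name are assumptions made up for this example, and raising
ValueError is just one way to model the "errors" case.

```python
# Hypothetical model: a directory is a dict mapping a name to either
# bytes (a file) or another dict (a subdirectory).

def copy_union(source, target):
    """Merge source into target in place, never deleting anything."""
    for name, entry in source.items():
        existing = target.get(name)
        if isinstance(entry, dict):
            if existing is not None and not isinstance(existing, dict):
                raise ValueError(f"directory would replace file: {name}")
            # Recurse, creating the subdirectory if it is missing.
            copy_union(entry, target.setdefault(name, {}))
        else:
            if isinstance(existing, dict):
                raise ValueError(f"file would replace directory: {name}")
            target[name] = entry  # add a new file or overwrite the old one
    return target


source = {"a.txt": b"new", "docs": {"readme": b"r"}}
target = {"a.txt": b"old", "gone.txt": b"stale"}
copy_union(source, target)
# "gone.txt" is no longer in the source, but it is still in the target.
```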

> I need a solution that will keep <source> and <dest> in sync, like
> rsync, or even better, Unison.

There are a couple of different ways to interpret "in sync". Which are you
thinking about? There are two tasks in our queue that seem relevant: please
take a look at tickets #597 ("tahoe mirror") and #601 ("tahoe sync").

The "tahoe mirror" command is intended to do as little work as possible to
wind up with the target directory looking exactly like the source directory
(including deleting things from the target directory). It will not modify the
source directory. This is similar to what rsync will do if you give it the
"--archive" and "--delete" options.

The "tahoe sync" command is intended to make the source and the target
directories look the same, using the "newest" files from each (for some
definition of "newest"). It is allowed to modify both directories. This might
be what Unison does, but I don't know how that works. Bidirectional
modification is tricky, because you have to somehow tell the difference
between a file being deleted on side A (and therefore should also be deleted
on side B), and that same file being added on side B (and therefore should
also be added on side A). It also has to deal with conflicts caused by the
user modifying both sides.
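
To make the difference between the two commands concrete, here is a rough
Python sketch of the decisions each one has to make. It is not the design
from #597 or #601. The flat path-to-fingerprint listings, the function names,
and in particular the "last_synced" record (a remembered copy of what both
sides held after the previous run) are assumptions for illustration; keeping
such a record is one common way to tell a deletion on one side from an
addition on the other.

```python
def plan_mirror(source, target):
    """One-way: return (uploads, deletes) that make target match source.

    Each argument is a hypothetical listing that maps a relative path to
    a content fingerprint (for example a hash). Unchanged files are
    skipped, so only the necessary work is planned.
    """
    uploads = sorted(p for p, fp in source.items() if target.get(p) != fp)
    deletes = sorted(p for p in target if p not in source)
    return uploads, deletes


def plan_sync(side_a, side_b, last_synced):
    """Two-way: decide per path what a bidirectional sync should do."""
    actions = {}
    for path in sorted(set(side_a) | set(side_b) | set(last_synced)):
        a = side_a.get(path)
        b = side_b.get(path)
        base = last_synced.get(path)
        if a == b:
            continue  # identical on both sides, or gone from both
        if a == base:
            # A is untouched since the last sync, so B made the change.
            actions[path] = "delete from A" if b is None else "copy B to A"
        elif b == base:
            # B is untouched since the last sync, so A made the change.
            actions[path] = "delete from B" if a is None else "copy A to B"
        else:
            # Both sides changed in different ways.
            actions[path] = "conflict"
    return actions
```

Without the remembered state, a file that exists only on side B looks the
same whether it was deleted on A or added on B. With it, the first case has
an entry in last_synced and the second does not.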

> Is that possible using Tahoe CLI? If not, is it feasible to use such a tool
> to sync with a FUSE mounted Tahoe FS?

Running rsync against a FUSE-based Tahoe backend would work, but it would
probably be a lot slower that you'd hope: the "remote" side would have to
read the whole contents of the file out of Tahoe, compute the rsync checksums
against it, then possibly write a new version of the file back into Tahoe.
Since Tahoe uses immutable files for everything except directories, a changed
file has to be uploaded again in full, so rsync's delta transfer doesn't win
you very much.

Ticket #78 is about making Tahoe more rsync-friendly. It's non-trivial,
though, and will probably need to wait until we've implemented larger mutable
files.

You might also want to look into #598, the "tahoe backup" command. I'm
working on that one right now, and expect to have it committed in some form
in the next few days. "tahoe backup" is a form of "tahoe mirror" that keeps
multiple versions of the old directories.

The current priority ordering for these tickets (a combination of how badly
we want it and how long it is likely to take) is probably #598 first, then
#597, then #601, then #78 last.


cheers,
 -Brian

Tickets mentioned in this message:

 #78 "cater to rsync": http://allmydata.org/trac/tahoe/ticket/78
 #597 "tahoe mirror": http://allmydata.org/trac/tahoe/ticket/597
 #598 "tahoe backup": http://allmydata.org/trac/tahoe/ticket/598
 #601 "tahoe sync": http://allmydata.org/trac/tahoe/ticket/601

