[tahoe-dev] tahoe backup re-uploads old files

Brian Warner warner at lothar.com
Thu Mar 1 19:09:40 UTC 2012


On 2/29/12 4:32 PM, Greg Troxel wrote:

> I think it's a serious bug that 'man tahoe' followed by 'man
> tahoe-foo' (as directed by 'man tahoe') doesn't have basic usage
> instructions

Yeah, our man pages are not very complete. If we were going to overhaul
the docs system, I'd use git as an example:

 "git help": gives you a list (with one-line synopsis) of the most
             common commands
 "git help foo": detailed docs on "foo", with examples, man-page style
 "git foo --help": same
 "git foo -h": short summary of the main options
 "git help --all": full list of all commands

git-help also has --web and --info options for alternate formats. To
pull all this off, I think they put some intermediate form (.rst or
maybe .nroff) in a /usr/share/ sort of place, and then do the
formatting at runtime. This approach also has room for more tutorial- or
concept-centric docs, e.g. "git config --help" happens to document the
"git config" command but is also a general catalog of config options.

A lot of the material tahoe has in the docs/ directory could be exposed
this way. Roughly a third of it is tied to a specific command; the
majority is conceptual or about architecture, but having it visible at
runtime would still be a good idea.

> I have avoided tahoe backup because 1) it failed to back up for me
> once and 2) I think the filesystem and the backup control system
> should be orthogonal, and I haven't seen a good argument why tahoe
> doing a roll-your-own backup program that is tied to the filesystem is
> a big enough win to overcome the cost of the coupling.

Yeah, that's a fair argument. I built "tahoe backup" because it seemed
the best way to take advantage of tahoe's unique features. The
orthogonal way to handle backups, as implemented in a zillion existing
programs, generally expects a POSIX-like backend filesystem. Tahoe is
both more and less than that:

* it has immutable files and directories, which can safely be shared
  between subsequent backups
* modifying files is expensive, and new files should be written
  all-at-once
* tahoe files need to be checked/repaired/renewed every once in a while

Using "cp -r" into a FUSE-mounted Tahoe filesystem would miss all of
this: each pass would try to re-copy pre-existing files (unless you
build a backupdb to avoid it), each pass would duplicate existing
directories, and the FUSE layer would add a lot of overhead. (I've never
really been content with FUSE-over-Tahoe: it basically works, but the
impedance mismatch is just too great to make it a happy experience.)
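
To make the backupdb idea concrete, here is a minimal sketch of how a
backup pass might skip files that have not changed since the last run.
It is an illustration only, not the actual tahoe backupdb: the names
backup_pass, db, and upload are hypothetical, the "database" is a plain
dict, and the fingerprint (size plus mtime) is an assumed heuristic.

```python
import os


def backup_pass(paths, db, upload):
    """Upload only files whose (size, mtime) changed since the last pass.

    db     -- hypothetical stand-in for a backupdb: maps path -> fingerprint
    upload -- hypothetical callable that stores one file (e.g. via a webapi)
    Returns the list of paths that were actually uploaded.
    """
    uploaded = []
    for path in paths:
        st = os.stat(path)
        # Assumed heuristic: same size and mtime means "unchanged".
        fingerprint = (st.st_size, st.st_mtime_ns)
        if db.get(path) == fingerprint:
            continue  # already backed up; skip the expensive re-upload
        upload(path)
        db[path] = fingerprint
        uploaded.append(path)
    return uploaded
```

A plain "cp -r" has no such record, which is why every pass would copy
everything again.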

Of course, it's also there because of Tahoe's historical origins in a
backup-centric company.

FWIW, "tahoe backup" is basically a standalone program that speaks the
tahoe webapi to achieve backup tasks, that just happens to use bin/tahoe
as an entry point, and is distributed along with the rest of tahoe. With
some architectural changes, it could be a plugin (sort of like how "git
foo" vectors off to a program named "git-foo", so adding shallow plugins
is as easy as dropping a git-foo executable into your $PATH). If you
were to write an independent backup program that took advantage of
tahoe's unique features (instead of targeting a POSIX filesystem), it
would probably look a lot like src/allmydata/scripts/tahoe_backup.py .
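
The git-style plugin dispatch mentioned above could be sketched roughly
as follows. This is an assumption about how such a mechanism might look,
not how bin/tahoe works today: the "tahoe-" prefix, find_plugin, and
dispatch are all hypothetical names chosen for illustration.

```python
import shutil
import subprocess
import sys


def find_plugin(subcommand, prefix="tahoe-", path=None):
    """Return the full path of an executable such as 'tahoe-foo', or None.

    The 'tahoe-' prefix is a hypothetical convention mirroring 'git-foo'.
    """
    return shutil.which(prefix + subcommand, path=path)


def dispatch(argv, builtins, path=None):
    """Run a built-in command if one matches, else look for a plugin.

    builtins -- dict mapping subcommand name -> callable(args) -> exit code
    path     -- optional search path; defaults to $PATH when None
    """
    if not argv:
        return 2  # no subcommand given
    subcommand, rest = argv[0], argv[1:]
    if subcommand in builtins:
        return builtins[subcommand](rest)
    plugin = find_plugin(subcommand, path=path)
    if plugin is None:
        print("unknown command: %s" % subcommand, file=sys.stderr)
        return 1
    # Hand the remaining arguments to the external executable.
    return subprocess.call([plugin] + rest)
```

With something like this, adding a new subcommand would only require
dropping a suitably named executable into a directory on $PATH.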

There are some other, similar tools that I'd like to have: "tahoe
mirror" to do one-way syncing of local-fs to tahoe-fs, "tahoe sync" to
do a bidirectional sync (à la Dropbox). And then I'd like "tahoe backup"
to be more integrated into the tahoe daemon (or into an "agent", as we
discussed at the last Summit), to be run periodically and safely without
me having to set up a cronjob for it. And "tahoe sync" could be driven
by inotify/fseventsd-style events. But, I'd expect to need to make
similar arguments about why such features should go into Tahoe itself,
rather than being implemented in standalone tools, before putting
serious time into writing them.

cheers,
 -Brian

