[tahoe-dev] Keeping local file system and Tahoe store in sync

Brian Warner warner-tahoe at allmydata.com
Tue Feb 3 15:47:15 PST 2009

On Mon, 2 Feb 2009 21:46:26 -0700
Shawn Willden <shawn-tahoe at willden.org> wrote:

> Thanks. It looks like his approach is sufficiently different from mine that
> I'm going to just keep going as I am.

Yeah, that's the conclusion I came to as well.

My focus is on:

 * flat backup (not aggregated): the backed up data can be read and shared
   directly with standard Tahoe tools, not requiring an additional database
   or tools to interpret. This is the biggest difference.
 * a workload (typical delta between snapshots) that consists mainly of the
   addition of new files and the modification of small files. My target
   workload does not include modification of large files.
 * getting the tool written and committed in a few days

A few notes about the differences that Shawn mentioned:

> 4.  Backup of metadata in addition to file contents. Permissions, ACLs,
> resource forks, etc. My ultimate goal is to be able to do whole-system
> backups and restores, so this is essential.

I'd like to get these included in "tahoe backup". If you run into some code
which can extract these extra pieces of metadata, please let me know: I've
got a stubbed-out function waiting for it. I don't know about resource forks,
though. They should be attached to the filenode, not the parent directory
entry. But, with read-only snapshots, I guess there isn't much difference.
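
For the plain POSIX part of that metadata, a minimal sketch of what such an
extractor might look like is below. The name extract_metadata and the dict
keys are hypothetical, not what "tahoe backup" actually uses, and it does not
attempt ACLs or resource forks, which need platform-specific calls.

```python
import os
import stat

def extract_metadata(path):
    """Return basic POSIX metadata for path (hypothetical helper)."""
    # lstat describes a symlink itself rather than following it to its target
    st = os.lstat(path)
    return {
        "mode": stat.S_IMODE(st.st_mode),        # permission bits only
        "uid": st.st_uid,                        # numeric owner id
        "gid": st.st_gid,                        # numeric group id
        "mtime": st.st_mtime,                    # last modification time
        "is_symlink": stat.S_ISLNK(st.st_mode),  # True for symbolic links
    }
```

Numeric uid/gid values only make sense on the machine they came from, so a
whole-system restore would probably want to record user and group names too.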

> 6.  A focus on the issue of initial, large uploads. A backup session can be
> terminated and resumed, and reasonable timestamping of backups is
> maintained to facilitate a future "Time Machine"-like view.

Eeeyah, that's a good point. In my scheme, if you interrupt a backup process
(including the original one), you wind up with nothing in your Tahoe-side
Backups/ directory, since the new snapshot is only attached to Backups/ at
the very end. This also may make progress harder to track: some people want
their Backups/ directory to get noticeably larger as the process cranks away.
On the other hand, all the files that are being uploaded will get stashed in
the backupdb, and eventually the directories too, so you won't lose any
actual progress by interrupting the snapshot and starting a new one.

