[tahoe-dev] convert-backups-to-shared-dirs tool

Brian Warner warner-tahoe at allmydata.com
Mon Jan 26 13:45:50 PST 2009


Here are some thoughts I had on a tool to convert a series of "cp -r"
-style backups into a form that uses fewer dirnodes.

For background, the allmydata.com Windows backup tool basically works by
performing a "cp -r LOCAL TAHOE:Backups/$DATE" every day. The tool is smart
enough to maintain a mapping from local file to tahoe filecap, so it can
avoid some of the work associated with files that haven't changed since the
previous backup. (If the mapping misses, Tahoe will hash the file and
discover that it is already present in the grid, which is still less work
than uploading the whole thing a second time.) However, the tool isn't smart
enough to share directories with a previous backup. If your local filesystem
hasn't changed since the last backup, you'll get a new batch of directories
each day, but the total number of unique files will remain the same.

The problem we're starting to see is that, for a large local filesystem (say
5000 directories), this means each allmydata.com account is gaining an extra
5k dirnodes per day. Let this run for three months and you've got 500k
dirnodes. The manifest/repairer tools that we're running now must perform a
recursive traversal of the entire account (to build up a list of objects that
need checking or repairing), which takes time roughly linear in the number
of directories it must visit. The walker can visit about 2-10 dirnodes
per second (depending on load), so that 500k-dirnode account takes anywhere
from 50,000 to 250,000 seconds to traverse (half a day to three days). That's
annoying.

What we'd prefer is that the backups take advantage of the fact that most of
the filesystem is probably the same as it was the day before, and most of the
directory nodes will be unchanged. Ideally we'd like to use immutable
dirnodes for this, but we haven't implemented those yet, so instead we'll use
merely read-only dirnodes. This is effectively the same way that Apple's
"Time Machine" backup system works.

Each backup [N] will then consist of either references to dirnodes in the
[N-1] backup (for things that have not changed), or new dirnodes (for
things that did change). The top-level Backups/ directory will be a
write-cap, but all of the $DATE directories underneath it will only contain a
read-cap.

At some point, the windows backup tool will create directories like this all
the time. But for now, and to reduce the number of dirnodes present in
existing backups, we need a tool to convert the old backup structure into the
new shared form.

The first pass of this tool will take the [N-1] readcap and the read-only
form of the [N] writecap, and emit the new [N] readcap. A parent tool should
take the Backups/ writecap and walk through all the timestamped backups
therein, running this tool on each pair, until all backups have been
converted.
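A minimal sketch of that parent loop, assuming a convert_pair(old_readcap,
new_cap) helper (an invented name) that performs the per-pair conversion
described below. Each pass shares against the already-converted [N-1]
result, not the original one:

```python
def convert_all(backup_caps, convert_pair):
    """backup_caps: caps of the timestamped $DATE dirs, oldest first.
    convert_pair(old_readcap, new_cap) returns the converted readcap
    for the newer backup. Returns the list of converted caps."""
    converted = []
    prev = None
    for cap in backup_caps:
        # the oldest backup has nothing to share against, so keep it as-is
        new_cap = convert_pair(prev, cap) if prev is not None else cap
        converted.append(new_cap)
        prev = new_cap  # share the next pass against the converted [N-1]
    return converted
```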

The tool should work as follows:

 1: define OLD to be the [N-1] tree, and NEW to be the [N] tree
 2: start with the NEW root, so PATH=[]

  recurse(PATH, DIRCAP): this will eventually return a dircap
  3: OLDFILES = listdir(OLD/PATH), NEWFILES = listdir(NEW/PATH), DIRTY=False
     OLDDIRCAP = dircap(OLD/PATH)
  4: for each directory child in NEWFILES, recurse(PATH/childname, childcap)
  5: compare the new dircap returned by recurse() against the one that was
     passed in
   5a: if they are the same, that means this subtree is identical. pass.
   5b: if they are different, that means that this subtree is different.
       update NEWFILES[childname]=newdircap, DIRTY=True
  6: when done with all directory children, compare NEWFILES against OLDFILES
   6a: if they are the same, then this dirnode is the same as the one in OLD,
       so we can share the old one. RETURN OLDDIRCAP.
   6b: if they are different, then we cannot share the old one.
    6ba: if DIRTY=True, we shared some children (but not all), and must
         update this dirnode. set_children(DIRCAP, NEWFILES). RETURN DIRCAP.
    6bb: if DIRTY=False, everything in the subtree was new, so nothing was
         shared and nothing must be updated. RETURN DIRCAP.
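The steps above can be sketched with a toy in-memory model. Everything here
is invented for illustration -- STORE simulates the grid, make_dirnode() and
listdir() stand in for the real Tahoe calls, and caps are just strings with
serial numbers (so identical directories created on different days get
different caps, as mutable dirnodes do) -- but the control flow follows
steps 3 through 6:

```python
import itertools

_counter = itertools.count()
STORE = {}  # cap -> children dict, simulating the grid

def make_dirnode(children):
    # mutable dirnodes get a fresh cap each time they are created, so two
    # identical directories made on different days have different caps
    cap = "DIR:%d" % next(_counter)
    STORE[cap] = dict(children)
    return cap

def listdir(cap):
    return dict(STORE[cap])

def recurse(old_cap, new_cap):
    """Return a cap for the NEW tree that shares unchanged OLD subtrees."""
    if old_cap is None:
        return new_cap  # no OLD counterpart: nothing to share against
    old_children = listdir(old_cap)   # step 3: OLDFILES
    new_children = listdir(new_cap)   # step 3: NEWFILES
    dirty = False
    for name, child_cap in list(new_children.items()):
        if not child_cap.startswith("DIR:"):
            continue  # step 4: only directory children are recursed into
        old_child = old_children.get(name)
        if old_child is not None and not old_child.startswith("DIR:"):
            old_child = None  # name exists in OLD but is not a directory
        shared = recurse(old_child, child_cap)
        if shared != child_cap:       # step 5b: subtree shared with OLD
            new_children[name] = shared
            dirty = True
    if new_children == old_children:  # step 6a: identical dirnode, share it
        return old_cap
    if dirty:                         # step 6ba: set_children(DIRCAP, NEWFILES)
        STORE[new_cap] = new_children
        return new_cap
    return new_cap                    # step 6bb: all new, nothing to update
```

Running this on two backups of an unchanged filesystem collapses the second
backup's root to the first backup's cap; a change anywhere leaves only the
dirnodes on the path to that change as new nodes.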

This routine will hold childnames and caps for every ancestor and uncle
node, so the peak memory usage will be roughly logarithmic in the tree size:
about B*log(D) for D dirnodes, where "B" is the branching factor (average
number of objects per directory).

As far as runtime goes, each pass will need to do two recursive walks.
There's a way to cut this in half by retaining the directory contents for the
whole NEW tree, but I don't yet know how much memory this would represent for
our largest Backups/ directory, so I'm not sure if it would be a good
tradeoff or not. The directory-contents data structure would probably want to
be indexed by PATH, and contain both the dircap and the child list for each
dirnode.
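One possible shape for that cache, with invented names, indexed by PATH as
suggested (a tuple of childnames from the root), holding both the dircap and
the child list for each dirnode:

```python
class DirContentsCache:
    """Toy PATH-indexed cache of (dircap, children) per dirnode, letting a
    second walk skip re-listing directories. Names are illustrative only."""
    def __init__(self):
        self._by_path = {}  # path tuple -> (dircap, children dict)

    def record(self, path, dircap, children):
        self._by_path[tuple(path)] = (dircap, dict(children))

    def lookup(self, path):
        # returns (dircap, children) or None if this path was never walked
        return self._by_path.get(tuple(path))
```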

cheers,
 -Brian

