[tahoe-dev] what's the impact of [4272]

Brian Warner warner at lothar.com
Sun Jun 6 22:46:13 PDT 2010


Zooko O'Whielacronx wrote:
> 
> How should I describe the impact of this patch in the NEWS or
> ChangeLog?
> 
> http://tahoe-lafs.org/trac/tahoe-lafs/changeset/4272/

It fixes a fairly rare bug that I happened to run into on my personal
backup system. The bug occurs when two unusual things happen at once:
local metadata has changed (e.g. you've updated a file timestamp), and a
directory must be re-uploaded (because the Checker failed, probably
because a server is offline). The symptom of the bug is that every
execution of 'tahoe backup' re-uploads the same directory, doing more
work than it should.

Here's the scenario that was previously broken:

 * use 'tahoe backup' on a local directory, let it complete successfully
   (call the directory it creates "v1"). This dircap goes into the
   backupdb.
 * touch one of the children (changing the directory metadata)
 * wait a month
 * take one of the servers offline, or delete a share of that directory
 * use 'tahoe backup' again. The month of delay means it must Check each
   file/directory instead of re-using the old copy. The unhealthy dircap
   means that it will re-upload the directory. The changed metadata
   means that the new directory it creates will be different from the
   previous one, so it will have a new dircap (call the directory it
   creates "v2"). This dircap is supposed to go into the backupdb.
 * use 'tahoe backup' a third time

What's supposed to happen is that the second 'tahoe backup' should
upload v2, and the third backup should not upload anything, because the
second one replaced the backupdb entry with the v2 dircap. The bug was
that the v2 dircap didn't make it into the backupdb (I mistook the
'index is already there' condition for a harmless race condition, and
didn't recover correctly). As a result, the third backup (and
fourth, and all subsequent ones) would upload the directory again and
again, never converging on the "null backup" (where two successive
backups with no local changes in between them should result in no
network activity).
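
To make the non-convergence concrete, here is a toy model in Python. It
is only a rough sketch and not the actual backupdb code: the table and
column names, the function names, and the "set of healthy caps" standing
in for the Checker are all made up for illustration. It also simplifies
things by giving every re-upload a fresh dircap, which is the effect the
changed metadata has in the scenario above. The only point is the
difference between swallowing the "index is already there" error and
replacing the stale row.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per local directory, with a unique index.
    db.execute("CREATE TABLE directories (dirkey TEXT PRIMARY KEY, dircap TEXT)")
    return db

def record_buggy(db, dirkey, dircap):
    try:
        db.execute("INSERT INTO directories VALUES (?, ?)", (dirkey, dircap))
    except sqlite3.IntegrityError:
        # Treats "index is already there" as a harmless race,
        # so the stale dircap stays in the table.
        pass

def record_fixed(db, dirkey, dircap):
    # Replace any existing row so the newest dircap always wins.
    db.execute("INSERT OR REPLACE INTO directories VALUES (?, ?)", (dirkey, dircap))

def backup(db, dirkey, healthy_caps, record, counter):
    """Run one simulated backup; return True if the directory was uploaded."""
    row = db.execute(
        "SELECT dircap FROM directories WHERE dirkey=?", (dirkey,)
    ).fetchone()
    if row is not None and row[0] in healthy_caps:
        return False  # stored dircap passes the check: nothing to upload
    counter[0] += 1
    new_cap = "v%d" % counter[0]
    healthy_caps.add(new_cap)  # a freshly uploaded directory is healthy
    record(db, dirkey, new_cap)
    return True

def simulate(record):
    db = make_db()
    healthy = set()
    counter = [0]
    runs = [backup(db, "mydir", healthy, record, counter)]  # creates v1
    healthy.discard("v1")  # later, v1 flunks its check
    for _ in range(3):
        runs.append(backup(db, "mydir", healthy, record, counter))
    return runs

print("buggy:", simulate(record_buggy))
print("fixed:", simulate(record_fixed))
```

With the buggy recorder every run uploads, because the table keeps
pointing at the unhealthy v1. With the replacing recorder the run after
the re-upload finds the healthy v2 entry and does nothing, which is the
"null backup" behavior.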

This bug also happens if you upload files through a Helper which can see
more servers than your Checker can (the Checker thinks the file is
unhealthy, but the Uploader+Helper sees all the shares in the right
place). This is what I was experiencing, because one of my servers was
behind a NAT box, and the Helper was on a public IP.

For the NEWS file, I'd describe it as follows:

 A rare bug that caused unnecessary work in 'tahoe backup' was fixed. It
 was triggered when a previously backed-up directory flunked a Checker
 examination (causing it to be re-uploaded) and local metadata had
 changed. The backupdb was not updated correctly, so each subsequent
 'tahoe backup' invocation would re-upload the directory instead of
 re-using the previously uploaded version. This also occurred when
 using a Helper that can see more servers than your local client.


cheers,
 -Brian

