[tahoe-dev] [tahoe-lafs] #632: backupdb: override pathname root for snapshots

Shawn Willden shawn-tahoe at willden.org
Wed Feb 18 13:36:54 PST 2009


On Wednesday 18 February 2009 11:07:34 am tahoe-lafs wrote:
>  By the way, I had a good idea for a technique to avoid missed or
>  inconsistent backups of a file at the cost of having delayed (even
>  indefinitely delayed) backup of that file.  It is written in this message:
>
>  http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html

My code does something similar to that.  I also wanted to handle the case of 
constantly-changing files, and the case of files that change often enough 
that the delay between scanning and uploading might mean they never get 
uploaded.

The basic concept is that if a file is "unstable", you have to make a copy of 
it and upload from there.  The process works like this:

1.  If the file is in the unstable list, go to step #4.

2.  If the file's age (now minus its mtime) is less than mtime_granularity, 
sleep for mtime_granularity (currently hardcoded as one second).

3.  After the hashing, signature generation and (possibly) delta generation 
are done, the file is lstat'd again and the new file metadata is compared 
with the previous metadata.  If they differ, the computed data is discarded 
and the file is considered unstable (but it is not added to the unstable 
list).

4.  The unstable file is copied to a storage area.  Hashing, signature 
generation and (possibly) delta generation are done on this file.

4.a.  If a delta was generated, the file is deleted from the storage area and 
only the delta is kept.

4.b.  If a delta was not generated, the file is kept in the storage area for 
the uploader to use.

5.  When the storage area reaches a configured size limit, the oldest 
files/deltas in storage are deleted to reduce it below the size limit.  For 
each deleted file, the job queue is scanned to see if there are any other 
upload jobs referring to the same pathname.  If so, nothing more is done; 
newer versions will be uploaded so we'll simply lose this older one.  If 
there are no other upload jobs referring to the file, the file is added to 
the unstable list.

6.  When the uploader tries to process a job for a file that is not in the 
storage area, but notices that the file no longer matches the metadata in the 
job, the uploader first checks the job queue to see if there are any other 
jobs for this pathname.  If not, it adds the file to the unstable list.
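Steps 1 through 3 can be sketched roughly like this (a minimal Python
illustration; the function names, the choice of metadata fields, and the use
of a plain SHA-256 hash as a stand-in for the real hashing/signature/delta
work are all my own assumptions, not the actual backupdb code):

```python
import hashlib
import os
import time

MTIME_GRANULARITY = 1.0  # seconds; currently hardcoded, per the description


def snapshot(path):
    """Capture the metadata fields compared to detect change (illustrative set)."""
    st = os.lstat(path)
    return (st.st_size, st.st_mtime, st.st_ctime, st.st_ino)


def try_stable_hash(path, unstable_list):
    """Return the file's digest if it held still while we read it, else None.

    None means "take the unstable path": copy the file to the storage area
    (step 4) and do the hashing/signature/delta work on the copy.
    """
    if path in unstable_list:           # step 1: skip straight to step 4
        return None
    before = snapshot(path)
    age = time.time() - before[1]       # before[1] is st_mtime
    if age < MTIME_GRANULARITY:         # step 2: wait out the timestamp window
        time.sleep(MTIME_GRANULARITY)
        before = snapshot(path)
    with open(path, "rb") as f:         # step 3: the expensive computation
        digest = hashlib.sha256(f.read()).hexdigest()
    if snapshot(path) != before:        # file changed under us: discard work,
        return None                     # treat as unstable (but don't list it)
    return digest
```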

In case it's not clear, the "unstable list" is the list of files that must be 
enqueued as new backup jobs during the next scan, and must be copied to 
storage.

Files remain on the unstable list until the uploader removes them.  It does 
that when it uploads a file that is on the list and finds that the copy in 
the file system is the same as the one in the storage area.
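The eviction rule in step 5 might look something like this sketch (the data
structures and names here are hypothetical stand-ins, chosen only to show the
logic; `storage` is assumed oldest-first, and `job_queue` is assumed to hold
the pathnames of pending upload jobs other than the one being evicted):

```python
from collections import OrderedDict


def evict_oldest(storage, size_limit, job_queue, unstable_list):
    """Trim the storage area below size_limit, oldest entries first (step 5).

    storage: OrderedDict mapping pathname -> (size, copy_or_delta),
             ordered oldest-first.
    job_queue: set of pathnames with other pending upload jobs.
    """
    total = sum(size for size, _ in storage.values())
    while total > size_limit and storage:
        path, (size, _) = next(iter(storage.items()))  # oldest entry
        del storage[path]
        total -= size
        # If another job still refers to this pathname, a newer version will
        # be uploaded anyway, so we simply lose this older one.  Otherwise
        # the file must be re-copied on the next scan: mark it unstable.
        if path not in job_queue:
            unstable_list.add(path)
```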

	Shawn.
