[tahoe-dev] [tahoe-lafs] #632: backupdb: override pathname root for snapshots
Shawn Willden
shawn-tahoe at willden.org
Wed Feb 18 13:36:54 PST 2009
On Wednesday 18 February 2009 11:07:34 am tahoe-lafs wrote:
> By the way, I had a good idea for a technique to avoid missed or
> inconsistent backups of a file at the cost of having delayed (even
> indefinitely delayed) backup of that file. It is written in this message:
>
> http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html
My code does something similar to that. I also wanted to handle the case of
constantly-changing files, and the case of files that change often enough
that the delay between scanning and uploading might mean they never get
uploaded.
The basic concept is that if a file is "unstable", you have to make a copy of
it and upload from there. The process works like this:
1. If the file is in the unstable list, go to step #4.
2. If the file's age (time since its mtime) is less than mtime_granularity,
sleep for mtime_granularity (currently hardcoded as one second).
3. After hashing, signature generation and (possibly) delta generation, the
file is lstat'd again and the new metadata is compared with the previous
metadata. If they differ, the computed data is discarded and the file is
considered unstable (but not added to the unstable list).
4. The unstable file is copied to a storage area. Hashing, signature
generation and (possibly) delta generation are done on this file.
4.a. If a delta was generated, the file is deleted from the storage area and
only the delta is kept.
4.b. If a delta was not generated, the file is kept in the storage area for
the uploader to use.
5. When the storage area reaches a configured size limit, the oldest
files/deltas in storage are deleted to reduce it below the size limit. For
each deleted file, the job queue is scanned to see if there are any other
upload jobs referring to the same pathname. If so, nothing more is done;
newer versions will be uploaded so we'll simply lose this older one. If
there are no other upload jobs referring to the file, the file is added to
the unstable list.
6. When the uploader processes a job for a file that is not in the storage
area and finds that the file no longer matches the metadata in the job, it
first checks the job queue for any other jobs for this pathname. If there
are none, it adds the file to the unstable list.
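As a rough illustration of steps 2 and 3, here is a minimal Python sketch of
the "compute, then lstat again" check. It is not the actual backup code: the
function names, the choice of lstat fields to compare, and the use of SHA-256
as the stand-in computation are all assumptions made for the example.

```python
import hashlib
import os
import time

MTIME_GRANULARITY = 1.0  # seconds; described above as currently hardcoded


def snapshot_metadata(path):
    """Return the lstat fields compared here (an assumed, illustrative set)."""
    st = os.lstat(path)
    return (st.st_mtime, st.st_size, st.st_ino)


def sha256_of(path):
    """Stand-in for the hashing/signature/delta work done on the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_if_stable(path, compute=sha256_of, granularity=MTIME_GRANULARITY):
    """Run compute(path); return its result, or None if the file changed."""
    before = snapshot_metadata(path)
    age = time.time() - before[0]
    if age < granularity:
        # Step 2: wait so that any later write gets a distinguishable mtime.
        time.sleep(granularity)
    result = compute(path)
    # Step 3: lstat again and compare with the earlier metadata.
    if snapshot_metadata(path) != before:
        return None  # discard the computed data; treat the file as unstable
    return result
```

A caller that gets None back would then fall through to step 4 and work from
a copy in the storage area instead.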
In case it's not clear, the "unstable list" is a list of files that must be
enqueued as new backup jobs during the next scan, and must be copied to
storage.
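The interaction between storage eviction (step 5) and the unstable list could
be sketched as below. The data layout is hypothetical: storage is modelled as
an OrderedDict from a job id to (pathname, size) with the oldest entry first,
and the job queue as a list of (job id, pathname) pairs. None of these names
come from the real code.

```python
from collections import OrderedDict


def evict_oldest(storage, size_limit, jobs, unstable):
    """Delete oldest stored entries until total size is below size_limit.

    storage:  OrderedDict of job_id -> (pathname, size), oldest first
    jobs:     list of (job_id, pathname) pairs in the upload queue
    unstable: set of pathnames on the unstable list (updated in place)
    Returns the list of evicted pathnames.
    """
    evicted = []
    total = sum(size for _pathname, size in storage.values())
    while storage and total >= size_limit:
        job_id, (pathname, size) = storage.popitem(last=False)
        total -= size
        evicted.append(pathname)
        # Is there any *other* queued job for the same pathname?
        has_other_job = any(
            other_path == pathname and other_id != job_id
            for other_id, other_path in jobs
        )
        if not has_other_job:
            # No newer version is pending, so force a re-copy on next scan.
            unstable.add(pathname)
    return evicted
```

If another job for the same pathname exists, the evicted version is simply
lost and the newer one is uploaded in its place.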
Files remain on the unstable list until the uploader removes them. It does
that when it uploads a file that is on the list and finds that the copy in
the file system is the same as the one in the storage area.
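The removal rule might look roughly like this in Python. This is only a
sketch under assumptions: it presumes a full copy of the file (not just a
delta) is still in the storage area, and the function and parameter names are
invented for illustration. filecmp.cmp with shallow=False compares the file
contents byte for byte.

```python
import filecmp


def maybe_mark_stable(pathname, stored_copy, unstable):
    """After an upload, drop pathname from the unstable set if it settled.

    Returns True if the pathname was removed from the unstable set.
    """
    if pathname not in unstable:
        return False
    # Compare the live file with the copy that was actually uploaded.
    if filecmp.cmp(pathname, stored_copy, shallow=False):
        unstable.discard(pathname)
        return True
    return False  # still changing; keep copying it on future scans
```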
Shawn.