[tahoe-dev] tahoe backup re-uploads old files

Brian Warner warner at lothar.com
Thu Mar 1 00:08:36 UTC 2012


On 2/29/12 4:03 PM, Greg Troxel wrote:
> 
> Brian Warner <warner at lothar.com> writes:
> 
>> It depends upon the timing involved, but "tahoe backup" will check up
>> on unchanged files that haven't been checked in a while. If the file
>> was last uploaded or checked within a month, it will assume that the
>> shares are still ok (so 0% chance of doing a filecheck). Starting at
>> one month old, the probability of doing a filecheck grows, until it
>> reaches 100% at two months (i.e. if the file hasn't been checked for
>> over two months, it will *always* do a filecheck). If the filecheck
>> reports any problems, the file is re-uploaded.
> 
> I had no idea.  "man tahoe" didn't explain this :-)

Yeah, we should make that more visible. docs/backupdb.rst explains it,
in the "Upload Operation" section:

https://tahoe-lafs.org/trac/tahoe-lafs/browser/git/docs/backupdb.rst

  A "random early check" algorithm should be used, in which a check is
  performed with a probability that increases with the age of the
  previous results. E.g. files that were last checked within a month are
  not checked, files that were checked 5 weeks ago are re-checked with
  25% probability, 6 weeks with 50%, more than 8 weeks are always
  checked. This reduces the "thundering herd" of
  filechecks-on-everything that would otherwise result when a backup
  operation is run one month after the original backup. If a filecheck
  reveals the file is not healthy, it is re-uploaded.

  If the filecheck shows the file is healthy, or if the filecheck was
  skipped, the client gets to skip the upload, and uses the previous
  filecap (from the 'caps' table) to add to the parent directory.

For the curious, the 1 month / 2 month constants are in
scripts/backupdb.py, here:
https://tahoe-lafs.org/trac/tahoe-lafs/browser/git/src/allmydata/scripts/backupdb.py#L164
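
To make the ramp concrete, here is a minimal sketch of the "random early
check" idea as the docs describe it. It is not a copy of the code in
backupdb.py: the names WEEK, MONTH, NO_CHECK_BEFORE, ALWAYS_CHECK_AFTER,
check_probability and should_check are illustrative. It also assumes a
4-week "month", so the numbers line up with the 5-week / 6-week / 8-week
example above, and a linear ramp between the two thresholds.

```python
import random

# Assumption: a "month" is 4 weeks, matching the docs' 5/6/8-week example.
WEEK = 7 * 24 * 60 * 60           # seconds
MONTH = 4 * WEEK

NO_CHECK_BEFORE = 1 * MONTH       # younger than this: never check
ALWAYS_CHECK_AFTER = 2 * MONTH    # older than this: always check


def check_probability(age):
    """Probability (0.0 to 1.0) of doing a filecheck.

    'age' is the number of seconds since the file was last uploaded
    or checked. The probability ramps linearly from 0 at
    NO_CHECK_BEFORE to 1 at ALWAYS_CHECK_AFTER.
    """
    ramp = (age - NO_CHECK_BEFORE) / (ALWAYS_CHECK_AFTER - NO_CHECK_BEFORE)
    return min(1.0, max(0.0, ramp))


def should_check(age, rng=random.random):
    """Randomly decide whether to filecheck a file of the given age.

    'rng' must return a float in [0.0, 1.0); it is a parameter only so
    the decision can be made deterministic in tests.
    """
    return rng() < check_probability(age)
```

With these assumed constants, check_probability(5 * WEEK) is 0.25 and
check_probability(6 * WEEK) is 0.5. Because each file rolls its own dice,
files uploaded in the same backup run get re-checked at different times
over the following month instead of all at once.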

cheers,
 -Brian

