[tahoe-lafs-trac-stream] [tahoe-lafs] #1937: back up the content of a file even if the content changes without changing mtime

tahoe-lafs trac at tahoe-lafs.org
Wed Mar 27 18:31:06 UTC 2013


#1937: back up the content of a file even if the content changes without changing
mtime
-------------------------------------------------+-------------------------
 Reporter:  zooko                                |          Owner:
     Type:  defect                               |         Status:  new
 Priority:  normal                               |      Milestone:  undecided
Component:  code                                 |        Version:  1.9.2
 Keywords:  tahoe-backup reliability preservation|  Launchpad Bug:
-------------------------------------------------+-------------------------
 From [//pipermail/tahoe-dev/2008-September/000809.html].

 If an application writes to a file twice in quick succession, then the
 operating system may give that file the same {{{mtime}}} value both times.
 {{{mtime}}} granularity varies between OSes and filesystems, and is often
 coarser than you would wish:

 ¹ http://www.infosec.jmu.edu/documents/jmu-infosec-tr-2009-002.pdf

 ² http://msdn.microsoft.com/en-us/library/windows/desktop/ms724290%28v=vs.85%29.aspx

 * Linux/ext3 - 1 sec ![¹]
 * Linux/ext4 - 1 nanosec ![¹]; actually 1 millisec (observed by my
 experiment just now on Linux 3.2, ext4)
 * FreeBSD/UFS - 1 sec ![¹]
 * Mac - 1 sec ![¹]
 * Windows/FAT - 2 sec; timestamps carry no timezone, so when DST changes
 they are off by one hour until the next reboot ![¹]
 * Windows/NTFS - 100 nanosec ![¹]; possibly actually 1.6 microsec ![²]?
 * Windows/* - {{{mtime}}} isn't necessarily updated until the file handle
 is closed ![¹] ![²]
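 The ext4 figure above came from a quick experiment. A minimal Python sketch of that kind of experiment follows. The function name `estimate_mtime_granularity_ns` and its parameters are hypothetical, and the result is only the smallest {{{mtime}}} step observed on the machine where it runs, which may reflect the kernel's clock tick rather than the on-disk format.

```python
import os
import tempfile
import time
from typing import Optional


def estimate_mtime_granularity_ns(budget_s: float = 3.0,
                                  max_changes: int = 20,
                                  directory: Optional[str] = None) -> int:
    """Return the smallest forward step seen in st_mtime_ns, or 0 if none.

    Rewrites a scratch file in `directory` (default: the system temp dir)
    until `max_changes` mtime changes are seen or `budget_s` seconds pass.
    This is an empirical estimate for this machine, not a filesystem guarantee.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        steps = []
        last = os.stat(path).st_mtime_ns
        deadline = time.monotonic() + budget_s
        i = 0
        while time.monotonic() < deadline and len(steps) < max_changes:
            # Open, write, and close each time: on Windows mtime may not be
            # updated until the handle is closed.
            with open(path, "wb") as f:
                f.write(str(i).encode())
            i += 1
            now = os.stat(path).st_mtime_ns
            if now > last:  # ignore backward jumps of the clock
                steps.append(now - last)
            last = now
        return min(steps) if steps else 0
    finally:
        os.remove(path)
```

 Pass `directory=` a path on the filesystem of interest (for example a mounted FAT volume). A result of 0 means {{{mtime}}} never changed within the time budget.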

 Note that FAT is, as far as I know, the standard filesystem for removable
 media, so it is very common in practice.

 Now the problem is, what happens if

 1. an application writes some data, `D1`, into a file, and the timestamp
 gets updated to `T1`, and then

 2. {{{tahoe backup}}} reads `D1`, and then

 3. the app writes some new data, `D2`, and the timestamp stays `T1`
 because steps 1 through 3 all happened within a single tick of the
 filesystem's {{{mtime}}} granularity?
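 The race above can be simulated without touching a real disk. This sketch assumes a simplified, hypothetical backup tool (`NaiveBackup`) that skips a file when its (size, mtime) pair matches the one recorded on the previous run, and a hypothetical fake file (`FakeFile`) whose mtime is truncated to a configurable granularity. The real {{{tahoe backup}}} database may record other fields as well.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FakeFile:
    """Hypothetical in-memory file whose mtime is truncated to `gran_ms`."""
    gran_ms: int
    data: bytes = b""
    mtime_ms: int = 0

    def write(self, data: bytes, now_ms: int) -> None:
        self.data = data
        # The filesystem can only record whole ticks of its granularity.
        self.mtime_ms = now_ms - now_ms % self.gran_ms


@dataclass
class NaiveBackup:
    """Hypothetical backup that trusts (size, mtime) to detect changes."""
    seen: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    stored: Dict[str, bytes] = field(default_factory=dict)

    def run(self, name: str, f: FakeFile) -> bool:
        """Back up `f` unless it looks unchanged; return True if it was saved."""
        key = (len(f.data), f.mtime_ms)
        if self.seen.get(name) == key:
            return False
        self.stored[name] = f.data
        self.seen[name] = key
        return True


f = FakeFile(gran_ms=2000)       # FAT-like 2-second granularity
backup = NaiveBackup()
f.write(b"D1", now_ms=10_100)    # step 1: mtime becomes T1 = 10_000
backup.run("doc.txt", f)         # step 2: D1 is saved
f.write(b"D2", now_ms=10_900)    # step 3: same tick, mtime is still 10_000
backup.run("doc.txt", f)         # next run: (size, mtime) match, D2 is skipped
print(backup.stored["doc.txt"])  # b'D1'
```

 Because `D1` and `D2` have the same size and the same truncated mtime, nothing in the key distinguishes them; a write in a later tick (say at 12_100 ms) would be noticed.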

 What happens is that {{{tahoe backup}}} has saved `D1`, but from then on
 it will never save `D2`, because it falsely believes it has already saved
 the current contents: the timestamp is still `T1`. If this were to happen
 in practice, the user would find, on reading the file back from
 Tahoe-LAFS, the previous version of its contents (`D1`) rather than the
 most recent version (`D2`). This unfortunate user would probably have no
 way to figure out what happened, and would justly blame Tahoe-LAFS for
 being unreliable.

 The same problem can happen if the timestamp of a file gets reset to an
 earlier value, such as with the Unix {{{touch -t}}} command, or by the
 system clock getting moved. (The system clock getting moved happens
 surprisingly often in the wild.)
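 To see the reset case concretely, the sketch below changes a file's contents and then uses Python's `os.utime` to put its timestamps back to their earlier values, roughly what {{{touch -t}}} does. The `fingerprint` key of (size, mtime) is a hypothetical stand-in for whatever a timestamp-based backup tool records.

```python
import os
import tempfile
from typing import Tuple


def fingerprint(path: str) -> Tuple[int, int]:
    """Hypothetical change-detection key: (size in bytes, mtime in ns)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def reset_mtime_demo() -> Tuple[Tuple[int, int], Tuple[int, int], bytes]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "doc.txt")
        with open(path, "wb") as f:
            f.write(b"D1")
        recorded = fingerprint(path)   # what a backup run would remember
        old = os.stat(path)

        with open(path, "wb") as f:
            f.write(b"D2")             # new contents, same size
        # Put the timestamps back, much as `touch -t` or a clock jump could.
        os.utime(path, ns=(old.st_atime_ns, old.st_mtime_ns))

        with open(path, "rb") as f:
            contents = f.read()
        return recorded, fingerprint(path), contents


recorded, current, contents = reset_mtime_demo()
print(recorded == current, contents)   # True b'D2'
```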

 A user can avoid this problem by passing {{{--ignore-timestamps}}} to
 {{{tahoe backup}}}, which will cause that run of {{{tahoe backup}}} to
 re-upload every file. That is very expensive in terms of time, disk I/O,
 and CPU usage (even if the files get deduplicated by the servers).
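 The cost follows from the fact that any check that ignores timestamps has to read every byte of every file. As an illustration only (this is not how {{{tahoe backup}}} is implemented), a content-digest comparison using Python's `hashlib` would always notice `D2`, but it must stream the whole file through SHA-256 on every run. `content_digest` and `changed_since` are hypothetical helpers.

```python
import hashlib


def content_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's contents, read in chunks of `chunk_size` bytes.

    Every byte is read on every call, which is why checks like this are
    expensive on large trees.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_since(path: str, recorded_digest: str) -> bool:
    """True if the file's contents differ from when `recorded_digest` was taken."""
    return content_digest(path) != recorded_digest
```

 Such a check avoids uploading unchanged files, but not reading them, so it still pays most of the disk I/O and CPU cost described above.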

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1937>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage

