[tahoe-lafs-trac-stream] [Tahoe-LAFS] #897: "tahoe backup" thinks "ctime" means "creation time"

Tahoe-LAFS trac at tahoe-lafs.org
Fri Jun 27 00:23:43 UTC 2014


#897: "tahoe backup" thinks "ctime" means "creation time"
-------------------------+-------------------------------------------------
     Reporter:  zooko    |      Owner:  warner
         Type:  defect   |     Status:  new
     Priority:  major    |  Milestone:  soon
    Component:  code-frontend-cli
      Version:  1.6.1
     Keywords:  forward-compatibility docs tahoe-backup time
   Resolution:
Launchpad Bug:
-------------------------+-------------------------------------------------

New description:

 backupdb seems to think "ctime" means "creation time", which it does, but
 only on Windows.

 This means ~~there is an incorrect statement in the documentation,~~ that
 "tahoe backup" is unnecessarily re-uploading files in the case that the
 ownership or permission bits have changed but the file contents haven't,
 and that "tahoe backup" is incorrectly mapping between "unix change time"
 and "file creation time" when used on Windows.  So this ticket is for
 ~~three~~ two bugs, but they are closely related and should probably
 be fixed together.

 I noticed in [source:docs/backupdb.txt at 4111#L84] that the backupdb docs
 mention "creation time".  POSIX doesn't provide a "creation time" but it
 does provide a "change time", abbreviated "ctime", which most people
 mistakenly think is a "creation time".  Windows ''does'' provide a
 "creation time", and unfortunately Python provides unix "change time" and
 Windows "creation time" in the same slot -- the {{{st_ctime}}} field of
 the result of {{{os.stat}}}.  Here is my [http://bugs.python.org/issue5720 bug
 report] saying that the Python stdlib is wrong to do this, and that any
 Python code which uses the Python stdlib is wrong unless it immediately
 disambiguates.

 In particular, it is a bug for any Tahoe-LAFS code to read the
 {{{st_ctime}}} member without immediately switching on whether the current
 platform is Windows or not.  If you read the {{{st_ctime}}} member and do
 not use the current platform to disambiguate, then you have a value whose
 semantics are uninterpretable without guessing what platform that value
 was generated on.
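
 A minimal Python sketch of that disambiguation, offered as an
 illustration rather than as Tahoe-LAFS code.  The helper name
 labeled_ctime and the LabeledTime tuple are hypothetical, the two labels
 are borrowed from the key names proposed in the discussion on this
 ticket, and the sketch assumes that sys.platform == "win32" is an
 adequate test for "the current platform is Windows".

```python
import os
import sys
from collections import namedtuple

# Hypothetical container: the label records what the number actually means,
# so the value stays interpretable after it leaves this machine.
LabeledTime = namedtuple("LabeledTime", ["kind", "seconds"])


def labeled_ctime(stat_result, platform=sys.platform):
    """Tag st_ctime with its meaning on the platform that produced it."""
    if platform == "win32":
        kind = "windows-creation-time"
    else:
        kind = "posix-change-time"
    return LabeledTime(kind, stat_result.st_ctime)


if __name__ == "__main__":
    # Label the ctime of the current directory on this machine.
    print(labeled_ctime(os.stat(".")))
```

 The point of the design is that the platform check happens at the same
 place where st_ctime is read, so no bare, unlabeled value is ever passed
 along.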

 For "tahoe backup" purposes, it is probably a mistake to treat a new
 {{{ctime}}} as meaning that the file needs to be uploaded again.
 Unix and Windows both guarantee that the {{{mtime}}} will be changed if
 the file contents have changed, and therefore if {{{mtime}}} is unchanged
 then the file contents are unchanged, even if the {{{ctime}}} has changed.
 On the other hand the {{{ctime}}} changes on Unix even when the file
 contents have not changed, such as if ownership or permission bits have
 changed.  So if only the {{{ctime}}} has changed then "tahoe backup" might
 want to set the new {{{ctime}}} value on the link leading to that file,
 but it should not reupload the file contents.
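
 The rule argued for above could be sketched roughly as follows.  This is
 not the backupdb implementation: FileRecord, needs_upload and
 needs_metadata_update are hypothetical names, and comparing the file
 size alongside {{{mtime}}} is an extra assumption that the paragraph
 above does not make.

```python
from collections import namedtuple

# Hypothetical snapshot of what a backup database might remember per file.
FileRecord = namedtuple("FileRecord", ["size", "mtime", "ctime"])


def needs_upload(previous, current):
    """Return True when the file contents may have changed."""
    if previous is None:
        return True  # never backed up before
    # ctime is deliberately ignored here: on Unix it also changes when
    # only ownership or permission bits change.
    return (previous.size, previous.mtime) != (current.size, current.mtime)


def needs_metadata_update(previous, current):
    """Return True when only the link metadata should be refreshed."""
    if previous is None or needs_upload(previous, current):
        return False
    return previous.ctime != current.ctime
```

 For example, a record whose {{{ctime}}} moved after a chmod while size
 and {{{mtime}}} stayed the same would trigger a metadata update but no
 re-upload.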

 In addition, I think "tahoe backup" should disambiguate between "unix
 change time" and "creation time" in the metadata that it stores.  Why not
 change the name of the metadata stored in the tahoe-lafs filesystem edge
 from the ambiguous and widely misunderstood "ctime" to something like
 "unix change time"?  Then if you are on non-Windows you can set that from
 the local filesystem's {{{ctime}}} on upload and set the local
 filesystem's {{{ctime}}} from that on download.  On the other hand, if you
 are on Windows then it is a bug to set the "unix change time" from the
 local filesystem's {{{ctime}}}, although it would be correct to set a
 different metadata entry named {{{file creation time}}} from the local
 filesystem's {{{ctime}}}.

 See also #628, which is about the same issue in "tahoe cp", includes a
 taxonomy of filesystem "ctime" semantics, and includes a satisfactory
 backward-compatible solution that was shipped in Tahoe-LAFS v1.4.1.

 I'm tagging this ticket with "forward-compatibility" because we'll
 eventually have to clarify these semantics and the longer we ship a tool
 that uploads ambiguous data the harder it will be to fix.

--

Comment (by warner):

 Zooko reminded me of this ticket in IRC today, so I re-read
 everything. I think we have the following tasks to finish for this
 ticket:

 * achieve consensus upon the inclusion of {{{st_platform}}} in the
   {{{tahoe-backup}}} metadata
 * achieve consensus upon the spelling of {{{posix-change-time}}} in
   the {{{tahoe-backup}}} metadata
 * change {{{tahoe-backup}}} to record the keys described in
   comment:4 (modification-time, windows-creation-time or
   posix-change-time)
 * argue and achieve consensus on the when-to-re-upload question

 and then a separate ticket can be created to build some sort of
 restore command (maybe an option for {{{tahoe cp}}}, maybe a
 separate {{{tahoe restore}}}) that reads this metadata and applies
 it to the resulting files.
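
 A rough Python sketch of what the restore side might do with such
 metadata, under stated assumptions: the function name apply_backup_times
 is hypothetical, the metadata is assumed to arrive as a plain dict, and
 the key spellings are the ones listed above, which are still under
 discussion.  Python's os.utime sets only access and modification times,
 so the sketch reports the other keys back instead of applying them.

```python
import os
import tempfile

# Keys that os.utime cannot apply; spellings are the proposed ones.
UNAPPLIED_KEYS = ("posix-change-time", "windows-creation-time")


def apply_backup_times(path, metadata):
    """Apply the restorable times to path; return the keys left unapplied."""
    mtime = metadata.get("modification-time")
    if mtime is not None:
        atime = os.stat(path).st_atime  # keep the current access time
        os.utime(path, (atime, mtime))
    return [key for key in UNAPPLIED_KEYS if key in metadata]


if __name__ == "__main__":
    # Demonstrate on a throwaway file.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        demo_path = handle.name
    leftover = apply_backup_times(
        demo_path,
        {"modification-time": 1000000000.0, "posix-change-time": 1000000001.0},
    )
    print(os.stat(demo_path).st_mtime, leftover)
    os.remove(demo_path)
```

 Returning the unapplied keys leaves it to the caller to decide whether
 to warn about them, ignore them, or handle them by some
 platform-specific means.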

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/897#comment:20>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

