[tahoe-lafs-trac-stream] [Tahoe-LAFS] #897: "tahoe backup" thinks "ctime" means "creation time"
Tahoe-LAFS
trac at tahoe-lafs.org
Fri Jun 27 00:23:43 UTC 2014
#897: "tahoe backup" thinks "ctime" means "creation time"
---------------------------------+--------------------------------------------------------
Reporter:      zooko             | Owner:     warner
Type:          defect            | Status:    new
Priority:      major             | Milestone: soon
Component:     code-frontend-cli | Version:   1.6.1
Resolution:                      | Keywords:  forward-compatibility docs tahoe-backup time
Launchpad Bug:                   |
---------------------------------+--------------------------------------------------------
New description:

backupdb seems to think "ctime" means "creation time", which it does, but
only on Windows.

This means ~~there is an incorrect statement in the documentation,~~ that
"tahoe backup" is unnecessarily re-uploading files in the case that the
ownership or permission bits have changed but the file contents haven't,
and that "tahoe backup" is incorrectly mapping between "unix change time"
and "file creation time" when used on Windows. So this ticket is for
~~three~~ two bugs, but they are all closely related and should probably
be fixed at once.

I noticed in [source:docs/backupdb.txt@4111#L84] that the backupdb docs
mention "creation time". POSIX doesn't provide a "creation time" but it
does provide a "change time", abbreviated "ctime", which most people
mistakenly think is a "creation time". Windows ''does'' provide a
"creation time", and unfortunately Python provides unix "change time" and
Windows "creation time" in the same slot -- the {{{st_ctime}}} field of the
{{{os.stat}}} result. Here is my [http://bugs.python.org/issue5720 bug
report] saying that the Python stdlib is wrong to do this, and that any
Python code which uses the Python stdlib is wrong unless it immediately
disambiguates.

In particular, it is a bug for any Tahoe-LAFS code to read the
{{{st_ctime}}} member without immediately switching on whether the current
platform is Windows or not. If you read the {{{st_ctime}}} member and do
not use the current platform to disambiguate, then you have a value whose
semantics are uninterpretable without guessing what platform that value
was generated on.
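
The rule above could be sketched roughly as follows. This is only an illustration, not code from Tahoe-LAFS: the function name {{{interpret_ctime}}} is hypothetical, the labels are borrowed from the key names discussed later in this ticket, and it assumes that {{{sys.platform == "win32"}}} is an adequate test for "the current platform is Windows".

```python
import sys


def interpret_ctime(st, platform=sys.platform):
    """Return (label, value) for st.st_ctime, disambiguated by platform.

    `st` is an os.stat() result (or any object with an st_ctime attribute).
    The label says what the number means, so the value never travels
    without its semantics.
    """
    if platform == "win32":
        # On Windows, st_ctime has historically held the file creation time.
        return ("windows-creation-time", st.st_ctime)
    # Assumption: every other platform uses the POSIX "change time" meaning.
    return ("posix-change-time", st.st_ctime)
```

A caller would then write something like {{{label, value = interpret_ctime(os.stat(path))}}} and store the pair, never the bare number.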

In particular, for "tahoe backup" purposes, it is probably a mistake to
say that a new {{{ctime}}} means that the file needs to be uploaded again.
Unix and Windows both guarantee that the {{{mtime}}} will be changed if
the file contents have changed, and therefore if {{{mtime}}} is unchanged
then the file contents are unchanged, even if the {{{ctime}}} has changed.
On the other hand the {{{ctime}}} changes on Unix even when the file
contents have not changed, such as if ownership or permission bits have
changed. So if only the {{{ctime}}} has changed then "tahoe backup" might
want to set the new {{{ctime}}} value on the link leading to that file,
but it should not re-upload the file contents.
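
As a rough sketch of that decision, assuming the backup database remembers the {{{mtime}}} and {{{ctime}}} it last saw for each file: the {{{FileRecord}}} type and the {{{classify_change}}} function below are hypothetical names made up for illustration, not the actual backupdb API.

```python
from collections import namedtuple

# Hypothetical record of the timestamps remembered for one file.
FileRecord = namedtuple("FileRecord", ["mtime", "ctime"])


def classify_change(old, new):
    """Decide what to do with a file, given its old and new timestamps.

    Returns one of:
      "upload"               - contents may have changed, upload the file
      "update-link-metadata" - only ctime changed, refresh metadata on the link
      "unchanged"            - nothing to do
    `old` is None when the file has never been backed up.
    """
    if old is None:
        return "upload"
    if new.mtime != old.mtime:
        # A changed mtime is the signal that the contents may have changed.
        return "upload"
    if new.ctime != old.ctime:
        # For example, ownership or permission bits changed on Unix.
        return "update-link-metadata"
    return "unchanged"
```

With this shape, a {{{chmod}}} on an already backed-up file yields "update-link-metadata" rather than a second upload of the same contents.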

In addition, I think "tahoe backup" should disambiguate between "unix
change time" and "creation time" in the metadata that it stores. Why not
change the name of the metadata stored in the tahoe-lafs filesystem edge
from the ambiguous and widely misunderstood "ctime" to something like
"unix change time"? Then if you are on non-Windows you can set that
from the local filesystem's {{{ctime}}} on upload and set the local
filesystem's {{{ctime}}} from that on download. On the other hand, if you
are on Windows then it is a bug to set the "unix change time" from the
local filesystem's {{{ctime}}}, although it would be correct to set a
different metadata entry named {{{file creation time}}} from the local
filesystem's {{{ctime}}}.

See also #628, which is about the same issue in "tahoe cp", includes a
taxonomy of filesystem "ctime" semantics, and includes a satisfactory
backward-compatible solution that was shipped in Tahoe-LAFS v1.4.1.

I'm tagging this ticket with "forward-compatibility" because we'll
eventually have to clarify these semantics, and the longer we ship a tool
that uploads ambiguous data, the harder it will be to fix.
--
Comment (by warner):
Zooko reminded me of this ticket on IRC today, so I re-read
everything. I think we have the following tasks to finish for this
ticket:

 * achieve consensus on the inclusion of {{{st_platform}}} in the
   {{{tahoe-backup}}} metadata
 * achieve consensus on the spelling of {{{posix-change-time}}} in
   the {{{tahoe-backup}}} metadata
 * change {{{tahoe-backup}}} to record the keys described in
   comment:4 (modification-time, windows-creation-time or
   posix-change-time)
 * argue and achieve consensus on the when-to-re-upload question

and then a separate ticket can be created to build some sort of
restore command (maybe an option for {{{tahoe cp}}}, maybe a
separate {{{tahoe restore}}}) that reads this metadata and applies
it to the resulting files.
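
To make the tasks above concrete, here is one possible shape for the recorded metadata and for the restore step. It is a sketch under several assumptions: the function names are hypothetical, the key spellings are the ones listed above and are still subject to consensus, and {{{st_platform}}} is assumed to simply record {{{sys.platform}}}. The restore half only applies the modification time, because {{{os.utime}}} sets access and modification times and nothing else.

```python
import os
import sys


def backup_time_metadata(st, platform=sys.platform):
    """Build the time-related metadata for one file from its stat result."""
    metadata = {
        "modification-time": st.st_mtime,
        # Assumption: st_platform just records sys.platform; the ticket
        # has not settled whether or how this key is stored.
        "st_platform": platform,
    }
    if platform == "win32":
        metadata["windows-creation-time"] = st.st_ctime
    else:
        metadata["posix-change-time"] = st.st_ctime
    return metadata


def restore_mtime(path, metadata):
    """Apply the recorded modification time to a restored file."""
    mtime = metadata["modification-time"]
    # os.utime takes (atime, mtime); atime is set to mtime for simplicity.
    os.utime(path, (mtime, mtime))
```

A backup pass would call {{{backup_time_metadata(os.stat(path))}}} for each file, and a later restore command would call {{{restore_mtime}}} on each file it writes.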
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/897#comment:20>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage