[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"
tahoe-lafs
trac at allmydata.org
Tue Jan 12 15:36:42 PST 2010
#897: "tahoe backup" thinks "ctime" means "creation time"
-----------------------------------------------------+----------------------
Reporter: zooko | Owner: nobody
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: unknown | Version: 1.5.0
Keywords: forward-compatibility docs tahoe-backup | Launchpad_bug:
-----------------------------------------------------+----------------------
backupdb seems to think "ctime" means "creation time", which it does, but
only on Windows.
This has three consequences: the documentation contains an incorrect
statement; "tahoe backup" unnecessarily re-uploads files when the
ownership or permission bits have changed but the file contents haven't;
and "tahoe backup" incorrectly maps between "unix change time" and "file
creation time" when used on Windows. So this ticket covers three bugs, but
they are closely related and should probably be fixed at once.
I noticed in [source:docs/backupdb.txt@4111#L84] that the backupdb docs
mention "creation time". POSIX doesn't provide a "creation time" but it
does provide a "change time", abbreviated "ctime", which most people
mistakenly think is a "creation time". Windows ''does'' provide a
"creation time", and unfortunately Python provides unix "change time" and
Windows "creation time" in the same slot -- the {{{st_ctime}}} field of the
{{{os.stat()}}} result. Here is my [http://bugs.python.org/issue5720 bug
report] saying that the Python stdlib is wrong to do this, and that any
Python code which uses the Python stdlib is wrong unless it immediately
disambiguates.
In particular, it is a bug for any Tahoe-LAFS code to read the
{{{st_ctime}}} member without immediately switching on whether the current
platform is Windows or not. If you read the {{{st_ctime}}} member and do
not use the current platform to disambiguate, then you have a value whose
semantics are uninterpretable without guessing what platform that value
was generated on.
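
As a rough illustration of "immediately switching on the platform", here is a minimal sketch. The names {{{LabeledCtime}}} and {{{read_labeled_ctime}}} are hypothetical and are not existing Tahoe-LAFS code; the {{{platform}}} parameter exists only so the behaviour can be exercised without a Windows machine.

```python
import os
import sys
from typing import NamedTuple


class LabeledCtime(NamedTuple):
    """An st_ctime value tagged with what it actually means (hypothetical type)."""
    kind: str       # "creation" on Windows, "change" everywhere else
    seconds: float  # raw st_ctime value, seconds since the epoch


def read_labeled_ctime(path, platform=sys.platform):
    """Read st_ctime and attach its meaning before the value goes anywhere else."""
    st = os.stat(path)
    # On Windows the st_ctime slot has held the creation time; on POSIX
    # systems it holds the time of the last inode (status) change.
    kind = "creation" if platform == "win32" else "change"
    return LabeledCtime(kind, st.st_ctime)
```

Callers then pass the labeled value around instead of a bare float, so the platform it came from no longer has to be guessed.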
In particular, for "tahoe backup" purposes, it is probably a mistake to
say that a new {{{ctime}}} means that the file needs to be uploaded again.
Unix and Windows both guarantee that the {{{mtime}}} will be changed if
the file contents have changed, and therefore if {{{mtime}}} is unchanged
then the file contents are unchanged, even if the {{{ctime}}} has changed.
On the other hand the {{{ctime}}} changes on Unix even when the file
contents have not changed, such as if ownership or permission bits have
changed. So if only the {{{ctime}}} has changed then "tahoe backup" might
want to set the new {{{ctime}}} value on the link leading to that file,
but it should not reupload the file contents.
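
The decision rule described above could be sketched roughly as follows. This is not the actual backupdb logic: {{{FileRecord}}}, {{{classify_change}}} and the three result strings are hypothetical names chosen for illustration, and the sketch relies on the assumption stated above that an unchanged {{{mtime}}} implies unchanged contents.

```python
from typing import NamedTuple, Optional


class FileRecord(NamedTuple):
    """Timestamps remembered from the previous backup run (hypothetical type)."""
    mtime: float
    ctime: float


def classify_change(previous: Optional[FileRecord], current: FileRecord) -> str:
    """Decide what to do with a file, treating ctime as metadata only."""
    if previous is None:
        return "upload"            # never seen before
    if previous.mtime != current.mtime:
        return "upload"            # contents may have changed
    if previous.ctime != current.ctime:
        return "update-metadata"   # e.g. chmod/chown: refresh the link only
    return "skip"                  # nothing changed
```

For example, a file whose permission bits were changed since the last run keeps its {{{mtime}}} but gets a new {{{ctime}}}, so it is classified as "update-metadata" rather than "upload".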
In addition, I think "tahoe backup" should disambiguate between "unix
change time" and "creation time" in the metadata that it stores. Why not
rename the metadata stored on the tahoe-lafs filesystem edge from the
ambiguous and widely misunderstood "ctime" to something like "unix change
time"? Then on non-Windows platforms you can set that from the local
filesystem's {{{ctime}}} on upload and set the local filesystem's
{{{ctime}}} from that on download. On Windows, on the other hand, it is a
bug to set the "unix change time" from the local filesystem's {{{ctime}}},
although it would be correct to set a different metadata entry named
{{{file creation time}}} from the local filesystem's {{{ctime}}}.
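
A minimal sketch of the upload side of this proposal follows. The function name {{{ctime_metadata}}} is hypothetical, and the two key strings are simply the placeholder names suggested above ("something like"), not a settled metadata format.

```python
import sys


def ctime_metadata(st_ctime, platform=sys.platform):
    """Build the edge-metadata entry for a local st_ctime value.

    Exactly one unambiguous key is emitted, depending on what st_ctime
    means on the platform that produced it.
    """
    if platform == "win32":
        # On Windows the value is a creation time, never a unix change time.
        return {"file creation time": st_ctime}
    # On other platforms the value is the unix change time.
    return {"unix change time": st_ctime}
```

A downloader would read whichever key is present and apply it only where the local platform has a matching notion of that timestamp.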
See also #628, which is about the same issue in "tahoe cp", includes a
taxonomy of filesystem "ctime" semantics, and includes a satisfactory
backward-compatible solution that was shipped in Tahoe-LAFS v1.4.1.
I'm tagging this ticket with "forward-compatibility" because we'll
eventually have to clarify these semantics and the longer we ship a tool
that uploads ambiguous data the harder it will be to fix.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid