[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"
tahoe-lafs
trac at allmydata.org
Tue Jan 12 23:47:07 PST 2010
#897: "tahoe backup" thinks "ctime" means "creation time"
-----------------------------------------------------+----------------------
Reporter: zooko | Owner: warner
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: unknown | Version: 1.5.0
Keywords: forward-compatibility docs tahoe-backup | Launchpad_bug:
-----------------------------------------------------+----------------------
Comment(by warner):
Ok, Zooko and I had a long discussion about this on IRC. There's a bit of tension between three goals:

1. preserving information, even if it is confusing or badly labeled, so that future developers can figure out where the timestamps came from
2. not confusing busy developers by perpetuating ambiguous labels like "ctime"
3. hiding irrelevant platform details, making life easier for developers

Goal 1 is about not trying to be too clever. The original problem here is that Python tries to be too clever and reports a Windows os.stat field (named {{{ftCreationTime}}} in the underlying API) as {{{st_ctime}}}, the same way that POSIX's st_ctime is reported. This decision was probably based on a mistaken belief that the two have the same semantics, and on a desire to hide irrelevant platform details from developers who shouldn't have to care. However, if they hadn't done that (i.e. if they reported {{{st_creationtime}}} on Windows and {{{st_ctime}}} on Unix), we'd have less convenient but less ambiguous os.stat results.

Systems which try to hide details from developers can cause frustration, especially if the developers understand the quirks and foibles of the underlying system, because then the "helpful" intermediate layers are really just getting in the way.

To implement goal 1, we would copy all of the {{{os.stat()}}} fields into the metadata as-is, and probably include an extra field (perhaps labeled {{{st_platform}}}) as a hint to cyber-historians who know better than we do what os.stat returns on various platforms, and how to interpret it.

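Goal 1 could be sketched in Python roughly as follows. The helper name `raw_stat_metadata` is hypothetical and this is not Tahoe's code. It copies every `st_*` attribute that `os.stat()` exposes on the running platform, unchanged, and adds the `st_platform` hint taken from `sys.platform`.

```python
import os
import sys


def raw_stat_metadata(path):
    """Goal 1 sketch (hypothetical helper): copy os.stat() fields verbatim.

    Names are kept exactly as os.stat() reports them, so st_ctime keeps
    its platform-dependent meaning. The extra st_platform key records
    sys.platform so a later reader can work out which meaning applies.
    """
    st = os.stat(path)
    metadata = {}
    for name in dir(st):
        # Every os.stat_result field (st_mode, st_ino, st_mtime, ...) starts with "st_".
        if name.startswith("st_"):
            metadata[name] = getattr(st, name)
    metadata["st_platform"] = sys.platform
    return metadata
```

Because the names are kept verbatim, the result still contains `st_ctime`, whose meaning depends on `st_platform`. That remaining ambiguity is what goals 2 and 3 are concerned with.
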
Goal 2 would be accomplished by never using the word "ctime" in our metadata, even though it's used in two other places (the {{{os.stat}}} return value, and POSIX's stat(2) call). Evidence suggests that the majority of developers believe the wrong thing about what POSIX's ctime means (it records the last change to the file's inode status, not its creation), and I've certainly been in this camp. So giving them a word other than "ctime" will either be more meaningful (e.g. if we called it posix-metadata-change-time) or will force them to look up our actual definition (e.g. if we called it tahoe-bagel-kumquat and dared them to search webapi.txt for details).

Goal 3 would be accomplished by using a common, easy-to-understand word like "changetime" or "creationtime" for all platforms, regardless of the name used by the underlying system call. POSIX and Windows return "mtime" values with (as far as I've been told) the same semantics. So it's probably fair to say that the difference between A (POSIX stat() returns it in st_mtime) and B (Windows returns it in ftLastWriteTime) is an "irrelevant platform detail", and that developers' lives are easier if this distinction is hidden from them.

So, as a compromise between these goals, we settled on the following keys:

* unix: (st_platform, st_dev, st_mode, st_ino, ..., modification-time, posix-change-time)
* windows: (st_platform, st_dev, st_mode, st_ino, ..., modification-time, windows-creation-time)

The synthetic "st_platform" key will contain {{{sys.platform}}}, so something like "linux2", "darwin", or "win32". The hope is that this is a cheap way to give future developers and cyber-historians enough information to interpret the rest of the st_* fields in some meaningful way.

st_dev, st_mode, etc. will be copied directly from the os.stat call. Other attributes (perhaps platform-specific fields like OS-X's st_creator and st_type) will be copied here too.

{{{modification-time}}} will be copied from st_mtime on all platforms, based on the conclusion that it represents the same concept on all platforms: the most recent time that the file's contents were modified.

{{{posix-change-time}}} will be present for files that came from a POSIX filesystem, and will be copied from st_ctime.

{{{windows-creation-time}}} will be present for files that came from a Windows filesystem, and will also be copied from st_ctime (which is where Python's os.stat reports ftCreationTime on Windows).

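Here is a minimal Python sketch of the compromise layout. `tahoe_style_metadata` is a hypothetical helper, not the actual `tahoe backup` code. It assumes that `sys.platform == "win32"` is the only platform where `st_ctime` means creation time, and it treats every other platform as POSIX. It drops the raw `st_mtime`/`st_ctime` fields (and their `_ns` variants), so no key ever contains the word "ctime".

```python
import os
import sys


def tahoe_style_metadata(path):
    """Sketch of the unix/windows key layout (hypothetical helper).

    Assumption: only sys.platform == "win32" reports creation time in
    st_ctime; everything else is treated as POSIX, where st_ctime is the
    inode status change time.
    """
    st = os.stat(path)
    metadata = {"st_platform": sys.platform}
    # Copy the other st_* fields as-is (st_dev, st_mode, st_ino, ...), but
    # skip the ambiguous timestamps, which get longer names below.
    for name in dir(st):
        if name.startswith("st_") and not name.startswith(("st_mtime", "st_ctime")):
            metadata[name] = getattr(st, name)
    # mtime is taken to mean the same thing on every platform.
    metadata["modification-time"] = st.st_mtime
    if sys.platform == "win32":
        metadata["windows-creation-time"] = st.st_ctime
    else:
        metadata["posix-change-time"] = st.st_ctime
    return metadata
```

The helper renames the two timestamps instead of also keeping `st_ctime` alongside them. Keeping both would hand readers the ambiguous label that goal 2 is trying to avoid.
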
Having longer and more detailed names for the ctime values will help with goal 2 (helping developers correctly interpret this field). Not calling them "ctime" will help developers who would otherwise misinterpret {{{posix-change-time}}} as if it were the mythical "posix-creation-time" that everyone really wants. We cannot provide goal 3 here, because there is no common semantic between POSIX and Windows.

(Note for future discussion: some POSIX-ish filesystems do provide a creation time, in the form of OS-X's st_birthtime, and supposedly ZFS offers something similar. If we can determine that the semantics of these are the same, it could be argued that windows-creation-time should be renamed {{{creation-time}}} and only populated on platforms that offer it, which would be st_birthtime from HFS+/OS-X, st_ctime on Windows, and something else on ZFS.)

(And note that, if we *cannot* determine that the semantics are the same, then we should probably refrain from trying to coerce them into the same field, lest we make the same mistake that Python's os.stat did, making life more difficult for somebody in the future who is trying to figure out whether a given file's so-called "creation-time" was really the ZFS notion, or the HFS+ notion, or whatever.)

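If the semantics did turn out to match, the proposed `creation-time` key could be populated along the lines below. This sketch rests on that assumption. `creation_time` and `creation_metadata` are hypothetical helpers, and ZFS is not handled because the note above leaves its mechanism unspecified.

```python
import os
import sys


def creation_time(st, platform=sys.platform):
    """Return a creation timestamp, or None if the platform reports none.

    Hypothetical helper; assumes the platforms' notions of "creation
    time" are equivalent. st is an os.stat() result or any object with
    the same attributes.
    """
    # OS X (HFS+) and FreeBSD expose st_birthtime; newer Python versions
    # (3.12+) also provide it on Windows.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        return birth
    # Older Pythons on Windows report the creation time as st_ctime.
    if platform == "win32":
        return st.st_ctime
    # On POSIX, st_ctime is a change time, not a creation time: report nothing.
    return None


def creation_metadata(path):
    """Add a creation-time key only when the platform offers one."""
    ct = creation_time(os.stat(path))
    return {} if ct is None else {"creation-time": ct}
```

The function returns None instead of falling back to `st_ctime` on POSIX. This keeps a change time from masquerading as a creation time, which is the mistake the second note warns about.
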
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid