[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"

tahoe-lafs trac at allmydata.org
Tue Jan 12 23:47:07 PST 2010


#897: "tahoe backup" thinks "ctime" means "creation time"
-----------------------------------------------------+----------------------
 Reporter:  zooko                                    |           Owner:  warner   
     Type:  defect                                   |          Status:  new      
 Priority:  major                                    |       Milestone:  undecided
Component:  unknown                                  |         Version:  1.5.0    
 Keywords:  forward-compatibility docs tahoe-backup  |   Launchpad_bug:           
-----------------------------------------------------+----------------------

Comment(by warner):

 Ok, Zooko and I had a long discussion about this in IRC. There's a bit of
 tension between three goals:

  1. preserving information, even if it is confusing or badly labeled, so
     that future developers can figure out where the timestamps came from
  2. not confusing busy developers by perpetuating ambiguous labels like
     "ctime"
  3. hiding irrelevant platform details, making life easier for developers

 Goal 1 is about not trying to be too clever. The original problem here is
 that Python tries to be too clever and reports a Windows os.stat field
 (named {{{ftCreationTime}}} in the underlying API) as {{{st_ctime}}}, the
 same way that POSIX's st_ctime is reported. This decision was probably
 based on mistakenly believing that they have the same semantics, and a
 desire to hide irrelevant platform details from developers who shouldn't
 have to care. However, if they had instead reported
 {{{st_creationtime}}} on Windows and {{{st_ctime}}} on unix, then we'd
 have less-convenient but less-ambiguous os.stat results.

 Systems which try to hide details from developers can cause frustration,
 especially if the developers understand the quirks and foibles of the
 underlying system, because then the "helpful" intermediate layers are
 really just getting in the way.

 To implement goal 1, we would copy all of the {{{os.stat()}}} fields into
 the metadata as-is, and probably include an extra field (perhaps labeled
 {{{st_platform}}}) as a hint to cyber-historians who know better than we
 do what os.stat returns on various platforms, and how to interpret it.
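
 A minimal sketch of what goal 1 could look like, assuming the metadata is
 a plain dict. The helper name {{{raw_stat_metadata}}} is hypothetical and
 is not part of the "tahoe backup" code; only {{{os.stat}}} and
 {{{sys.platform}}} are real APIs here.

```python
import os
import sys

def raw_stat_metadata(path):
    """Hypothetical helper: copy every st_* field of os.stat(path) as-is."""
    st = os.stat(path)
    # Synthetic key: records which platform produced the st_* values.
    metadata = {"st_platform": sys.platform}
    for name in dir(st):
        if name.startswith("st_"):
            # No renaming and no interpretation: preserve whatever the
            # platform's os.stat() chose to report under this name.
            metadata[name] = getattr(st, name)
    return metadata
```

 Nothing is renamed or dropped, so the result is exactly as ambiguous as
 os.stat itself; the {{{st_platform}}} hint is the only added information.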

 Goal 2 would be accomplished by never using the word "ctime" in our
 metadata, even though it's used in two other places ({{{os.stat}}} return
 value, and POSIX's stat(2) call). Evidence suggests that the majority of
 developers believe the wrong thing about what POSIX's ctime means (and
 I've certainly been in this camp). So giving them a word other than
 "ctime" will either be more meaningful (e.g. if we called it
 posix-metadata-change-time) or will force them to look up our actual
 definition (e.g. if we called it tahoe-bagel-kumquat and dared them to
 search webapi.txt for details).

 Goal 3 would be accomplished by using a common, easy-to-understand word
 like "changetime" or "creationtime" for all platforms, despite whatever
 name is used by the underlying system call. POSIX and Windows return
 "mtime" values with (as far as I've been told) the same semantics. So
 it's probably fair to say that the fact that (A: POSIX stat() returns it
 in st_mtime, while B: Windows returns it in ftLastWriteTime) is an
 "irrelevant platform detail", and that developers' lives are easier if
 this distinction is hidden from them.

 So, as a compromise between these goals, we settled on the following keys:

  * unix: (st_platform, st_dev, st_mode, st_ino.., modification-time,
    posix-change-time)
  * windows: (st_platform, st_dev, st_mode, st_ino.., modification-time,
    windows-creation-time)

 The synthetic "st_platform" key will contain {{{sys.platform}}}, so
 something like "linux2" or "darwin" or "win32". The hope is that this is
 a cheap way to provide some useful information to future developers and
 cyber-historians to interpret the rest of the st_* fields in some
 meaningful way.

 st_dev, st_mode, etc, will be copied directly from the os.stat call. Other
 attributes (perhaps platform-specific fields like OS-X's st_creator and
 st_type) will be copied here too.

 {{{modification-time}}} will be copied from st_mtime on all platforms,
 based on the conclusion that it represents the same concept on all
 platforms: the most recent time that the file's contents have been
 modified.

 {{{posix-change-time}}} will be present for files that came from a POSIX
 filesystem, and will be copied from st_ctime.

 {{{windows-creation-time}}} will be present for files that came from a
 Windows filesystem, and will be copied from st_ctime.

 Having longer and more-detailed names for the ctime values will help with
 goal 2 (help developers correctly interpret this field). Not calling them
 "ctime" will help developers who would otherwise misinterpret
 {{{posix-change-time}}} as if it were the mythical "posix-creation-time"
 that everyone really wants. We cannot satisfy goal 3 for this field,
 because the POSIX and Windows values have no common semantics.
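
 The key mapping described above can be sketched as follows. This is an
 illustration under stated assumptions, not the actual "tahoe backup"
 implementation: the function name {{{timestamp_metadata}}} is
 hypothetical, and treating every platform other than "win32" as having
 POSIX st_ctime semantics is an assumption made for brevity.

```python
import os
import sys

def timestamp_metadata(st, platform=sys.platform):
    """Hypothetical helper: map os.stat() timestamps to the proposed keys."""
    metadata = {
        "st_platform": platform,
        # Same concept everywhere: last modification of the file contents.
        "modification-time": st.st_mtime,
    }
    if platform == "win32":
        # On Windows, Python reports the file creation time as st_ctime.
        metadata["windows-creation-time"] = st.st_ctime
    else:
        # Assumption: every non-Windows platform has POSIX semantics, where
        # st_ctime is the last change to the file's metadata (inode).
        metadata["posix-change-time"] = st.st_ctime
    return metadata

# Example: timestamps for the current directory on the running platform.
print(timestamp_metadata(os.stat(".")))
```

 Exactly one of the two ctime-derived keys is present in any given result,
 so a reader of the metadata never sees a bare "ctime" to misinterpret.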

 (note for future discussion: some POSIX-ish filesystems do provide
 creation-time, in the form of OS-X's st_birthtime, and supposedly
 something that ZFS offers. If we can determine that the semantics of
 these are the same, it could be argued that windows-creation-time should
 be renamed {{{creation-time}}}, and only populated on platforms that
 offer it, which would be st_birthtime from HFS+/OS-X, st_ctime on
 Windows, and something else on ZFS)

 (and note that, if we *cannot* determine that the semantics are the same,
 then we should probably refrain from trying to coerce them into the same
 field, lest we make the same mistake that Python's os.stat did, making
 life more difficult for somebody in the future who is trying to figure
 out whether a given file's so-called "creation-time" was really the ZFS
 notion, or the HFS+ notion, or whatever).
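
 If the semantics were ever judged to be the same, a unified
 {{{creation-time}}} lookup might look roughly like the sketch below. The
 function name {{{creation_time}}} is hypothetical, the ZFS case is left
 out because its source field is not identified above, and the whole thing
 should only be used once the equivalence has actually been established.

```python
def creation_time(st, platform):
    """Hypothetical helper: return a creation time, or None if not offered."""
    # st_birthtime is only present on platforms whose os.stat() provides it.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        return birth
    if platform == "win32":
        # On Windows, Python reports the creation time as st_ctime.
        return st.st_ctime
    # POSIX st_ctime is a metadata-change time, never a creation time,
    # so no value is returned rather than a misleading one.
    return None
```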

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

