[tahoe-dev] [tahoe-lafs] #897: "tahoe backup" thinks "ctime" means "creation time"
tahoe-lafs
trac at allmydata.org
Tue Jan 12 18:22:14 PST 2010
-----------------------------------------------------+----------------------
Reporter: zooko | Owner: warner
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: unknown | Version: 1.5.0
Keywords: forward-compatibility docs tahoe-backup | Launchpad_bug:
-----------------------------------------------------+----------------------
Comment(by warner):
Hrm.
== using ctime/mtime in backupdb ==
So, first, let's make the docs (source:docs/backupdb.txt#L84) clearer,
by replacing the reference to "creation time, and modification time"
with just "ctime/mtime". The backupdb does not care about the semantics
of these timestamps. All it cares about is having a cheap
sometimes-false-positive proxy for detecting changes to file contents.
In particular, I'm not worried about trying to avoid re-uploading in the
face of user-triggered changes to metadata that doesn't actually change
file contents. If someone does a "chown" or "chmod" or "touch" on a
bunch of files, I think they'll accept the fact that "tahoe backup" will
subsequently do more work on those files than if they had not gone and
run those commands.
So I think that comparing the (size/ctime/mtime) tuple (specifically the
{{{(stat.ST_SIZE, stat.ST_MTIME, stat.ST_CTIME)}}} tuple) will serve
this purpose, regardless of what {{{os.stat(fn)[stat.ST_CTIME]}}}
actually means. We could change the backupdb to record more
semantically-accurate fields, and fill in some but not others depending
upon which platform we were using, but since we're only comparing this
data against itself, I don't see enough value in adding that complexity.
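As a rough illustration of that comparison (a sketch only, not the actual backupdb code; the function names {{{stat_fingerprint}}} and {{{probably_unchanged}}} are made up for this example), the check might look like:

```python
import os
import stat

def stat_fingerprint(path):
    """Return the (size, mtime, ctime) tuple used as a cheap change proxy.

    The meaning of ST_CTIME differs between platforms; that is fine here
    because the tuple is only ever compared against an earlier copy of itself.
    """
    s = os.stat(path)
    return (s[stat.ST_SIZE], s[stat.ST_MTIME], s[stat.ST_CTIME])

def probably_unchanged(path, recorded):
    """True if the current fingerprint equals the previously recorded one.

    A mismatch only means "check this file again": chmod, chown, or touch
    can change the tuple without changing the file contents.
    """
    return recorded is not None and stat_fingerprint(path) == recorded
```

A caller would record {{{stat_fingerprint(fn)}}} after each upload and skip files for which {{{probably_unchanged(fn, recorded)}}} is true on the next run.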
== putting timestamp metadata into backups created by "tahoe backup" ==
As a separate issue, I guess I'm +0 on changing the metadata that "tahoe
backup" creates to have more accurate names. Thanks to the patch from
#628, "tahoe backup" is actually the only place that even reads local
filesystem metadata (i.e. the hits from
{{{find src -name '*.py' |xargs grep os.stat}}} are almost all for Tahoe's
own internal files). "tahoe backup" currently does the simplistic thing of
copying {{{stat.st_ctime}}} into {{{metadata["ctime"]}}}, etc.
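For concreteness, the simplistic copy amounts to something like the following sketch. The helper name {{{local_timestamp_metadata}}} is hypothetical, and the "mtime" key is an assumption about what the "etc." covers; this is not the code "tahoe backup" actually runs.

```python
import os

def local_timestamp_metadata(path):
    """Copy local stat timestamps into a metadata dict, names unchanged.

    Hypothetical helper. Note that "ctime" here is whatever the platform's
    st_ctime means, which is exactly the ambiguity this ticket is about.
    """
    s = os.stat(path)
    return {
        "ctime": s.st_ctime,
        "mtime": s.st_mtime,  # assumed to be part of the "etc."
    }
```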
I'm not sure how to value timestamps (or other metadata) in backups.
When you restore from a backup, do you expect all of the files to have
the same creation/modification timestamps as they did on the original
disk? The same permission bits? The same owner? The same inode numbers?
The same {{{atime}}}? (I'd guess a survey would show users expecting
these properties in descending order, from something like 70% of users
for timestamps down to 1% of users for atime).
But I think most users of a "tahoe cp" tool would expect the
newly-generated local files to have all timestamps set to the present
moment (as /bin/cp does), and for permission bits/owner to be set by the
current umask setting/login.
Other tools that I use for backup purposes (like version-control
systems) don't record this metadata, because it doesn't generally make
sense to restore it. When I do an 'svn update', I really don't want the
timestamps of the newly-modified files to wind up in the past, because
then my builds will get messed up. Likewise, changing the mode bits,
other than sometimes the execute bit, is probably a bad idea.
So this suggests that we'd need a special "tahoe restore" (or maybe an
option on "tahoe cp", like /bin/cp's --preserve) to use this extended
metadata. And then, if we had that, it would make sense for "tahoe
backup" to record more accurate information about platform-specific
timestamps, such that "tahoe cp --preserve tahoe:backups/Latest
./local-restore" could take your Unix-generated backup and copy it onto
your Windows box and reset as much metadata as made sense.
Eh, I dunno.
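To make the "reset as much metadata as made sense" idea a bit more concrete, here is a minimal sketch of what a --preserve-style restore step could do. No such option exists in "tahoe cp" as far as this ticket says; the helper name {{{apply_preserved_metadata}}} and the shape of the {{{metadata}}} dict are assumptions for illustration.

```python
import os

def apply_preserved_metadata(path, metadata):
    """Reset what can be reset on a freshly restored file.

    Hypothetical helper. Only mtime is restored: os.utime() can set
    atime and mtime, but not the Unix inode-change time (ctime).
    """
    mtime = metadata.get("mtime")
    if mtime is not None:
        # Keep the current access time; only rewind the modification time.
        atime = os.stat(path).st_atime
        os.utime(path, (atime, mtime))
```

Anything the target platform cannot represent (a Unix ctime, or a creation time on a filesystem that has none) would simply be skipped, which is why more accurate names in the stored metadata would matter for such a tool.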
Incidentally, part of the "timestamps are unimportant" philosophy
described above is embedded in "tahoe backup"'s design: if the local
timestamps have changed but file contents have not, we won't upload
anything new, so the backup snapshot will continue to have the same
timestamps from the original upload. This may mean that you shouldn't
put too much trust in the tahoe-side timestamp metadata anyway. We
could change this to upload more frequently, but personally I prefer the
performance wins of sharing directories between snapshots.
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/897#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid