Opened at 2017-06-22T05:37:54Z
Closed at 2020-06-30T13:42:16Z
#2882 closed defect (somebody else's problem)
Magic-folder uploads/downloads do not preserve filesystem metadata
Reported by: | cypher | Owned by: | daira |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | code-frontend-magic-folder | Version: | 1.12.1 |
Keywords: | magic-folder metadata | Cc: | |
Launchpad Bug: |
Description (last modified by cypher)
Unlike the tahoe backup command, magic-folder uploads/downloads do not preserve filesystem metadata (such as mtime, file attributes, etc.); when Bob's shared magic-folder downloads a file uploaded by Alice, that file's mtime will be set to the time at which the download completed rather than the (arguably more desirable) times at which that file was originally modified/created on Alice's computer.
Preserving this metadata would be desirable not only for traditional archival purposes (since, e.g., the date that users may have taken certain pictures or written certain documents is often important) but also for the purposes of aiding users to resolve conflicts produced by magic-folder itself in a collaborative scenario (since, e.g., with the current implementation, Bob has no straightforward way of knowing whether the conflicting file produced on his computer by Alice was the one she edited last week or this morning).
(Also, maybe this should be an enhancement instead of defect?)
Change History (11)
comment:1 Changed at 2017-06-22T05:38:19Z by cypher
- Description modified (diff)
comment:2 Changed at 2017-06-22T14:23:26Z by cypher
- Keywords metadata added
comment:3 Changed at 2017-09-19T19:40:19Z by cypher
- Priority changed from normal to major
comment:4 Changed at 2018-01-05T00:27:08Z by meejah
- Description modified (diff)
comment:5 Changed at 2018-01-05T19:08:30Z by cypher
> I'm not convinced (trying to) preserve UID or GID is a good idea; these IDs will often vary between computers (even if the very same human set them up). Even if the same human did set them up, they'd probably want the symbolic names preserved/set (i.e. "alice" or "a_group" rather than the 1000 or whatever).
Yes, you're completely right. My bad; I'm not sure why I even included "uid/gid" here originally.. I'll remove that from the ticket..
> A different reason people might look at the timestamps could be to answer the question "when did magic-folder update this file" rather than "when did Alice update this file"; do we have any real-user feedback indicating which one is least surprising?
Not from our sessions, no, however this is a great question that certainly warrants further exploration and user-testing..
For what it's worth, other file-sync applications that I've used/tried in the past (Dropbox, Seafile, Syncthing) have indeed preserved/propagated the mtime of the original file (i.e., setting mtime to "when Alice updated the file" instead of "when $APPLICATION updated the file") such that when I originally tried magic-folder I was surprised and disappointed to learn that it did not work the same way (and could not use it because I depended heavily on mtime to provide additional time/date context for academic research notes, in-progress papers, student assignments, etc.). Because of this, for most of my use-cases, I'd rather use rsync -a to synchronize directories across machines than magic-folder. Obviously my own needs don't necessarily reflect those of others but I'd be very curious/interested to hear additional perspectives on what the preferred/least-surprising usage should be..
It is probably worth pointing out, however, that there's a major difference in expectations between the single-user case (i.e., Bob syncing a folder from his desktop to his laptop) vs. the multi-user case (i.e., Bob syncing a folder from his laptop to Alice's). I'd argue that, in the single-user case, knowing when *magic-folder* updated my files is largely irrelevant: I already know that I am the only person editing the files and so I would much rather have the metadata on my machines reflect the record of *my* own actions -- not the actions of magic-folder (which I, as an end-user, don't really care about); the "Documents" folder on my laptop should magically look and work just like the "Documents" folder on my desktop (and this includes being able to be sorted in the same way). If, for some reason, I ever really need to know when magic-folder updated it, I can/should be able to get that information from magic-folder/tahoe itself.
Admittedly, the multi-user scenario is more complicated, but I'm still not convinced that setting the mtime for a given file to when *magic-folder* updated it is really what I (as a non-technical end-user) would want. When collaborating with others, what I really want to know (in a broader knowledge sense) is *that* somebody who is not me made some change to a shared folder/file (and, ideally, *who* that person is, if that information can be provided). mtimes alone, obviously, do not provide this sort of information; an ls -l by itself can't tell me whether it was *magic-folder* that made a change or some other application on my computer, or whether it was me who made the change or some other person (which are all, arguably, important pieces of information in multi-user/shared situations). That said, in the multi-user situation, what I probably really want is an immediate notification telling me that a file was updated (e.g., "Alice updated 'Proposal.doc'") and/or the ability to see a log/listing of such events should I need it later (e.g., via the CLI, WUI, or something like Gridsync). Dropbox and Seafile both work roughly in this way (though they both have stronger, "account"-centric notions of identity in comparison to magic-folder).
All that being said, I guess my overall point here is that if the end-user's goal is to know when $APPLICATION updates a file, they should get that information from $APPLICATION itself; we should not rely on filesystem timestamps to provide a record of this information since doing so is neither reliable nor convenient for this purpose (and, as suggested above, in some cases, altering mtimes in such a way or having them mean something different in comparison to other applications can become confusing or inconvenient).
comment:6 Changed at 2018-01-05T19:09:17Z by cypher
- Description modified (diff)
comment:7 Changed at 2018-01-08T21:22:23Z by meejah
From the "earth dragons" part of the magic-folder spec, it *seems* we should be fine to set a modified-time "at least as old as (now - T) seconds" (but we must be careful not to blindly write the timestamp, as Alice's computer's clock could be in the future).
However, Tahoe's notion of the "creation time" is actually "when it got uploaded into Tahoe" (which of course makes some sense) so there's not currently any metadata about "what time the client computer thought the modified time was when the file was uploaded"...
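The clamping rule above could be sketched roughly as follows. This is only an illustration: `remote_mtime` stands for a hypothetical uploader-reported timestamp (which, as noted, Tahoe's metadata does not currently carry), the default `tolerance` is an assumed stand-in for "T", and none of the function names come from the magic-folder code.

```python
import os
import time


def clamp_mtime(remote_mtime, now, tolerance):
    """Return a timestamp that is at least as old as (now - tolerance).

    remote_mtime: hypothetical mtime reported by the uploading client.
    tolerance: the "T" from the comment above, in seconds (assumed value).
    """
    return min(remote_mtime, now - tolerance)


def apply_remote_mtime(path, remote_mtime, tolerance=2.0):
    """Set path's mtime to the clamped remote value, leaving atime as it was."""
    now = time.time()
    mtime = clamp_mtime(remote_mtime, now, tolerance)
    atime = os.stat(path).st_atime
    os.utime(path, (atime, mtime))
    return mtime
```

With this rule, a timestamp from a clock that runs ahead is pulled back to `now - tolerance` rather than written as-is, while older timestamps pass through unchanged.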
comment:8 Changed at 2018-01-09T08:57:20Z by meejah
comment:9 Changed at 2018-01-11T23:10:18Z by meejah
Needs a review of the PR.
comment:10 Changed at 2019-01-29T15:57:52Z by cypher
- Component changed from unknown to code-frontend-magic-folder
- Owner set to daira
comment:11 Changed at 2020-06-30T13:42:16Z by exarkun
- Resolution set to somebody else's problem
- Status changed from new to closed
magic-folder has been split out into a separate project - https://github.com/leastauthority/magic-folder
I'm not convinced (trying to) preserve UID or GID is a good idea; these IDs will often vary between computers (even if the very same human set them up). Even if the same human did set them up, they'd probably want the symbolic names preserved/set (i.e. "alice" or "a_group" rather than the 1000 or whatever).
I *think* trying to preserve mtime should work, but we should double-check that magic-folder doesn't depend on these when searching for updates. *Can* we actually mess with ctime directly..?
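On the ctime question: on POSIX systems, `os.utime` sets only atime and mtime. ctime there is the inode-change time, which the kernel updates itself, and no standard call sets it to an arbitrary value. A small sketch of that behaviour, using a throwaway temporary file (the helper name is made up for illustration):

```python
import os
import tempfile


def set_mtime_and_report(path, new_mtime):
    """Set mtime via os.utime and return the resulting (mtime, ctime)."""
    st = os.stat(path)
    os.utime(path, (st.st_atime, new_mtime))
    st = os.stat(path)
    return st.st_mtime, st.st_ctime


with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name

# Pretend the file was last modified a year ago.
year_ago = os.stat(demo_path).st_mtime - 365 * 24 * 3600
mtime, ctime = set_mtime_and_report(demo_path, year_ago)
os.remove(demo_path)
# mtime now reflects year_ago; ctime does not follow it.
```

So anything in magic-folder that compares ctime values would still see the local change time, not Alice's.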
A different reason people might look at the timestamps could be to answer the question "when did magic-folder update this file" rather than "when did Alice update this file"; do we have any real-user feedback indicating which one is least surprising?
Consider an example: Alice adds a new file to her magic-folder that hasn't been modified for a year (but she happens to "cp" it into her magic directory). Thus, from Linux's perspective, the mtime will be "now" (while Alice might *expect* it to be "a year ago" -- because she forgot to do "cp -a"). Which one does Bob want to see as "mtime"? (One of three choices: when his Tahoe client downloaded it; Alice's "now"; or "a year ago").
We can't actually choose "a year ago" in the above example, because Alice accidentally destroyed that metadata. So, presuming she didn't (by using cp -a) then we could choose that as "the" answer.
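The "cp" vs. "cp -a" difference in this example can be reproduced with Python's standard library, as a rough analogy only: `shutil.copy` copies content and permission bits (like plain "cp"), while `shutil.copy2` also tries to preserve timestamps (closer to "cp -p"). The file and directory names below are made up.

```python
import os
import shutil
import tempfile


def copy_and_compare(preserve):
    """Copy a file whose mtime is a year old; return (source_mtime, copy_mtime)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "notes.txt")
        os.mkdir(os.path.join(d, "magic"))
        dst = os.path.join(d, "magic", "notes.txt")
        with open(src, "w") as f:
            f.write("old notes\n")
        # Backdate the source so it looks untouched for a year.
        year_ago = os.stat(src).st_mtime - 365 * 24 * 3600
        os.utime(src, (year_ago, year_ago))
        # copy2 attempts to carry the timestamps over; copy does not.
        copier = shutil.copy2 if preserve else shutil.copy
        copier(src, dst)
        return os.stat(src).st_mtime, os.stat(dst).st_mtime
```

With `preserve=False` the copy's mtime is the time of the copy, so the "a year ago" value is already gone before magic-folder ever sees the file; with `preserve=True` it survives and could, in principle, be propagated.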