Opened at 2007-10-16T22:01:28Z
Closed at 2009-07-28T16:19:34Z
#183 closed defect (fixed)
file upload timestamps
Reported by: | zooko | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | eventually |
Component: | code | Version: | 0.6.1 |
Keywords: | mutability metadata filesystem | Cc: | |
Launchpad Bug: |
Description
Dmitri is integrating Tahoe with Facebook, and one of the requirements given to him is to sort files by date of upload.
It would be nice for him if such a timestamp were included in the file metadata, which currently looks like this:
[ "filenode", { "ro_uri": "URI:CHK:fjcyrbtrpgzrx75tpj4gn3smze:x4jmb1cxn7ekgnq1nc5qap1kywfmryug4oy7rm8fspf6rx3f6pyo:3:10:524288000", "size": 524288000 } ]
However, there are a host of issues with this: the same file contents may be uploaded by different people who aren't (and shouldn't be) aware of when other people uploaded them; the CHK-URI is deterministically derived from the contents of the file (for Single Instance Storage), which won't work if a timestamp is included in the contents that determine the CHK-URI; and we don't assume clock synchronization among separate computers.
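To make the Single Instance Storage point concrete, here is a minimal sketch. It uses a plain SHA-256 digest as a stand-in for a content-derived identifier; this is not Tahoe's actual CHK-URI derivation, and `content_id` is a hypothetical helper invented for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a content-derived identifier (NOT Tahoe's real CHK derivation)."""
    return hashlib.sha256(data).hexdigest()


contents = b"same file contents"

# Two uploaders with identical bytes end up with the same identifier,
# which is what lets the storage layer keep a single instance.
alice_id = content_id(contents)
bob_id = content_id(contents)

# Hypothetical: if an upload timestamp were mixed into the hashed bytes,
# the identifiers would diverge and the deduplication would be lost.
alice_stamped = content_id(contents + b"|2007-10-16T22:01:28Z")
bob_stamped = content_id(contents + b"|2007-10-17T09:30:00Z")
```

Under these assumptions, `alice_id` and `bob_id` match while the two stamped identifiers differ, which is why the timestamp has to live somewhere other than the hashed contents.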
Another question is whether there should be a mechanism for general, user-determined metadata associated with a file. Here is an example mechanism, just to get us thinking:
Suppose you have the current api as documented in webapi.txt and nothing else. Now suppose you create a directory, and in it you make a subdirectory to hold metadata about your file. You could name the subdirectory after the file's location index if you want. In the metadata subdirectory you could store upload timestamps as well as all other sorts of information.
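A rough sketch of that convention, with directories modelled as plain in-memory dicts rather than real Tahoe directories. The `.metadata` name, the `uploaded_at` key, and both helper functions are assumptions made for illustration, not part of webapi.txt.

```python
import time
from typing import Optional


def record_upload(directory: dict, name: str, file_uri: str,
                  uploaded_at: Optional[float] = None) -> None:
    """Link a file into `directory` and note its upload time in a metadata subdirectory."""
    directory[name] = file_uri
    # Hypothetical layout: .metadata/<name>/ holds whatever we want to remember
    # about the file, here only the upload timestamp.
    meta_dir = directory.setdefault(".metadata", {})
    per_file = meta_dir.setdefault(name, {})
    per_file["uploaded_at"] = time.time() if uploaded_at is None else uploaded_at


def names_by_upload_time(directory: dict) -> list:
    """Return file names sorted by recorded upload time, oldest first."""
    meta_dir = directory.get(".metadata", {})
    return sorted(meta_dir, key=lambda n: meta_dir[n]["uploaded_at"])


# Usage: the URIs are placeholders, and the timestamps are supplied by the uploader.
d = {}
record_upload(d, "b.txt", "placeholder-uri-b", uploaded_at=200.0)
record_upload(d, "a.txt", "placeholder-uri-a", uploaded_at=100.0)
ordered = names_by_upload_time(d)
```

Because the timestamp is whatever the uploading client wrote, the ordering is only as trustworthy as that client's clock.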
Change History (5)
comment:1 Changed at 2007-10-16T23:14:30Z by warner
comment:2 Changed at 2007-10-19T23:08:53Z by zooko
We're working on this stuff right now, but I don't know if it will be in v0.6.2.
comment:3 Changed at 2008-02-12T04:09:06Z by warner
I've added ctime and mtime metadata, so this ticket should be revisited and possibly closed.
comment:4 Changed at 2008-02-14T00:04:03Z by warner
- Component changed from unknown to code
- Owner changed from nobody to somebody
comment:5 Changed at 2009-07-28T16:19:34Z by zooko
- Resolution set to fixed
- Status changed from new to closed
This was fixed forever ago, and then the fix was improved a couple of times since then.
I was thinking of putting this metadata in the graph edge rather than in a separate node.
I'm certainly in favor of keeping it out of the CHK file itself. CHK URIs are a shortcut for an immutable sequence of bytes, nothing more.
Putting it in the edge would mean that the contents must be defined by the tahoe storage protocol (specifically we'd have to add code each time we wanted to add a new piece of metadata), and we need an API to get and set this data. But, maybe we should just make it a full dict, and allow apps to use whatever metadata they want.
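The "full dict on the edge" idea could be modelled roughly as follows. This is an in-memory sketch only: the `Directory` type and the `add_child` and `set_metadata` helpers are hypothetical names, not Tahoe's dirnode implementation.

```python
from typing import Any, Dict, Tuple

# Each edge maps a child name to (child URI, free-form metadata dict).
Directory = Dict[str, Tuple[str, Dict[str, Any]]]


def add_child(directory: Directory, name: str, uri: str, **metadata: Any) -> None:
    """Create an edge to `uri`, storing the metadata on the edge, not in the file."""
    directory[name] = (uri, dict(metadata))


def set_metadata(directory: Directory, name: str, key: str, value: Any) -> None:
    """Set one application-chosen metadata key on an existing edge."""
    _uri, metadata = directory[name]
    metadata[key] = value


# Two directories link to the same immutable file but keep separate metadata.
alice: Directory = {}
bob: Directory = {}
add_child(alice, "photo.jpg", "placeholder-uri-x", ctime=100)
add_child(bob, "pic.jpg", "placeholder-uri-x", ctime=250)
set_metadata(alice, "photo.jpg", "app.tag", "holiday")
```

Keeping the dict on the edge means each parent directory records its own view of the child, so one uploader's timestamps never leak to another.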
The sorts of metadata that I have in mind are:
And in the future, I'm thinking that some amount of rsync hash data could go in there too, to make rsync-to-a-mutable-file faster.
Seems to me the choice is between 1) having an explicit place for metadata, with GET/PUT t=metadata&metadata=ctime -style APIs, and 2) having a convention (like .metadata/FILENAME) for metadata, which would require an explicit merge and upload each time the metadata changed. Both involve a number of additional round-trips to modify a file, although I think #2 is slightly more expensive. #1 makes the metadata visible to Tahoe, which would allow us to set some reasonable defaults:
So I guess I'm leaning towards #1.
I suppose an API for this would look like:
How's that sound?