#183 closed defect (fixed)

file upload timestamps

Reported by: zooko Owned by: somebody
Priority: major Milestone: eventually
Component: code Version: 0.6.1
Keywords: mutability metadata filesystem Cc:
Launchpad Bug:

Description

Dmitri is integrating Tahoe with Facebook and one of the requirements given him is to sort files by date of upload.

It would be nice for him if such a timestamp were included in the file metadata, which currently looks like this:

[
 "filenode", 
 {
  "ro_uri": "URI:CHK:fjcyrbtrpgzrx75tpj4gn3smze:x4jmb1cxn7ekgnq1nc5qap1kywfmryug4oy7rm8fspf6rx3f6pyo:3:10:524288000", 
  "size": 524288000
 }
]

However, there are a host of issues with this: (1) the same file contents may be uploaded by different people, who aren't (and shouldn't be) aware of when other people uploaded them; (2) the CHK-URI is deterministically derived from the contents of the file (for Single Instance Storage), which won't work if a timestamp is included in the contents that determine the CHK-URI; and (3) we don't assume clock synchronization among separate computers.
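
To make the second issue concrete, here is a minimal sketch. It uses a plain SHA-256 digest as a stand-in for a content-derived identifier; this is an assumption for illustration and is not how Tahoe actually derives a CHK-URI. The function names are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Stand-in for a content-derived identifier (NOT Tahoe's real CHK derivation)."""
    return hashlib.sha256(data).hexdigest()


def content_id_with_timestamp(data: bytes, upload_time: float) -> str:
    """Hypothetical variant that mixes an upload time into the hashed bytes."""
    stamp = repr(upload_time).encode("ascii")
    return hashlib.sha256(stamp + b"\n" + data).hexdigest()


payload = b"the same file contents"

# Two uploaders with identical bytes get the same identifier,
# which is what Single Instance Storage relies on.
same_without_stamp = content_id(payload) == content_id(payload)

# Once a timestamp is part of the hashed input, identical bytes uploaded
# at different times no longer map to the same identifier.
same_with_stamp = (
    content_id_with_timestamp(payload, 1000.0)
    == content_id_with_timestamp(payload, 1001.0)
)
```

Under these assumptions the first comparison holds and the second does not, which is why the timestamp has to live somewhere other than the hashed contents.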

Another question is whether there should be a mechanism for general, user-determined metadata associated with a file. Here is an example mechanism, just to get us thinking:

Suppose you have the current api as documented in webapi.txt and nothing else. Now suppose you create a directory, and in it you make a subdirectory to hold metadata about your file. You could name the subdirectory after the file's location index if you want. In the metadata subdirectory you could store upload timestamps as well as all other sorts of information.
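
A rough sketch of that convention, using nested Python dicts as a stand-in for directories. Nothing here talks to a real Tahoe node; `attach_metadata`, `read_metadata`, and the `"upload_timestamp"` key are hypothetical names chosen for illustration.

```python
from typing import Dict


def attach_metadata(directory: dict, location_index: str, **entries: str) -> None:
    """Store metadata entries in a subdirectory named after the file's location index."""
    meta_dir = directory.setdefault(location_index, {})
    for key, value in entries.items():
        # Each piece of metadata is just another small "file" in the subdirectory.
        meta_dir[key] = value.encode("utf-8")


def read_metadata(directory: dict, location_index: str) -> Dict[str, str]:
    """Read back every metadata entry stored for one file."""
    meta_dir = directory.get(location_index, {})
    return {key: value.decode("utf-8") for key, value in meta_dir.items()}


# A directory holding one file plus its metadata subdirectory.
root = {"photo.jpg": b"...file bytes..."}
attach_metadata(root, "example-location-index", upload_timestamp="2007-10-16T23:14:30Z")
stored = read_metadata(root, "example-location-index")
```

The application owns the whole convention in this approach, so Tahoe itself would not know which entries are timestamps.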

Change History (5)

comment:1 Changed at 2007-10-16T23:14:30Z by warner

I was thinking of putting this metadata in the graph edge rather than in a separate node.

I'm certainly in favor of keeping it out of the CHK file itself. CHK URIs are a shortcut for an immutable sequence of bytes, nothing more.

Putting it in the edge would mean that the contents must be defined by the tahoe storage protocol (specifically we'd have to add code each time we wanted to add a new piece of metadata), and we need an API to get and set this data. But, maybe we should just make it a full dict, and allow apps to use whatever metadata they want.

The sorts of metadata that I have in mind are:

  • ctime
  • mtime (which, for CHK, always equals ctime)
  • size (recorded as a convenience: filesystem-walkers need to see it, but I'd like to remove it from the URI and put it in the UEB instead)
  • content_hash (for backup programs to determine if the file has changed, so they can decide whether to upload it or not)
  • tags (just a list of strings) (although to be efficient we'd need some other place to index these)

And in the future, I'm thinking that some amount of rsync hash data could go in there too, to make rsync-to-a-mutable-file faster.

Seems to me the choice is between 1) having an explicit place for metadata, with GET/PUT t=metadata&metadata=ctime -style APIs, and 2) having a convention (like .metadata/FILENAME) for metadata, which would require an explicit merge and upload each time the metadata changed. Both involve a number of additional round-trips to modify a file, although I think #2 is slightly more expensive. #1 makes the metadata visible to Tahoe, which would allow us to set some reasonable defaults:

  • ctime/mtime could be set automatically when you add the edge
  • size too

So I guess I'm leaning towards #1.
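
A minimal sketch of option #1, assuming a directory is modeled as a mapping from child name to a (child URI, metadata dict) pair. The `Directory` class and its method names are hypothetical, not Tahoe's actual dirnode code; the injectable clock exists only to make the example testable.

```python
import time
from typing import Callable, Dict, List, Tuple


class Directory:
    """Hypothetical directory whose edges carry a metadata dict."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.edges: Dict[str, Tuple[str, dict]] = {}

    def add_edge(self, name: str, child_uri: str, size: int, **extra) -> None:
        """Add a child; ctime, mtime, and size are filled in automatically."""
        now = self._clock()
        metadata = {"ctime": now, "mtime": now, "size": size}
        metadata.update(extra)  # app-defined keys, e.g. tags or content_hash
        self.edges[name] = (child_uri, metadata)

    def names_by_upload_time(self) -> List[str]:
        """Child names sorted by the ctime recorded on each edge."""
        return sorted(self.edges, key=lambda n: self.edges[n][1]["ctime"])
```

Because the timestamp sits on the edge, two directories can link to the same CHK file with different ctimes, and sorting by upload date only needs the directory's own data.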

I suppose an API for this would look like:

GET /$URI?t=metadata&metadata=ctime -> str(metadata['ctime'])
PUT /$URI?t=metadata&metadata=ctime <- str(time.time())
DELETE /$URI?t=metadata&metadata=ctime
GET /$URI?t=metadata -> JSON(metadata.items())
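
The semantics of those four requests could be sketched as a single dispatch function over an in-memory dict. This is only an illustration of the proposal as written; `handle_metadata_request` is a hypothetical name, there is no HTTP server here, and error handling is reduced to exceptions.

```python
import json
from typing import Optional
from urllib.parse import parse_qs


def handle_metadata_request(
    method: str, query: str, metadata: dict, body: Optional[str] = None
) -> Optional[str]:
    """Apply one proposed t=metadata request to a metadata dict."""
    params = parse_qs(query)
    if params.get("t") != ["metadata"]:
        raise ValueError("not a metadata request")

    keys = params.get("metadata")
    if keys is None:
        # GET /$URI?t=metadata -> JSON(metadata.items())
        if method != "GET":
            raise ValueError("a metadata key is required for " + method)
        return json.dumps(sorted(metadata.items()))

    key = keys[0]
    if method == "GET":
        return str(metadata[key])
    if method == "PUT":
        metadata[key] = body
        return None
    if method == "DELETE":
        del metadata[key]
        return None
    raise ValueError("unsupported method: " + method)
```

For example, a PUT with query `t=metadata&metadata=ctime` stores the request body under `ctime`, and a later GET with the same query returns it as a string.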

How's that sound?

comment:2 Changed at 2007-10-19T23:08:53Z by zooko

We're working on this stuff right now, but I don't know if it will be in v0.6.2.

comment:3 Changed at 2008-02-12T04:09:06Z by warner

I've added ctime and mtime metadata, so this ticket should be revisited and possibly closed.

comment:4 Changed at 2008-02-14T00:04:03Z by warner

  • Component changed from unknown to code
  • Owner changed from nobody to somebody

comment:5 Changed at 2009-07-28T16:19:34Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

This was fixed forever ago, and then the fix was improved a couple of times since then.
