#947 assigned enhancement

Add file-with-metadata caps

Reported by: kpreid Owned by: davidsarah
Priority: major Milestone: 2.0.0
Component: code Version: 1.6.0
Keywords: newcaps newurls mutable immutable metadata rollback Cc: kpreid, jeremy@…
Launchpad Bug:

Description

Web architecture expects that a resource has a Content-Type. Modern filesystems have "extended attributes" per file as well as metadata such as modification time, and it is desirable to back these things up. Both of these things point to the idea that there ought to be addressable (has-a-URL) objects which designate the metadata as well as the file data (binary blob). My understanding of current Tahoe architecture is that all metadata is instead stored in the rows of the directory objects.

Additionally, metadata should be mutable iff the file is, so that it can be updated in-place without access to every directory which might contain it.

I imagine that these objects would contain a current-design file cap, rather than themselves containing the file data, so that we still get the convergent encryption space advantage even if file metadata differs among separately-created instances.
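A minimal sketch of that layering, under loudly stated assumptions: `toy_file_cap` and `FileWithMetadata` are hypothetical names, and hashing the content with SHA-256 is only a stand-in for a content-derived immutable file cap, not Tahoe's actual cap derivation. It shows two metadata objects with different metadata pointing at the same underlying file cap, so the file data would be stored only once.

```python
import hashlib
import json
from dataclasses import dataclass, field


def toy_file_cap(data: bytes) -> str:
    # Hypothetical stand-in for a content-derived file cap.
    # This is NOT how Tahoe-LAFS derives caps; it only mimics "same bytes, same cap".
    return "toycap:" + hashlib.sha256(data).hexdigest()


@dataclass
class FileWithMetadata:
    # The wrapper holds a cap to the file data, never the data itself.
    file_cap: str
    metadata: dict = field(default_factory=dict)

    def serialize(self) -> bytes:
        # Deterministic JSON so equal objects serialize identically.
        body = {"file_cap": self.file_cap, "metadata": self.metadata}
        return json.dumps(body, sort_keys=True).encode("utf-8")


blob = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
alice = FileWithMetadata(toy_file_cap(blob), {"content-type": "image/svg+xml"})
bob = FileWithMetadata(toy_file_cap(blob), {"content-type": "text/xml", "mtime": 1265932800})
```

Here `alice` and `bob` serialize to different wrapper objects, but both refer to the same `file_cap`.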

This idea was raised at friam on 2010-02-12.

Change History (10)

comment:1 Changed at 2010-02-15T05:54:25Z by zooko

Hm, the way I imagined implementing this at first was to have the client first fetch the associated metadata and then fetch the file. One way to envision the implementation would simply be to define a kind of directory which can only have one child link in it. Then take the cap to that directory and wrap it in a different cap type which means "fetch this directory then fetch the file it points to, applying all of the metadata that it contains".

But, we could also consider bundling some metadata along with the cap itself. For example, if the cap is being embedded into a URL, then include the metadata in the URL, along with the cap. Spelling out the content type in standard text format e.g. image/svg+xml would add significantly to the length of the URL, but perhaps we could define a custom compression scheme which could represent the most common types in only a character or two while falling back to uncompressed form for types that we haven't included in our compression definition.
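One way such a compression scheme might look, as a sketch only: the table of one-character codes, the `_` escape prefix, and the function names below are all assumptions made for illustration, not anything defined by Tahoe-LAFS. Common types map to a single character; anything else falls back to the percent-encoded full type.

```python
from urllib.parse import quote, unquote

# Hypothetical code table; a real scheme would need an agreed, versioned list.
COMMON_TYPES = {
    "text/plain": "t",
    "text/html": "h",
    "image/png": "p",
    "image/jpeg": "j",
    "image/svg+xml": "s",
    "application/octet-stream": "o",
}
CODE_TO_TYPE = {code: ctype for ctype, code in COMMON_TYPES.items()}

# Hypothetical marker for "uncompressed type follows"; never used as a code.
ESCAPE = "_"


def encode_content_type(content_type: str) -> str:
    code = COMMON_TYPES.get(content_type)
    if code is not None:
        return code
    # Fallback: percent-encode the full type so it is safe inside a URL.
    return ESCAPE + quote(content_type, safe="")


def decode_content_type(token: str) -> str:
    if token.startswith(ESCAPE):
        return unquote(token[len(ESCAPE):])
    # Raises KeyError for codes that are not in the table.
    return CODE_TO_TYPE[token]
```

With this table, `image/svg+xml` costs one character in the URL, while an unlisted type such as `application/x-foo` round-trips through the longer escaped form.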

comment:2 Changed at 2010-02-15T05:57:39Z by zooko

One reason that I am thinking about this is the "security-related extra metadata" that I've been ticketing about tonight: highest-known-version-number (#955), petrification-marker (#954), LAFS 301 Moved Permanently marker (not yet ticketed), etc. It would be cool if, when I send you a URL containing a Tahoe-LAFS cap to a mutable file, I automatically include in that URL the highest version number of that file that I have ever seen, thus empowering you to reject rollback attacks which present an older file to you when you try to read it.

That one, at least, can't really be implemented in the indirection-node way (because if someone is going to roll back the file, they might also roll back the indirection-node), but would have to be in the bundled-with-the-original-URL way.
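A rough sketch of the bundled-with-the-URL check, assuming a made-up `?highest-seen=N` suffix and a made-up `toycap:` cap string (neither is a real Tahoe-LAFS format), and assuming the reader learns an integer version number for whatever it retrieved. The recipient refuses any retrieved version lower than the one the sender vouched for.

```python
class RollbackDetected(Exception):
    """Raised when a retrieved version is older than the one named in the URL."""


# Hypothetical URL suffix carrying the sender's highest-seen version.
MARKER = "?highest-seen="


def attach_highest_seen(cap: str, version: int) -> str:
    return f"{cap}{MARKER}{version}"


def split_highest_seen(url: str):
    cap, sep, rest = url.partition(MARKER)
    # No marker means the sender made no claim; treat the floor as 0.
    highest = int(rest) if sep else 0
    return cap, highest


def check_version(url: str, retrieved_version: int) -> str:
    cap, highest = split_highest_seen(url)
    if retrieved_version < highest:
        raise RollbackDetected(
            f"got version {retrieved_version}, sender had seen {highest}"
        )
    return cap


url = attach_highest_seen("toycap:abc", 7)
```

Because the floor travels with the URL rather than in a grid-stored node, an attacker who can serve stale shares cannot also serve a stale floor.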

comment:3 Changed at 2010-02-15T06:06:23Z by zooko

  • Component changed from unknown to code
  • Keywords newcaps newurls mutable immutable added; inodes removed
  • Owner changed from nobody to somebody

comment:4 Changed at 2010-02-15T06:12:25Z by zooko

If you like this ticket, you might also like #956 (embed security metadata in parent directory) and #957 (embed security metadata in URL).

comment:5 Changed at 2010-02-23T03:10:46Z by zooko

  • Milestone changed from undecided to 2.0.0
  • Version changed from unknown to 1.6.0

comment:6 Changed at 2010-03-12T02:45:35Z by jsgf

  • Cc jeremy@… added

comment:7 Changed at 2010-10-06T01:37:52Z by zooko

  • Keywords rollback added

comment:8 Changed at 2011-01-16T09:21:30Z by zooko

Is this the same as #307 (maybe add node metadata? (in addition to edge metadata))?

comment:9 Changed at 2011-01-17T14:46:51Z by chrysn

as zooko correctly pointed out there, this is relevant for #1325 (make tahoe backup useable as a replacement for rsync).

personally i'd go for storing the file metadata in the directory. this does require the relevant data (mime type) to be included in the url in order to be used in connection with the file, but think about it this way: that's even true for the file name.

other reasons supporting metadata-in-directory are

  • faster access (fewer roundtrips, especially in the typical file-manager situation where a directory is listed and then all its files are stat-ed),
  • better compatibility (i guess there is a way to put additional metadata in the directory w/o breaking compatibility to older versions; doing this with intermediate nodes would be rather hard), and that
  • git does it that way too (ok, i admit, that's not really a reason).
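The roundtrip argument can be illustrated with a toy model. Everything here is hypothetical: `ToyStore`, the cap strings, and the entry layout are invented for the sketch and do not reflect Tahoe's directory format. It counts fetches when listing a directory and then stat-ing every child, once with metadata stored in the directory entries and once with metadata stored in per-file intermediate nodes.

```python
class ToyStore:
    """In-memory stand-in for a grid; each fetch models one roundtrip."""

    def __init__(self, objects):
        self.objects = objects
        self.fetches = 0

    def fetch(self, cap):
        self.fetches += 1
        return self.objects[cap]


def stat_all_from_directory(store, dir_cap):
    # Metadata lives in the directory entries: one fetch covers every child.
    entries = store.fetch(dir_cap)
    return {name: entry["metadata"] for name, entry in entries.items()}


def stat_all_via_nodes(store, dir_cap):
    # Metadata lives in intermediate nodes: one extra fetch per child.
    entries = store.fetch(dir_cap)
    return {name: store.fetch(entry["cap"])["metadata"] for name, entry in entries.items()}


names = ["a.txt", "b.png", "c.svg"]
objects = {
    "dir-inline": {n: {"cap": "file-" + n, "metadata": {"name": n}} for n in names},
    "dir-indirect": {n: {"cap": "meta-" + n} for n in names},
}
for n in names:
    objects["meta-" + n] = {"cap": "file-" + n, "metadata": {"name": n}}

inline_store = ToyStore(objects)
inline_result = stat_all_from_directory(inline_store, "dir-inline")

indirect_store = ToyStore(objects)
indirect_result = stat_all_via_nodes(indirect_store, "dir-indirect")
```

Both layouts return the same metadata; the difference is one fetch for the inline layout versus one plus the number of children for the indirect one.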

comment:10 Changed at 2012-05-19T19:38:01Z by davidsarah

  • Owner changed from somebody to davidsarah
  • Status changed from new to assigned

[assigning to me as a reminder to explain why the Content-Type-in-direntry feature can't be implemented on its own without the Content-Type-in-URL feature]
