#607 closed defect (fixed)

DIR2:IMM

Reported by: zooko
Owned by: warner
Priority: major
Milestone: 1.6.0
Component: code-dirnodes
Version: 1.2.0
Keywords: newcaps news-done
Cc:
Launchpad Bug:

Description (last modified by warner)

Directories are currently stored in SSK files. They were designed so that directories could be easily stored in different types of files, so it shouldn't be hard for someone to implement DIR2:CHK files. These would have nice properties, especially for backup applications:

  • It would be nice to know that your old backed-up directory is immutable.
  • It would allow convergence of backed-up directories (note that front-end backup tools such as Brian's backupdb and Shawn's backup tool might figure this out and converge old directories for you, but doing it by converging CHK's on upload might work better for some cases, including the case that you aren't using backupdb or Shawn's tool).
  • It would be faster to create (currently SSKs require an RSA key-pair generation on creation, which is expensive).

Change History (16)

comment:1 Changed at 2009-02-07T00:47:51Z by warner

  • Description modified (diff)

(tweak formatting.. itemized lists in trac's markup language require a leading space)

comment:2 Changed at 2009-08-28T03:16:51Z by warner

Another good feature of an immutable-file based directory is that it could be repaired even when referenced only through a readcap (#625), like the ones created by "tahoe backup". Our current RSA-based (write-enabler-based) mutable files cannot be repaired in that situation.

I'd like to implement this, and change "tahoe backup" to use it. The basic steps I anticipate are:

  • implement create_dirnode(mutable=True, initial_children={})
  • replace the existing create_empty_dirnode() with that
  • refactor DirectoryNode to separate out the underlying filenode better. The idea would be to nail down the interface that dirnodes need from the filenode that they've wrapped. The read side just needs read(). The write side needs the normal mutable-filenode operations, like modify(). We should have an immutable filenode which offers the same read-side interface as the mutable filenode does.
  • change the "NodeMaker" code to create dirnodes by first creating a filenode and then passing it as the constructor to Dirnode(). It may be useful to first change the way that uploads are done, and create a special kind of immutable filenode for upload purposes. This "gestating" node would have an interface to add data, would perform the upload while data is added, and would then have a finalize() method, which would finish the upload process, compute the filecap, and return the real !IFilesystemNode which can be used for reading. Making this special node have the same interface as a mutable filenode's initial-upload methods would let Dirnode be oblivious to the type of filenode it's been given.
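
A minimal sketch of the "gestating" node idea from the last step, purely for illustration. The class names, the add_data() method and the fake "URI:FAKE:" cap string are all hypothetical. Only finalize() is named in the text above, and the real code would upload to storage servers asynchronously rather than buffer in memory.

```python
import hashlib


class ReadableNode:
    """Hypothetical stand-in for the real node returned after upload."""

    def __init__(self, cap, data):
        self._cap = cap
        self._data = data

    def get_uri(self):
        return self._cap

    def read(self):
        return self._data


class GestatingNode:
    """Hypothetical upload-time node: accepts data, then is finalized."""

    def __init__(self):
        self._chunks = []
        self._hasher = hashlib.sha256()
        self._finalized = False

    def add_data(self, chunk):
        # A real implementation would push this chunk to servers here.
        if self._finalized:
            raise RuntimeError("node has already been finalized")
        self._hasher.update(chunk)
        self._chunks.append(chunk)

    def finalize(self):
        # Finish the "upload", compute a placeholder cap, and hand back
        # a node that only offers the read side.
        self._finalized = True
        cap = "URI:FAKE:" + self._hasher.hexdigest()
        return ReadableNode(cap, b"".join(self._chunks))
```

The point of the shape is that a caller only ever sees add_data() followed by finalize(), so it does not need to know which kind of filenode it is filling.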

I'm planning to require that the contents of an immutable directory are also immutable (LIT, CHK, and DIR2:CHK, not regular mutable DIR2), so that these objects are always deep-readonly. (there may be an argument to provide shallow-readonly directories, but I think deep-readonly is more generally useful).

I'm pondering if there's a way to support multi-level trees in the future without drastic changes, so that this one-level immutable directory could turn into a full "virtual CD" (#204), with better performance (by bundling a whole tree of directories into a single distributed object). This would suggest making the name table accept tuples of names instead of just a single one.

I've also wondered if we should implement some faster lookup scheme for these immutable dirnodes, especially because we don't need to update it later. Maybe djb's "cdb" (constant-time database). I'm not sure that a database which has been optimized for minimal disk seeks will necessarily help us here, since the segment size is drastically larger than what a hard disk offers, and the network roundtrip latency is frequently an order of magnitude larger too. But certainly we can come up with something that's easier to pack and unpack than the DIR2 format.

Also, we can discard several things from the DIR2 format: we don't need child writecaps (just the readcaps), and we obviously don't need the obsolete salt. We probably still want the metadata dictionary, although that would potentially interfere with the grid-side convergence that Zooko mentioned.

Changing the table format would remove some of the benefits (and thus motivation) to the other refactoring changes described above: if we've got a separate class for immutable-dirnodes, then there's not much point in contorting mutable and immutable filenodes to present the same interface. But, it would probably be cleaner overall if there were just one dirnode class, whose mutability is determined solely by asking the underlying filenode about its own mutability. In this case, all the mutating methods will still exist on the immutable dirnodes, but they'd throw an exception if you actually try to call them in that situation, just as they do now.
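
To illustrate the "one dirnode class" option, here is a rough sketch in which the dirnode asks the wrapped filenode about mutability and the mutating method raises on an immutable one. FakeFilenode, NotMutableError and the method names are assumptions made for this example, not the actual Tahoe classes.

```python
class NotMutableError(Exception):
    """Raised when a mutating method is called on an immutable dirnode."""


class FakeFilenode:
    """Hypothetical filenode that just remembers whether it is mutable."""

    def __init__(self, mutable, children):
        self._mutable = mutable
        self.children = dict(children)

    def is_mutable(self):
        return self._mutable


class Dirnode:
    """Single dirnode class; mutability comes from the wrapped filenode."""

    def __init__(self, filenode):
        self._node = filenode

    def is_mutable(self):
        return self._node.is_mutable()

    def list(self):
        # Return a copy so callers cannot modify the stored table.
        return dict(self._node.children)

    def set_child(self, name, cap):
        # The method exists on every dirnode but refuses to run when
        # the underlying filenode is immutable.
        if not self.is_mutable():
            raise NotMutableError("cannot modify an immutable directory")
        self._node.children[name] = cap
```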

comment:3 Changed at 2009-09-01T03:34:39Z by warner

Zooko and I had a chat, and agreed to leave the encoding format the same. So "DIR2:" and "DIR2-CHK" (or -IMM or something) will have the same format, just in different containers. We can put off a format change until DIR3.

We're not sure about the "prototype immutable filenode" refactoring (the one that would make dirnodes call the same write() method for both mutable and immutable filenodes). It might be better off deferred.

One way to make the download/read side more uniform would be to introduce "FileVersion" objects. I might have described these in some other ticket, but the idea would be to move the read/write methods out of MutableFileNode and onto this FileVersion object which represents a single specific version of the mutable slot. FileVersion.replace would encapsulate the servermap argument, performing the replacement only if the mutable file looked like it hadn't changed since the version was fetched. MutableFileNode.get_best_version() would return one of these version objects. ImmutableFileNode.get_best_version() would return self. Then we'd make sure the read() interface was the same for both. (this would dovetail nicely with the future LDMF files, which will offer multiple versions: once you've grabbed the one that you care about, use read() on it).

This would take a moderate amount of work, but would allow us to use the same dirnode code for both types: the dirnode read code would just do self._filenode.get_best_version().read().
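
A toy sketch of the FileVersion idea, to make the shape concrete. The classes below are simplified stand-ins that share names with the ones discussed above. A plain sequence number stands in for the servermap check, and everything is synchronous, neither of which matches the real code.

```python
class StaleVersionError(Exception):
    """Raised when the slot changed after this version was fetched."""


class FileVersion:
    """One specific version of a mutable slot (toy model)."""

    def __init__(self, node, seqnum, data):
        self._node = node
        self._seqnum = seqnum
        self._data = data

    def read(self):
        return self._data

    def replace(self, new_data):
        # Only replace if nobody has written since this version was fetched.
        if self._node._seqnum != self._seqnum:
            raise StaleVersionError("slot changed since version was fetched")
        self._node._seqnum += 1
        self._node._data = new_data


class MutableFileNode:
    def __init__(self, data):
        self._seqnum = 1
        self._data = data

    def get_best_version(self):
        return FileVersion(self, self._seqnum, self._data)


class ImmutableFileNode:
    def __init__(self, data):
        self._data = data

    def get_best_version(self):
        # An immutable file has exactly one version: itself.
        return self

    def read(self):
        return self._data


def read_dirnode_bytes(filenode):
    # The dirnode read path is identical for both kinds of filenode.
    return filenode.get_best_version().read()
```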

comment:4 Changed at 2009-10-21T05:15:22Z by zooko

  • Summary changed from DIR2:CHK to DIR2:IMM

comment:5 Changed at 2009-10-21T05:29:51Z by zooko

I posted a couple of notes about this to http://allmydata.org/pipermail/tahoe-dev/2009-October/003027.html and hereby copy them into this comment:

When you create a DIR2:IMM, giving it a set of (childname, childcap) tuples, it should raise an exception if any childcap is not immutable. The immutable childcaps are "CHK" (perhaps renamed to "IMM"), LIT, and DIR2:CHK (or "DIR2:IMM").

When you unpack a DIR2:IMM, if you find any non-immutable children in there (i.e. because someone else's Tahoe-LAFS gateway is altered or buggy so that it did not raise the exception described above), then you treat that child as non-existent and log a warning.
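
The two rules above (reject on create, ignore and warn on unpack) could look roughly like the following sketch. The prefix test is an assumption made for brevity, since real cap parsing is more involved than a string prefix check. The function names and MutableChildError are hypothetical.

```python
import logging

log = logging.getLogger("dir2imm-sketch")

# Assumed string prefixes for the immutable cap types; illustrative only.
IMMUTABLE_PREFIXES = ("URI:CHK:", "URI:LIT:", "URI:DIR2-CHK:", "URI:DIR2-LIT:")


class MutableChildError(Exception):
    """Raised when a non-immutable child is offered to an immutable dir."""


def is_immutable_cap(cap):
    return cap.startswith(IMMUTABLE_PREFIXES)


def pack_immutable_children(children):
    # Creation side: refuse the whole directory if any child is mutable.
    for name, cap in children.items():
        if not is_immutable_cap(cap):
            raise MutableChildError(name)
    return dict(children)


def unpack_immutable_children(stored):
    # Read side: treat non-immutable children as non-existent and warn.
    result = {}
    for name, cap in stored.items():
        if is_immutable_cap(cap):
            result[name] = cap
        else:
            log.warning("ignoring non-immutable child %r", name)
    return result
```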

There could optionally be a command to deep-walk a directory graph and produce an immutable snapshot of everything. This could be an expensive operation depending on how deep the graph is, but large files are typically already immutable, so snapshotting them is free. Anyway, if you want to put something into an immutable directory and you get rejected because the thing isn't immutable, then this command would be useful.

comment:6 Changed at 2009-10-21T05:30:00Z by zooko

  • Owner set to warner

comment:7 Changed at 2009-10-28T04:12:42Z by davidsarah

  • Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:8 Changed at 2009-10-30T23:58:58Z by warner

I'm about 80% done with immutable directories. The current work is to add URI:DIR2-CHK: and URI:DIR2-LIT: to the set recognized by uri.py. (I'm planning to use CHK because the rest of the arguments are exactly the same as URI:CHK:/URI:LIT:). An ideal cap format would make the wrapping more explicit, like tahoe://grid-4/dir/imm/READCAP and tahoe://grid-4/imm/READCAP.

The next few steps are:

  • modify nodemaker.py to recognize the new caps and create immutable Filenodes for them and then wrap them in Directorynodes (this handles the read side)
  • add nodemaker.create_immutable_directory(children) to pack the children, perform an immutable upload, then transform the filecap into a dircap. (this handles the write side)
  • tests for those
  • new webapi (probably POST /uri?t=mkdir-immutable) that takes a JSON dict in the children= form portion: docs, tests, then implementation
  • done!
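
Since the new dircaps carry the same arguments as URI:CHK:/URI:LIT:, the "transform the filecap into a dircap" step can be pictured as a prefix swap. This is only a string-level sketch with hypothetical function names. It is not how uri.py is structured.

```python
# Pairs of (filecap prefix, matching immutable-dircap prefix).
_PREFIX_PAIRS = (
    ("URI:CHK:", "URI:DIR2-CHK:"),
    ("URI:LIT:", "URI:DIR2-LIT:"),
)


def filecap_to_dircap(filecap):
    """Wrap an immutable filecap as an immutable dircap (write side)."""
    for file_prefix, dir_prefix in _PREFIX_PAIRS:
        if filecap.startswith(file_prefix):
            return dir_prefix + filecap[len(file_prefix):]
    raise ValueError("not an immutable filecap: %r" % (filecap,))


def dircap_to_filecap(dircap):
    """Recover the underlying filecap from a dircap (read side)."""
    for file_prefix, dir_prefix in _PREFIX_PAIRS:
        if dircap.startswith(dir_prefix):
            return file_prefix + dircap[len(dir_prefix):]
    raise ValueError("not an immutable dircap: %r" % (dircap,))
```

On the read side, the recovered filecap is what would be used to build the immutable filenode that the directory node then wraps.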

Along the way, I plan to change "tahoe backup" to use t=mkdir-with-children (which will speed things up a lot, but still create readcaps-to-mutable-directories). Then, once this ticket is closed, I'll change it again to use t=mkdir-immutable.

Incidentally, yeah, I think that a form of "cp -r" that creates an immutable deep copy of some dirnode would be a great idea. Maybe "cp -r --immutable" ? Likewise, it might be useful to have "cp -r --mutable", which explicitly creates mutable copies of everything being copied (at least of the dirnodes). The default behavior of "cp -r" should be to re-use immutable objects.

comment:9 Changed at 2009-11-12T00:32:19Z by warner

I've got the write and read sides done (in 5fe713fc52dc331b). I had to move create_immutable_directory to Client instead of NodeMaker (because it needs the client's convergence secret). That may change later as I figure out how to best clean this stuff up. Tests are written too, but I won't be satisfied with them until I've resurrected the figleaf code (which was surgically removed to make the Ubuntu entry easier) and can figure out what's being missed.

Next up: webapi, and changing "tahoe backup" to use the new dirnodes.

comment:10 Changed at 2009-11-12T00:32:54Z by warner

  • Milestone changed from undecided to 1.6.0

looks like this will be the major (er, only) new feature in 1.6

comment:11 Changed at 2009-11-12T00:39:50Z by warner

changing "tahoe backup" to use this has been split out to #828, so the only remaining work on this ticket is to expose immutable directories via the webapi.

comment:12 Changed at 2009-11-12T06:02:10Z by zooko

Hopefully also #778 will be a new feature in 1.6.

comment:13 Changed at 2009-11-13T03:10:43Z by zooko

see also #830 (review Brian's patches for #607). I guess it is really the same as this ticket, but currently this ticket is assigned to Brian and that one is assigned to me.

comment:14 Changed at 2009-11-18T03:03:39Z by warner

I had the patch all ready to go, docs and tests and implementation, and then I had an epiphany: the JSON dictionary of child names+caps should be delivered as the *body* of the POST webapi request, rather than as the "children=" field of a multipart/form-data MIME body. This is easier for client-side implementors, and using form encoding feels inappropriate because we aren't using an HTML form to create the request anyways. Even HTML-embedded javascript will be using XMLHttpRequest and a JSON encoder for the body rather than creating an HTML form and pressing the "submit" button programmatically.
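
From a client's point of view, the body-based API might be used roughly like this. The sketch only builds the request and does not send it. The base URL, the Content-Type header, and the flat name-to-cap shape of the children dictionary are assumptions for illustration; the webapi docs define the exact JSON layout.

```python
import json
import urllib.request


def build_mkdir_immutable_request(base_url, children):
    """Build (but do not send) a POST whose body is the JSON children dict."""
    body = json.dumps(children).encode("utf-8")
    url = base_url.rstrip("/") + "/uri?t=mkdir-immutable"
    return urllib.request.Request(
        url,
        data=body,  # the JSON goes in the request body, not a form field
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed header
    )


# Hypothetical gateway address and child cap, for illustration only.
request = build_mkdir_immutable_request(
    "http://127.0.0.1:3456",
    {"notes.txt": "URI:LIT:placeholder"},
)
```

Passing the resulting object to urllib.request.urlopen() would send it to a running gateway.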

So I'm going to spend an extra day rewriting the patch with this API.

comment:15 Changed at 2009-11-18T07:35:24Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

Done, in f85690697a21e669. Although I forgot to add the "if you find a mutable child in an immutable dirnode, complain and ignore it" part: I've just opened #833 for that.

comment:16 Changed at 2010-02-02T05:58:18Z by davidsarah

  • Keywords news-done added