[tahoe-dev] [tahoe-lafs] #607: DIR2:CHK

tahoe-lafs trac at allmydata.org
Thu Aug 27 20:16:51 PDT 2009


#607: DIR2:CHK
---------------------------+------------------------------------------------
 Reporter:  zooko          |           Owner:           
     Type:  defect         |          Status:  new      
 Priority:  major          |       Milestone:  undecided
Component:  code-dirnodes  |         Version:  1.2.0    
 Keywords:                 |   Launchpad_bug:           
---------------------------+------------------------------------------------

Comment(by warner):

 Another good feature of an immutable-file based directory is that it could
 be repaired even when referenced through a readcap (#625), like the ones
 created by "tahoe backup". Our current RSA-based (write-enabler-based)
 mutable files cannot be repaired that way.

 I'd like to implement this, and change "tahoe backup" to use it. The basic
 steps I anticipate are:

  * implement {{{create_dirnode(mutable=True, initial_children={})}}}
  * replace the existing {{{create_empty_dirnode()}}} with that
  * refactor {{{DirectoryNode}}} to separate out the underlying filenode
    better. The idea would be to nail down the interface that dirnodes need
    from the filenode that they've wrapped. The read side just needs
    read(). The write side needs the normal mutable-filenode operations,
    like modify(). We should have an immutable filenode which offers the
    same read-side interface as the mutable filenode does.
  * change the "!NodeMaker" code to create dirnodes by first creating a
    filenode and then passing it to the {{{Dirnode()}}} constructor. It may
    be useful to first change the way that uploads are done, and create a
    special kind of immutable filenode for upload purposes. This
    "gestating" node would have an interface to add data, would perform the
    upload while data is added, and would then have a finalize() method,
    which would finish the upload process, compute the filecap, and return
    the real !IFilesystemNode which can be used for reading. Making this
    special node have the same interface as a mutable filenode's
    initial-upload methods would let Dirnode be oblivious to the type of
    filenode it's been given.
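
 The "gestating" node idea in the last step could look roughly like the
 sketch below. It is synchronous, in-memory Python for illustration only;
 the class names, add_data(), get_cap(), and the "FAKE-CHK:" cap string are
 all hypothetical, and a real version would encode and upload shares rather
 than buffer bytes.

```python
import hashlib


class ImmutableFileNode:
    """Read-only node handed back once an upload has been finalized."""

    def __init__(self, cap, data):
        self._cap = cap
        self._data = data

    def is_mutable(self):
        return False

    def get_cap(self):
        return self._cap

    def read(self):
        return self._data


class GestatingFileNode:
    """Hypothetical upload-in-progress node: add data, then finalize()."""

    def __init__(self):
        self._hasher = hashlib.sha256()
        self._chunks = []
        self._finalized = False

    def add_data(self, chunk):
        if self._finalized:
            raise RuntimeError("upload already finalized")
        # A real implementation would encode and push shares here;
        # this sketch only hashes and buffers the bytes.
        self._hasher.update(chunk)
        self._chunks.append(chunk)

    def finalize(self):
        # Finish the stand-in "upload", derive a placeholder cap from the
        # content hash, and return the node that can be used for reading.
        self._finalized = True
        cap = "FAKE-CHK:" + self._hasher.hexdigest()
        return ImmutableFileNode(cap, b"".join(self._chunks))
```

 A caller would feed the serialized directory table through add_data() and
 hand the node returned by finalize() to the dirnode.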

 I'm planning to require that the contents of an immutable directory are
 also immutable (LIT, CHK, and DIR2:CHK, not regular mutable DIR2), so that
 these objects are always deep-readonly. (There may be an argument for
 providing shallow-readonly directories, but I think deep-readonly is more
 generally useful.)
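
 As a rough illustration of the deep-readonly rule, a creation-time check
 might reject any child that is not one of the immutable kinds. The kind
 tags and the function name below are hypothetical labels for this sketch,
 not actual cap strings or existing Tahoe code.

```python
# Hypothetical kind tags; these are labels for this sketch, not cap strings.
IMMUTABLE_KINDS = frozenset(["LIT", "CHK", "DIR2:CHK"])


def check_deep_readonly(children):
    """Raise ValueError if any child is not of an immutable kind.

    `children` maps child name -> kind tag.
    """
    for name, kind in sorted(children.items()):
        if kind not in IMMUTABLE_KINDS:
            raise ValueError("child %r has mutable kind %r" % (name, kind))
    return True
```

 With this check, a table containing a regular mutable DIR2 child would be
 refused before the immutable directory is ever uploaded.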

 I'm pondering whether there's a way to support multi-level trees in the
 future without drastic changes, so that this one-level immutable directory
 could turn into a full "virtual CD" (#204), with better performance (by
 bundling a whole tree of directories into a single distributed object).
 This would suggest making the name table accept tuples of names instead of
 just a single one.
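
 A name table keyed by tuples could be sketched as below. A one-level
 directory is then just the special case where every key has length one.
 The function names and the example caps are hypothetical.

```python
def lookup(table, path):
    """Return the readcap stored under a tuple of names, or None."""
    return table.get(tuple(path))


def list_children(table, parent=()):
    """Return the sorted names of the direct children of `parent`."""
    parent = tuple(parent)
    depth = len(parent)
    names = set()
    for path in table:
        # A key belongs under `parent` if it is longer and shares its prefix.
        if len(path) > depth and path[:depth] == parent:
            names.add(path[depth])
    return sorted(names)


# Example table; the cap values are placeholders.
example_table = {
    ("README",): "cap-1",
    ("src", "main.py"): "cap-2",
    ("src", "util", "io.py"): "cap-3",
}
```

 Here intermediate directories such as ("src",) have no entry of their own;
 they exist only implicitly as prefixes, which is one possible design and
 not necessarily the right one.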

 I've also wondered if we should implement some faster lookup scheme for
 these immutable dirnodes, especially because we don't need to update them
 later. Maybe djb's "cdb" (constant database). I'm not sure that a database
 which has been optimized for minimal disk seeks will necessarily help us
 here, since the segment size is drastically larger than what a hard disk
 offers, and the network roundtrip latency is frequently an order of
 magnitude larger too. But certainly we can come up with something that's
 easier to pack and unpack than the DIR2 format.
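
 One simple possibility, much less ambitious than cdb, is a sorted table of
 length-prefixed records that is searched with bisection after unpacking.
 This is only a sketch under that assumption; the record layout and
 function names are invented here and are not a proposed wire format.

```python
import bisect
import struct


def pack_table(children):
    """Serialize {name: readcap} (both bytes) as sorted length-prefixed records."""
    out = []
    for name in sorted(children):
        cap = children[name]
        # Two big-endian 16-bit lengths, then the name and cap bytes.
        out.append(struct.pack(">HH", len(name), len(cap)))
        out.append(name)
        out.append(cap)
    return b"".join(out)


def unpack_table(blob):
    """Return parallel (names, caps) lists in the sorted order they were packed."""
    names, caps = [], []
    offset = 0
    while offset < len(blob):
        name_len, cap_len = struct.unpack_from(">HH", blob, offset)
        offset += 4
        names.append(blob[offset:offset + name_len])
        offset += name_len
        caps.append(blob[offset:offset + cap_len])
        offset += cap_len
    return names, caps


def find(names, caps, name):
    """Binary-search the sorted names; return the matching cap or None."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return caps[i]
    return None
```

 Because the table is never updated, the sort happens once at creation time
 and readers never need to rebuild anything.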

 Also, we can discard several things from the DIR2 format: we don't need
 child writecaps (just the readcaps), and we obviously don't need the
 obsolete salt. We probably still want the metadata dictionary, although
 that would potentially interfere with the grid-side convergence that Zooko
 mentioned.
 Changing the table format would remove some of the benefits of (and thus
 motivation for) the other refactoring changes described above: if we've
 got a separate class for immutable-dirnodes, then there's not much point
 in contorting mutable and immutable filenodes to present the same
 interface. But it would probably be cleaner overall if there were just one
 dirnode class, whose mutability is determined solely by asking the
 underlying filenode about its own mutability. In this case, all the
 mutating methods would still exist on the immutable dirnodes, but they'd
 throw an exception if you actually tried to call them in that situation,
 just as they do now.
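
 The single-class approach might look something like the following sketch,
 in which the dirnode asks its wrapped filenode whether it is mutable and
 refuses to mutate otherwise. NotMutableError, StubFileNode, and the method
 names are hypothetical stand-ins, not the existing Tahoe classes.

```python
class NotMutableError(Exception):
    """Hypothetical exception for mutating a read-only directory."""


class StubFileNode:
    """Stand-in filenode that only reports its own mutability."""

    def __init__(self, mutable):
        self._mutable = mutable

    def is_mutable(self):
        return self._mutable


class Dirnode:
    """One dirnode class for both mutable and immutable directories."""

    def __init__(self, filenode, children=None):
        self._node = filenode
        self._children = dict(children or {})

    def is_mutable(self):
        # Mutability is delegated entirely to the wrapped filenode.
        return self._node.is_mutable()

    def get(self, name):
        return self._children[name]

    def set_child(self, name, cap):
        # The mutating method exists on every dirnode but refuses to run
        # when the underlying filenode is immutable.
        if not self.is_mutable():
            raise NotMutableError("directory is immutable")
        self._children[name] = cap
```

 The trade-off is that callers only learn about immutability at call time,
 rather than from the type of the object they hold.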

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/607#comment:2>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

