[tahoe-lafs-trac-stream] [tahoe-lafs] #959: tahoe-lafs objects

Fri Jul 5 20:06:53 UTC 2013

#959: tahoe-lafs objects
-----------------------------+---------------------------------------------
      Reporter:  warner      |      Owner:  nobody
          Type:  enhancement |     Status:  new
      Priority:  major       |  Milestone:  2.0.0
     Component:  unknown     |    Version:  1.6.0
    Resolution:              |   Keywords:  objects validation
 Launchpad Bug:              |              backward-compatibility
                             |              forward-compatibility revocation
-----------------------------+---------------------------------------------

Comment (by nejucomo):

 **Summary:**
 1. Nejucomo brainstorms differences between a native C-List + blob
 feature, versus
 1. "emulating" that with existing lafs directories and files, and
 1. realizes that emulation is good enough, and
 1. abandons support for any new "C-List + blob" feature, and then
 1. asks if that "emulation" is good enough for live objects.

 -then finally decides to post the whole brainstorm in case posterity finds
 anything useful in it.  ;-)

 Replying to [comment:13 zooko]:
 > Replying to [comment:12 nejucomo]:
 > >
 > > With only the first two of these bullets, the storage model becomes
 "arbitrary DAGs (Directed Acyclic Graphs)" instead of only "files or
 directories".
 >
 > I don't understand the distinction. Isn't the current LAFS files-and-
 directories structure already an arbitrary DAG? Just use incrementing
 integers as your filenames in your directories, and then that's a C-list.
 Or am I missing something?

 Yes, having an existing lafs directory to serve as a C-List and a separate
 lafs file to serve as the blob, and a third container directory to bind
 the two is already possible, at the cost of efficiency.
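
 A minimal sketch of that three-node layout, using plain Python dicts as a
 stand-in for lafs directories. The helper names (`make_clist`,
 `make_object`, `lookup`) and the child names "clist" and "blob" are
 hypothetical and are not Tahoe-LAFS APIs:

```python
# Hypothetical in-memory model; none of these names are Tahoe-LAFS APIs.

def make_clist(caps):
    """Emulate a C-List as a directory whose child names are "0", "1", ..."""
    return {str(i): cap for i, cap in enumerate(caps)}

def make_object(caps, blob):
    """Bind a C-List directory and a blob file under one container directory."""
    return {"clist": make_clist(caps), "blob": blob}

def lookup(obj, index):
    """Follow edge number `index` of the emulated C-List."""
    return obj["clist"][str(index)]

leaf = make_object([], b"leaf data")
# The same child is reachable through two edges, so this is a DAG, not a tree.
root = make_object([leaf, leaf], b"root data")
```

 Each emulated object costs three nodes (container, C-List, blob), which is
 the efficiency cost mentioned above.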

 I'll call that "the emulation design" below (because it is emulating
 "C-List + blob" as a built-in feature).  Let's contrast emulation with a
 built-in feature:

 **Emulation:**
 * Requires three file nodes.
 * Requires the application to encode its data structure into lafs
 directory names or other edge metadata.
   * Is there a simple, safe way to encode arbitrary bytes into an edge
 name?
   * Is it possible to attach arbitrary metadata to edges?
 * Cannot guarantee consistency between the C-List-emulation and blob-
 emulation:
   * A reader must read these separately, thus has no guarantee the two
 reads return a consistent view.
 * Fits the current "files and directories" abstraction well, so the user
 interfaces map reasonably well.
   * For example, fuse interfaces know what to do.

 By contrast:

 **Native C-List+Blob Feature:**
 * Requires a single node for any application.
 * Allows the application to generate/interpret blob "directly".
 * Ensures mutation consistency between the C-List and blob in the same
 manner as single file-node mutable consistency is currently ensured.
 * Can represent directories as a particular application.
 * Breaks the notion of "just files and directories", introducing complexity
 into the user interfaces.
   * For example, what does a fuse interface do with an arbitrary C-List +
 blob?
     * It could do the "inverse" of the emulation above: Present a
 directory with a C-List subdirectory and a blob file.  Now we have the
 same encoding problems as with emulation, except instead of the
 application which wrote the blob inventing the encoding, a completely
 unaware general application (i.e., a fuse interface) has to pick an encoding.

 After brainstorming those differences, it seems to me that the primary
 advantage of the native feature is its mutation consistency.

 It occurs to me that if it's already possible to encode arbitrary metadata
 in the node edges, then we already have C-List + blob, where the blob is
 split up across the edge metadata.
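
 One possible way to carry arbitrary bytes in an edge name is sketched
 below. It assumes that lowercase base32 text is acceptable as a child name
 and that an empty name is not, neither of which is confirmed here. The
 function names are hypothetical:

```python
import base64

def bytes_to_edge_name(data: bytes) -> str:
    """Encode arbitrary bytes as a lowercase base32 name without padding."""
    if not data:
        raise ValueError("empty data would produce an empty edge name")
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()

def edge_name_to_bytes(name: str) -> bytes:
    """Invert bytes_to_edge_name by restoring the stripped padding."""
    padded = name.upper() + "=" * (-len(name) % 8)
    return base64.b32decode(padded)

# Bytes that would be awkward in a raw name: NUL, "/", and non-UTF-8 data.
raw = b"\x00/\xff blob fragment"
name = bytes_to_edge_name(raw)
```

 The resulting names use only the characters a-z and 2-7, so they contain
 no path separators or control characters, at the cost of a 60% size
 increase over the raw bytes.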

 Oh...  I just realized:  If the emulation container directory is mutable,
 but the C-List-emulation child and the blob child are immutable, this
 solves the consistency problem, right?  A reader of the container
 directory knows that the immutable C-List cap and the immutable blob-file
 cap were written together by a single writer, and therefore their
 inter-relationships are as consistent as that writer made them.
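
 A toy model of that argument, as a sketch only. The `Store` class is a
 hypothetical in-memory stand-in for a grid, and a SHA-256 hex digest stands
 in for an immutable cap; neither reflects the real Tahoe-LAFS cap format or
 API:

```python
import hashlib

class Store:
    """Toy stand-in for a grid: immutable blobs plus mutable slots."""

    def __init__(self):
        self.immutable = {}   # cap -> bytes, content-addressed
        self.mutable = {}     # slot name -> (clist_cap, blob_cap)

    def put_immutable(self, data: bytes) -> str:
        cap = hashlib.sha256(data).hexdigest()
        self.immutable[cap] = data
        return cap

    def publish(self, slot: str, clist: bytes, blob: bytes) -> None:
        # Upload both immutable children first, then replace the pair of
        # caps in the mutable container with a single write.
        pair = (self.put_immutable(clist), self.put_immutable(blob))
        self.mutable[slot] = pair

    def read(self, slot: str):
        # One read of the mutable container yields both caps together.
        clist_cap, blob_cap = self.mutable[slot]
        return self.immutable[clist_cap], self.immutable[blob_cap]

store = Store()
store.publish("obj", b"clist-v1", b"blob-v1")
snapshot = store.mutable["obj"]                 # reader fetches the container once
store.publish("obj", b"clist-v2", b"blob-v2")   # a writer updates afterwards
old = tuple(store.immutable[cap] for cap in snapshot)
```

 Because the reader holds both immutable caps from a single container read,
 `old` is the matching v1 pair even though the container has since moved on
 to v2; a reader can never see the v1 C-List paired with the v2 blob.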

 Ok, now I'm pretty much satisfied that "C-List + blob" is unnecessary, at
 least for arbitrary DAGs.

 Is this also true of the "live objects" proposal?  If we require that the
 interpretation of a "C-List + blob + object-code" structure starts with a
 cap to a directory where C-List, blob, and object-code are separate but
 *immutable* files, is this a sufficient building block for live objects?

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/959#comment:15>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage