[tahoe-dev] [tahoe-lafs] #795: append-only files

Tue Sep 1 16:56:10 PDT 2009

#795: append-only files
--------------------------+-------------------------------------------------
 Reporter:  warner        |           Owner:           
     Type:  enhancement   |          Status:  new      
 Priority:  major         |       Milestone:  undecided
Component:  code-mutable  |         Version:  1.5.0    
 Keywords:                |   Launchpad_bug:           
--------------------------+-------------------------------------------------

Comment(by davidsarah):

 Replying to [comment:2 warner]:
 > Every once in a while, Wally the writecap holder might merge all
 > the append messages down into the "base object". Until this
 > happened, a colluding set of servers could discard arbitrary
 > append messages (meaning the file would not be monotonically
 > increasing). There might be ways to have each message include the
 > hash of the previous ciphertext to detect this sort of thing
 > (forcing the server to discard all-or-nothing, restoring
 > monotonicity, but introducing race conditions).

 ... or, if we embrace the add-only collection abstraction, the hash
 of some subset of the existing entries.

 > Maybe "add-only collection" would be a better model for this,
 > instead of implying that the object contains a linear sequence
 > of bytes. In fact, we might say that the objects stored in this
 > collection are arbitrary strings, with IDs based upon hashing
 > their contents, and make the append(X) operation be idempotent,
 > and give readers an unsorted set of these strings. This would
 > make things like the add-only directory easier to model.

 This abstraction makes a lot of sense to me. I had been thinking
 of add-only directories as being implemented in terms of
 append-only files, but actually an add-only capability for a
 set of byte strings is directly applicable to all of the use
 cases I can think of.
 For example,

  - to implement the immutable backup scenario, write each backup
    as a set of new files (ideally using immutable directories, although
    you could make do without them). Since they are not attached to an
    existing directory structure, this does not require any authority
    other than the necessary storage quota. Then, use an add-only set
    to record the root caps for each backup. If the backups are
    incremental, make them dependent on (by including a hash of)
    previous backups. The read key for the set is kept off-line, or
    generated from a passphrase.

  - to implement a tamper-resistant "log file", use an add-only set to
    represent the set of timestamped log entries (perhaps entries at
    about the same time could be batched for efficiency).

  - an add-only cap can represent the authority to submit a transaction,
    with eventual consistency. This can represent any change to
    application-level data structures, not just an addition. Multiple
    holders of the add-only cap can submit transactions without being
    able to interfere with each other.
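
 A minimal sketch of the set abstraction used in the examples above,
 assuming SHA-256 for the content-based IDs and an in-memory dict
 standing in for the storage servers. The class name, method names and
 the placeholder cap strings are hypothetical, not existing Tahoe code:

```python
import hashlib


class AddOnlySet:
    """In-memory stand-in for an add-only collection of byte strings."""

    def __init__(self):
        # entry ID (hex SHA-256 of the contents) -> contents
        self._entries = {}

    def add(self, data: bytes) -> str:
        entry_id = hashlib.sha256(data).hexdigest()
        # Adding the same string twice changes nothing, so add() is idempotent.
        self._entries.setdefault(entry_id, data)
        return entry_id

    def read(self) -> frozenset:
        # Readers get an unordered set; no linear byte sequence is implied.
        return frozenset(self._entries.values())


# Recording the root cap of each backup (placeholder strings, not real caps).
backups = AddOnlySet()
first = backups.add(b"root-cap-for-backup-1")
again = backups.add(b"root-cap-for-backup-1")   # same ID, no new entry
backups.add(b"root-cap-for-backup-2")
```

 There is deliberately no remove or overwrite method: the only
 authority the interface exposes is adding an entry and reading the
 current set.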

 The set abstraction also has the advantage of not appearing to give
 consistency properties that are actually unimplementable. When you
 read the set, you get some subset of its entries. Additions to the
 set are only eventually seen by readers. You can make an addition
 dependent on some of the existing entries, in which case anyone who
 sees that entry is guaranteed to also see the entries it depends on.
 This sounds like a robust abstraction that you could imagine
 building useful higher-level distributed systems on top of.
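
 One way a reader might enforce the dependency property is sketched
 below, assuming each entry carries the SHA-256 IDs of the entries it
 depends on and that the reader ignores any entry whose dependencies it
 has not received. The JSON encoding and all function names are
 assumptions made for illustration only:

```python
import hashlib
import json


def make_entry(payload: str, depends_on=()) -> bytes:
    """Serialize an entry that names the IDs of the entries it depends on."""
    body = {"payload": payload, "deps": sorted(depends_on)}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def entry_id(entry: bytes) -> str:
    return hashlib.sha256(entry).hexdigest()


def visible_entries(received):
    """Return the received entries whose dependencies are all visible too."""
    by_id = {entry_id(e): e for e in received}
    deps = {i: set(json.loads(e)["deps"]) for i, e in by_id.items()}
    visible = set(by_id)
    changed = True
    while changed:
        # Repeat until stable: dropping one entry can orphan another.
        changed = False
        for i in list(visible):
            if not deps[i] <= visible:
                visible.discard(i)
                changed = True
    return {by_id[i] for i in visible}


full = make_entry("full backup")
incr1 = make_entry("incremental 1", [entry_id(full)])
incr2 = make_entry("incremental 2", [entry_id(incr1)])
```

 With this rule, servers that withhold `full` also hide `incr1` and
 `incr2` from the reader, so a withheld entry shows up as a missing
 suffix of the chain rather than as a silent gap in the middle.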

 Can anyone think of important use cases where the add-only set
 semantics are not sufficient?

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/795#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid