[tahoe-dev] [tahoe-lafs] #795: append-only files
tahoe-lafs
trac at allmydata.org
Tue Sep 1 16:56:10 PDT 2009
#795: append-only files
--------------------------+-------------------------------------------------
Reporter: warner | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: undecided
Component: code-mutable | Version: 1.5.0
Keywords: | Launchpad_bug:
--------------------------+-------------------------------------------------
Comment(by davidsarah):
Replying to [comment:2 warner]:
> Every once in a while, Wally the writecap holder might merge all
> the append messages down into the "base object". Until this
> happened, a colluding set of servers could discard arbitrary
> append messages (meaning the file would not be monotonically
> increasing). There might be ways to have each message include the
> hash of the previous ciphertext to detect this sort of thing
> (forcing the server to discard all-or-nothing, restoring
> monotonicity, but introducing race conditions).
... or if we embrace the add-only collection abstraction, a hash
of some subset of the existing entries.
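A minimal Python sketch of the hash-chaining idea from the quoted comment. It assumes SHA-256 and a hypothetical message layout in which each appended message carries the hash of the whole previous message, a slight variant of hashing only the previous ciphertext. Verifying the chain shows the trade-off described above. A server that drops a message from the middle is detected. A server that drops everything after some point is not, so discards become all-or-nothing from the tail.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS = b"\x00" * 32  # hypothetical "no previous message" marker


@dataclass(frozen=True)
class Message:
    prev_hash: bytes   # hash of the previous message in the chain
    ciphertext: bytes


def digest(m: Message) -> bytes:
    """Hash the whole message so that each link also commits to earlier links."""
    return hashlib.sha256(m.prev_hash + m.ciphertext).digest()


def append(log: List[Message], ciphertext: bytes) -> None:
    """Append a message that links to the current last message."""
    prev = digest(log[-1]) if log else GENESIS
    log.append(Message(prev, ciphertext))


def verify(log: List[Message]) -> bool:
    """Return True if every message links to the one before it."""
    expected = GENESIS
    for m in log:
        if m.prev_hash != expected:
            return False
        expected = digest(m)
    return True


log: List[Message] = []
for ct in (b"first", b"second", b"third"):
    append(log, ct)

print(verify(log))               # True
print(verify([log[0], log[2]]))  # False: middle message dropped
print(verify(log[:2]))           # True: a truncated tail goes unnoticed
```

Two writers appending concurrently would both link to the same predecessor and fork the chain. That is the race condition mentioned in the quote.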
> Maybe "add-only collection" would be a better model for this,
> instead of implying that the object contains a linear sequence
> of bytes. In fact, we might say that the objects stored in this
> collection are arbitrary strings, with IDs based upon hashing
> their contents, and make the append(X) operation be idempotent,
> and give readers an unsorted set of these strings. This would
> make things like the add-only directory easier to model.
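The collection proposed in the quote can be modeled in a few lines. This Python sketch assumes SHA-256 for the content-derived IDs. `AddOnlySet` is a hypothetical in-memory illustration, not a Tahoe API, and it ignores encryption and distribution across servers. It offers no way to remove or modify entries, `add` is idempotent, and `read` returns an unordered set.

```python
import hashlib
from typing import Dict, FrozenSet


class AddOnlySet:
    """In-memory model of an add-only collection of byte strings.

    Hypothetical illustration only: there is no removal or update method,
    entry IDs are SHA-256 hashes of the contents, and reads are unordered."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Add `data` and return its ID. Adding the same bytes twice is a no-op."""
        entry_id = hashlib.sha256(data).hexdigest()
        self._entries.setdefault(entry_id, data)
        return entry_id

    def read(self) -> FrozenSet[bytes]:
        """Return all entries as an unordered set."""
        return frozenset(self._entries.values())


s = AddOnlySet()
first = s.add(b"hello")
again = s.add(b"hello")   # idempotent: same ID, no new entry
s.add(b"world")
print(first == again)     # True
print(len(s.read()))      # 2
```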
This abstraction makes a lot of sense to me. I had been thinking
of add-only directories as being implemented in terms of
append-only files, but actually an add-only capability for a
set of byte strings is directly applicable to all of the use
cases I can think of.
For example,
- to implement the immutable backup scenario, write each backup
as a set of new files (ideally using immutable directories, although
you could make do without them). Since they are not attached to an
existing directory structure, this does not require any authority
other than the necessary storage quota. Then, use an add-only set
to record the root caps for each backup. If the backups are
incremental, make each one dependent on the previous backups by
including a hash of them. The read key for the set is kept off-line, or
generated from a passphrase.
- to implement a tamper-resistant "log file", use an add-only set to
represent the set of timestamped log entries (entries made at about
the same time could perhaps be batched for efficiency).
- an add-only cap can represent the authority to submit a transaction,
with eventual consistency. This can represent any change to
application-level data structures, not just an addition. Multiple
holders of the add-only cap can submit transactions without being
able to interfere with each other.
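The incremental-backup case from the first bullet could be encoded roughly as follows. This Python sketch rests on several assumptions. The JSON entry layout, the `backup_entry` and `restore_chain` helpers, and the placeholder strings standing in for root caps are all hypothetical. Entries are identified by their SHA-256 hash, as in the add-only collection model. Encryption under the set's read key is omitted.

```python
import hashlib
import json
from typing import Dict, List, Optional


def entry_id(entry: bytes) -> str:
    """Content-derived ID for a set entry."""
    return hashlib.sha256(entry).hexdigest()


def backup_entry(root_cap: str, previous: Optional[str]) -> bytes:
    """Encode one backup as a set entry (hypothetical JSON layout).

    `previous` is the ID of the backup this one is incremental on, or None
    for a full backup. Including that hash makes the entry depend on it."""
    record = {"root_cap": root_cap, "previous": previous}
    return json.dumps(record, sort_keys=True).encode()


def restore_chain(entries: Dict[str, bytes], head: str) -> List[str]:
    """Root caps needed to restore backup `head`, oldest first.

    Raises KeyError if some earlier backup in the chain is not visible."""
    caps = []
    current: Optional[str] = head
    while current is not None:
        record = json.loads(entries[current])
        caps.append(record["root_cap"])
        current = record["previous"]
    caps.reverse()
    return caps


full = backup_entry("root-cap-monday", None)
incr = backup_entry("root-cap-tuesday", entry_id(full))
entries = {entry_id(e): e for e in (full, incr)}
print(restore_chain(entries, entry_id(incr)))
```

Because each incremental entry names its predecessor by hash, a server cannot substitute a different earlier backup without the ID check failing.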
The set abstraction also has the advantage of not appearing to give
consistency properties that are actually unimplementable. When you
read the set, you get some subset of its entries. Additions to the
set are only eventually seen by readers. You can make an addition
dependent on some of the existing entries, in which case anyone who
sees that addition is guaranteed to also see the entries it depends on.
This sounds like a robust abstraction that you could imagine
building useful higher-level distributed systems on top of.
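The comment does not say how the dependency guarantee would be enforced. One possibility, assumed here, is that readers withhold any entry whose dependencies they cannot also retrieve. The Python sketch below shows that reader-side filter. `visible_entries` is a hypothetical helper, and the short IDs are placeholders for content hashes.

```python
from typing import Dict, FrozenSet, Set


def visible_entries(deps: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Given the entries a reader retrieved (ID -> IDs it depends on),
    return those whose dependencies are all present, transitively.

    Entries with a missing dependency are withheld rather than shown."""
    ok: Set[str] = set()
    changed = True
    while changed:  # repeat until no more entries can be admitted
        changed = False
        for eid, needs in deps.items():
            if eid not in ok and all(d in ok for d in needs):
                ok.add(eid)
                changed = True
    return ok


retrieved = {
    "a": frozenset(),
    "b": frozenset({"a"}),
    "c": frozenset({"b", "x"}),  # "x" was not retrieved
    "d": frozenset({"c"}),       # depends on "c", so it is withheld too
}
print(sorted(visible_entries(retrieved)))  # ['a', 'b']
```

The reader still sees only a subset of the set, but never an entry without the entries it depends on.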
Can anyone think of important use cases where the add-only set
semantics are not sufficient?
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/795#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid