[tahoe-dev] Authenticated Data Structures and a Notary protocol

Andrew Miller amiller at cs.ucf.edu
Sun May 13 19:02:38 UTC 2012


Dear Tahoe-Dev,

I’m working on a decentralized ‘Notary’ protocol that generalizes
several problems, one of which is append-only files in Tahoe-LAFS
(ticket #795). I want to share some background reading and an initial
attempt based on dynamic Merkle trees.

The challenge with append-only files is to establish a sequential
ordering for several updates, each of which may originate from a
different source. There’s an inherent race condition - according to
what frame-of-reference do you determine which of A or B occurred
first? The current proposal in #795 is for the network to punt -
rather than commit to a sequence, it only allows AppendCap holders to
add their updates into an unordered set until someone with a stronger
cap comes along and authoritatively chooses an ordering.

	Definition: a Notary is an entity designated to choose a sequential
ordering over a set of events

A fully-trusted central party could do this, but that’s undesirable. A
decentralized peer-to-peer Notary would be best - this is what Bitcoin
[0] is designed to achieve. For now, just consider Notary as an
abstract role - the goal is to minimize what’s required of it. Let me
describe a model for three-party protocols involving a Notary:

	Notary -> Directory -> Verifiers

The Directory is a grid of untrusted servers storing a dataset. The
Notary signs and commits to a ‘valid’ sequence of updates to this
data, where ‘valid’ is according to application-specific rules, e.g.
“no more changes occur after a file has been petrified,” or “total BTC
value of inputs must match the total of outputs”. The Verifiers query
the Directory and receive a proof that the sequence is valid. We
expect to rely on the Notary for Quality-of-Service (to select an
ordering) but hopefully not for correctness (a valid ordering).

This is a slight generalization of the Authenticated Data Structures model
	CA -> Directory -> User     [NaorNissim98]

or equivalently
	Source -> Server -> Client      [Tamassia*]

in which a Source is trusted to perform updates to a data structure,
e.g. a set of valid identity certificates. The typical solution is for
the Source to incrementally update a balanced search tree containing
Merkle hashes. The Source signs and publishes the O(1) root hash for
each updated structure. Clients, knowing the current root hash, ask
the Server to answer queries, e.g. “is element X a member of the set?”
receiving not just the result, but also a proof (i.e., an O(log N)
path through the Merkle tree). The Notary is different from a
Source/CA in that a) trust in it is not assumed and b) its obligation
is limited to a bounded-resource computation task, e.g., looking up a
previous transaction in a database and validating a signature against
a public key. A Verifier is different from a User/Client in that it
must to validate every update from the Notary rather than trusting it.

So, lets try directly applying this dynamic Merkle-tree to append-only
files. Notarize can be a distinct capability, corresponding to the
‘squash’ function described in #795, but weaker than a WriteCap. All
updates are signed by the Notary, so it must either be
online/available or else updates pool for a while in an unordered
collection (again like #795). The Verifier processes each update by
fetching the necessary O(log N) proof data from the Directory, then
advancing its state (updating its own O(1) root hash), and finally
discarding the proof. Verify and Notary perform identical O(log N)
operations on a balanced search tree. Users who just want to fetch a
recent version must receive a root hash from a Verifier they trust.
Here’s what’s achieved:
  - Verifiers are efficient, require little state, and no secret
information - there can be many of them
  - Any invalid or inconsistent messages from the Notary to a Verifier
are detectable and provable

What can the Notary still get away with? It can:
  - disappear / suspend service / refuse all requests
  - preferentially ignore / delay requests it dislikes (or accept bribes)
  - present conflicting views to partitions of Verifiers with a
network divide between them

This is still a lot of reliance, beyond what can be called QoS. The
cool part though is that Verifiers/Notary don't need to store their
own copies of the full dataset, as I initially thought. This follows
from the incrementally-updated Merkle tree, which was described in
1998 but still surprised me to read about last month. I'm not sure if
this is intuitive or already well-known to everyone else!

Coming up later:
- a demo and toy implementation [1] of this append-only scheme
- a proof-of-throughput (based on [EetaPizzaPi]) used to measure QoS
on the Directory grid
- a decentralized Notary based on a lottery among Directory nodes, like Bitcoin


References (in order according to my personal preference):

[TamassiaPAD] Persistent Authenticated Data Structures and their Applications
http://www.cs.brown.edu/cgc/stms/papers/isc2001.pdf
My overall favorite paper of this bunch. This adds the ability to
answer queries of the form “was element X a member of the set at time
T?” Relevant to Bitcoin forks.

[EetaPizzaPi] 2011. How to Tell if Your Cloud Files Are Vulnerable to
Drive Crashes
http://www.rsa.com/rsalabs/staff/bios/kbowers/publications/RAFT.pdf

[TamassiaADS] Authenticated Data Structures. Algorithmica.
http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf
   “<davidsarah> ok, that was a nice short survey paper”

[NaorNissim98] Certificate Revocation and Certificate Update. USENIX
http://www.cs.bgu.ac.il/~kobbi/papers/revoke.ps
The first use of dynamic balanced Merkle search trees.

[TamassiaLattice] Cryptography for Efficiency: Authenticated Data
Structures Based on Lattices and Parallel Online Memory Checking
http://eprint.iacr.org/2011/102.pdf
The latest papers in Authenticated Data Structures have the best
explanations for the different protocol models. All the newest
contributions are about exploiting partially homomorphic properties of
some hash functions, especially lattice-based ones, to get either
different efficiency tradeoffs or privacy. This one is an
Authenticated bloom filter.

[#795] append-only files https://tahoe-lafs.org/trac/tahoe-lafs/ticket/795
[0] the Bitcoin whitepaper http://bitcoin.org/bitcoin.pdf
[1] my implementation of authenticated dictionary
https://github.com/amiller/redblackmerkle

-- 
Andrew Miller


More information about the tahoe-dev mailing list