[tahoe-dev] verification of subset of file == proof of retrievability

Wed Jun 13 06:34:45 UTC 2012

On 6/12/12 10:57 PM, Zooko Wilcox-O'Hearn wrote:
> Folks:
> 
> Over on the Bitcoin discussion forums (warning: wretched hive of scum
> and villainy), someone was asserting that they wanted a "proof of
> retrievability" protocol and saying that, while they hadn't looked,
> they were pretty sure Tahoe-LAFS didn't do it right:
> 
> https://bitcointalk.org/index.php?topic=2236.msg847771#msg847771
> 
> I was mildly annoyed by this, because actually we have some extremely
> strong features along those lines.

Well, we've never implemented a POR because we've never really wanted
one. "Proof-of-retrievability" is achieved by just retrieving the data.
What the cryptography literature calls proof-of-retention (or
-retrievability, -data-possession, or -ownership) gives you is a way to
*cheaply* (i.e. using less storage and bandwidth than the whole file)
check that some remote server still has the data it claimed to have,
and hasn't found some clever (i.e. cheaper) way to just pass the test
without really holding the whole file.
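
To put a rough number on "cheaply": here is a small sketch of how
quickly random spot-checks catch a server that has thrown part of the
data away. It is my own illustration, not something Tahoe implements.
It assumes the file is split into equal sections, the server cannot
predict which sections will be asked for, and each challenge
independently lands on a missing section with probability equal to the
missing fraction. The function name is made up.

```python
def detection_probability(missing_fraction: float, num_challenges: int) -> float:
    """Chance that at least one of num_challenges random spot-checks
    hits a section the server no longer holds.

    Assumes independent, uniformly random challenges (an assumption of
    this sketch, not a property of any particular POR protocol).
    """
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be between 0 and 1")
    if num_challenges < 0:
        raise ValueError("num_challenges must be non-negative")
    # Each challenge misses the hole with probability (1 - f);
    # all k of them miss with probability (1 - f) ** k.
    return 1.0 - (1.0 - missing_fraction) ** num_challenges
```

Under those assumptions, a server that dropped 1% of the sections is
caught with about 95% probability after 300 challenges, which is far
less traffic than fetching the whole file. Small amounts of damage are
much harder to catch this way.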

There are a number of protocols already out there. In general the
verifier can assert that the server is holding either the full original
file or a collection of calculated verification data that is at least
as large as the original file (so if the server wants to pass the test,
it might as well be honest).
http://www.rsa.com/rsalabs/node.asp?id=3357 is one example.

The simplest way to do this, however, is for the verifier to hold the
whole file too. When we've discussed this in the Tahoe "one grid to rule
them all" context, we've talked about "share buddies": two
non-collaborating servers, each purportedly holding a copy of the same
share. They check up on each other by asking for keyed hashes of random
sections of the share (or the whole thing, if they're willing to spend
the disk IO on it). The idea was part of a larger server-driven-repair
thing we were thinking of: servers check up on the files they're helping
to store, and if they notice problems, they can trigger repair all by
themselves. Distributed reputation measurements are involved too:
servers can boost their reputation by delivering PORs on a timely
basis, and share-buddies can vouch for each other.
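
A minimal sketch of the share-buddy check described above, under my own
assumptions: HMAC-SHA256 as the keyed hash, a fresh random key per
challenge, and a 4096-byte default section size. None of this is
existing Tahoe code, and the function names and the (key, offset,
length) challenge format are hypothetical.

```python
import hashlib
import hmac
import secrets


def keyed_hash(share: bytes, key: bytes, offset: int, length: int) -> bytes:
    """HMAC-SHA256 over one section of the share, keyed by the challenge key."""
    return hmac.new(key, share[offset:offset + length], hashlib.sha256).digest()


def make_challenge(share_size: int, section_len: int = 4096):
    """Pick a fresh random key and a random section of the share.

    The fresh key stops the buddy from precomputing or replaying answers.
    """
    key = secrets.token_bytes(32)
    length = min(section_len, share_size)
    offset = secrets.randbelow(share_size - length + 1)
    return key, offset, length


def check_buddy(my_share: bytes, respond) -> bool:
    """Challenge a buddy and compare its answer with our own copy.

    `respond` stands in for the network call to the other server: it
    takes (key, offset, length) and returns the buddy's keyed hash.
    """
    key, offset, length = make_challenge(len(my_share))
    expected = keyed_hash(my_share, key, offset, length)
    return hmac.compare_digest(expected, respond(key, offset, length))


# Demo: an honest buddy holds the same bytes, a cheater holds zeros.
share = bytes(range(256)) * 64
zeros = bytes(len(share))


def honest(key, offset, length):
    return keyed_hash(share, key, offset, length)


def cheater(key, offset, length):
    return keyed_hash(zeros, key, offset, length)
```

Here check_buddy(share, honest) passes and check_buddy(share, cheater)
fails. Passing section_len equal to the share size gives the "whole
thing" variant, at the cost of reading the entire share from disk on
both sides.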

Anyways, it's sort of a neat idea, but we probably need some more
extensive accounting / server-reputation frameworks in place before
it'll be super useful in the Tahoe context.

cheers,
 -Brian