[tahoe-dev] verification of subset of file == proof of retrievability

Wed Jun 13 19:09:34 UTC 2012

On 6/13/12 9:57 AM, Zooko Wilcox-O'Hearn wrote:

> Yes, I know what it means. Perhaps you didn't read my blog post
> ("rant") about this topic that I linked to in my previous post?

Sorry, you're right, I hadn't read your blog post.

[/me reads blog post]

Good post! I completely agree with your points about secondary protocols
never getting implemented or tested, especially when they're less
important and more complicated than the primary. And the way you
describe access-control is spot-on: being able to delegate verification
to a third party without also giving them plaintext access is a huge
feature of Tahoe, and seems entirely neglected by the rest of the world
(probably due to the usual us-vs-them firewall/ACL/perimeter-centric
security model that Tahoe and the objcap community have escaped).

But, I'm still not convinced that the tahoe download protocol really
provides an adequate substitute for a traditional PoR. While our
protocol can indeed confirm that the servers still hold the whole file,
confirmation costs as much as a full download. Anyone with a readcap
can download an arbitrarily small range of data and get complete
validation on the blocks that were used to provide it, but you get no
validation on the rest of the blocks. To validate the server's entire
share requires downloading every segment.
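
To make the "validated only for the blocks you fetched" point concrete,
here is a minimal sketch of Merkle-style block validation. It is an
illustration under assumptions, not Tahoe's actual share format: plain
SHA-256 stands in for Tahoe's real hash scheme, the block count is
assumed to be a power of two, and the helper names (merkle_levels,
proof_for, verify_block) are made up for this example.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in hash; an assumption, not Tahoe's real hashing scheme.
    return hashlib.sha256(data).digest()

def merkle_levels(blocks):
    """Return the tree as a list of levels, leaf hashes first, root last."""
    n = len(blocks)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two block count"
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def proof_for(levels, index):
    """Collect the sibling hashes needed to check one block against the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path

def verify_block(root, block, index, path):
    """Recompute the root from one block plus its sibling hashes."""
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

blocks = [bytes([i]) * 16 for i in range(8)]
levels = merkle_levels(blocks)
root = levels[-1][0]

# Fetching block 2 validates block 2 and nothing else.
ok = verify_block(root, blocks[2], 2, proof_for(levels, 2))
# A server that kept the hash tree but lost block 5 can still answer
# the request above; it only fails if block 5 itself is requested.
lost = verify_block(root, b"whatever the server makes up", 5, proof_for(levels, 5))
print(ok, lost)
```

Each successful check covers one block and a logarithmic number of
hashes, so covering the entire share this way means fetching every
block.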

OTOH, I'm not really sure how I'd use the features that PoR offers in
the first place. The goal is to incentivize a server (which you're
paying, somehow) to hold your data until you want it again, and you're
trying to maximize your chances of getting it back given some fixed
budget. But a positive PoR response is not a predictor of future
downloadability: even if PoR challenges are indistinguishable from real
downloads, there's no guarantee that the server will even still exist
tomorrow.

I *can* imagine PoR being used to justify continued payment of rent.
Here's an analogy: I've got a storage locker holding all my old
computers, far enough away that I rarely visit it in person, and each
month I write them a check. I could imagine asking them for proof that
my stuff is still there before writing that check ("send me a picture of
my stuff with today's newspaper"), to discourage them from throwing it
all out and double-renting the space to me and someone else. As
incentives go, it's not ideal, because it varies over time. Just before
I write my check, they're not going to throw out my stuff, because
they've already incurred their costs (nobody else has rented that space
this month), so the incentives are in my favor. Just after they get my
money, they could ditch my stuff and re-rent the locker, and I wouldn't
know about it for another month, so the incentive is neutral: there's no
immediate reason to stay honest. To keep them motivated at all points in
the cycle, they must extend me credit (I pay at the end of the month, not
at the start), and they never come out ahead until I finally collect.

To maintain even that level of incentive, the proof has to be cheap to
provide. Even then they might still decide to demand a ransom when I
actually show up to collect my stuff.

The Tahoe equivalent would be a storage server that saves money by
throwing out 50% of the encrypted blocks of each file, so they can
provide full responses (hash chain and all) as long as you don't happen
to ask for the missing blocks. If you ever do, they could feign
connection errors or latency, so the downloader shifts to other servers,
and nobody would be the wiser. If they want to be really clever, they'll
throw out 100% of the blocks and regenerate them on-demand by repairing
from other servers' shares. In both cases the client is paying for false
redundancy. (note to self: if we ever do add PoR, it needs to be on
individual shares, not the FEC-decoded ciphertext)
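
As a rough worked example of the 50% case, the sketch below computes
how likely a server is to survive a spot-check that asks for a few
randomly chosen blocks. It assumes blocks are sampled uniformly without
replacement and that a failed or stalled response is counted as a
missing block (which is exactly what a downloader that quietly shifts
to other servers does not do). The function name and the numbers are
illustrative only. It says nothing about the regenerate-on-demand
attack, which would pass any such check.

```python
from math import comb

def miss_probability(total_blocks, kept_blocks, samples):
    """Chance that `samples` distinct random blocks all land on kept blocks,
    i.e. that the spot-check fails to notice the discarded ones."""
    if not 0 <= kept_blocks <= total_blocks:
        raise ValueError("kept_blocks must be between 0 and total_blocks")
    if not 0 <= samples <= total_blocks:
        raise ValueError("samples must be between 0 and total_blocks")
    # Hypergeometric: ways to pick only kept blocks / ways to pick any blocks.
    return comb(kept_blocks, samples) / comb(total_blocks, samples)

# Hypothetical share of 1000 blocks, half of them thrown away.
for k in (1, 5, 10, 20):
    print(k, miss_probability(1000, 500, k))
```

With half the blocks gone, each additional sampled block roughly halves
the cheater's chance of going unnoticed, so a handful of random
requests per check is enough if failures are treated as evidence.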

As Norm Hardy points out in http://cap-lore.com/BigStore/DataBank.html ,
the best way to fix the incentives is to convince the server to post a
bond (or risk their reputation, or something valuable to them and
valuable to you) that gets paid out if they ever fail to provide the
data on demand. As a client you decide how much the data is worth to
you, then you negotiate for a bond worth more than the data, and pay
some storage fee. A server that has confidence in their reliability and
business model will be willing to risk the bond. (Making this scale may
be impossible, I'll grant, as might building a protocol that would
convince a judge of their breach.)

Checking up on the server in the meantime is a way to reduce the
all-or-nothingness of it. The question to ask is "what would you do if
they fail the PoR?". In the Tahoe world, that counts as a lost/corrupted
share, which should trigger immediate repair, to maintain health of the
file.

So I guess I see PoR fitting into Tahoe somewhere between our
lightweight trust-the-server "file check" operation (which just asks "do
you still have this share?" and believes the answer), and the
heavyweight validate-every-bit-of-every-share "file verify" operation
(which fetches even more data than a normal download). If we had some
PoR scheme that was cheaper than full download, clients could run it
frequently (imagine 'tahoe check --por') and use the results to
preemptively repair damage.

I need to read the papers, but I think that bandwidth-cheap PoR schemes
still require a lot of data saved on the client, which is why
share-buddies seemed like the easiest approach (as well as enabling a
really simple protocol, like a salted hash, or even just an XOR of a
couple of randomly-chosen bytes). If a share buddy discovered their
partner was cheating, they could notify the client, who would demand the
full share, and if they failed to produce it, the client would stop
paying them and take their business elsewhere (repair and put the
replacement share on a different server).
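
The salted-hash variant could look something like the sketch below. It
is only a guess at the shape of such a protocol: it assumes the
challenger can compute the expected answer from share bytes it holds
itself, and the names (respond, spot_check, ask_partner) are
hypothetical, not anything in Tahoe.

```python
import hashlib
import hmac
import secrets

def respond(salt: bytes, share: bytes) -> bytes:
    """What an honest server returns: a hash over the fresh salt plus the
    full share. It cannot be precomputed without knowing the salt."""
    return hashlib.sha256(salt + share).digest()

def spot_check(reference_share: bytes, ask_partner) -> bool:
    """Challenge a partner with a random salt and compare the answer with
    the one computed locally. `ask_partner` is any callable that takes the
    salt and returns the partner's response bytes."""
    salt = secrets.token_bytes(32)
    expected = respond(salt, reference_share)
    answer = ask_partner(salt)
    return hmac.compare_digest(expected, answer)

share = b"example share contents " * 100

def honest(salt):
    return respond(salt, share)

def cheater(salt):
    # Kept only the first half of the share.
    return respond(salt, share[: len(share) // 2])

print(spot_check(share, honest), spot_check(share, cheater))
```

The response is a single digest, so the bandwidth cost is tiny, but the
responder has to read the whole share to compute it, and the challenger
needs the data (or an equivalent) to know the right answer.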

> But this really puts the icing on that! The man who co-invented that
> protocol and co-authored that paper is also unaware of that fact!
> 
> So I guess I really should let Dr. Juels off the hook. ;-)

Hah!

cheers,
 -Brian