[tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files

David-Sarah Hopwood david-sarah at jacaranda.org
Sat Sep 12 21:51:00 PDT 2009


Brian Warner wrote:
> David-Sarah Hopwood wrote:
>> I've designed a new version of the cap protocol I posted a few days
>> ago, that addresses Brian's comments about roadblock attacks.
> 
> This is great stuff.

It's the most fun (well, intellectual fun :-) I've had in ages. Thanks a
lot for your feedback.

> The diagram is immensely helpful too. Some comments:
> 
> * so S+T is the storage-index, and the conservative values make it 210
>   bits for immutable files and 288 bits for mutable files.

Correct. I didn't pay much attention to making this value shorter, since
it only contributes to the length of verify caps.

> * roadblock attacks: I agree that it's probably plenty to make "t" large
>   enough that the attacker can't put roadblocks in place before the
>   uploader finishes their upload. If convergence is in use, the
>   roadblocker may get more time to construct their preimages
>   (particularly for likely-to-be-popular files, especially if someone
>   has exclusive access to the file for some period of time before it
>   becomes popular).

Right. (Convergence can be implemented in this protocol by letting K1 be
a hash of the plaintext and the convergence secret; and by either omitting
Dhash, or letting KD be a hash of the plaintext and another capability
that allows destroying any of that convergence group's files.)
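As a concrete illustration of that parenthetical (the function name and the domain-separation tag are my own placeholders, not part of the protocol), K1 could be derived like this:

```python
import hashlib

def derive_convergent_k1(plaintext: bytes, convergence_secret: bytes) -> bytes:
    """Derive K1 from the plaintext and a per-client convergence secret.
    Identical plaintexts under the same secret converge to the same key
    (and hence the same storage index), while different secrets prevent
    cross-client correlation of who is storing which files."""
    h = hashlib.sha256()
    h.update(b"convergence-v1")  # hypothetical domain-separation tag
    h.update(convergence_secret)
    h.update(plaintext)
    return h.digest()
```

The same pattern (hash the plaintext together with a group capability) would give KD for the destroy-cap variant.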

>   However, we can (and probably should) design the
>   upload code to detect a roadblock (i.e. notice shares for the existing
>   SI, download some pieces of it to validate the hashes, then discover
>   the problem) and re-upload with a randomly-generated key.

The scheme described later in this email completely prevents roadblock
attacks during initial upload.

> * more significantly, the attacker can prevent repair of the file,
>   because once the original shares are uploaded, they have plenty of
>   time to compute preimages and place roadblocks on all the other
>   servers

> But, 2**50 is still pretty big, and each operation requires generating a
> DSA keypair,

Unfortunately not. In the immutable case there is no key pair, and in
the mutable case any key pair will do, so you only have to generate
one (and sign one message). The preimage can be generated by varying
K1_enc, say, since K1_enc doesn't have to be valid to act as a
successful roadblock.
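To make the cost concrete, here is a toy sketch of the roadblocker's search (field layout and names are stand-ins, and t is kept tiny so it runs instantly): each attempt is a single hash over a varied K1_enc-like value, so matching t bits of the storage index costs about 2**t hash operations and zero keypair generations.

```python
import hashlib
import itertools

def find_roadblock(target_bits: int, t: int) -> bytes:
    """Search for a K1_enc-like value whose hash matches the low t bits
    of a target storage index.  Each attempt costs one hash, which is
    why the attack is ~2**t hashes, not 2**t keypair generations."""
    for counter in itertools.count():
        candidate = counter.to_bytes(8, "big")  # stand-in for a varied K1_enc
        digest = hashlib.sha256(candidate).digest()
        if int.from_bytes(digest, "big") % (1 << t) == target_bits:
            return candidate

# With t = 12 this takes about 2**12 = 4096 attempts; with t = 50 it
# would take about 2**50, each attempt still just one hash.
```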

> so perhaps we can rely upon the significant expense of this
> attack to discourage people from trying it.

We could use a deliberately expensive hash for hash_m, but of course
it can't be so expensive as to be a burden for the legitimate uploader.
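Purely as an illustration (the choice of scrypt and the parameters are my assumptions, not part of the design), hash_m could be instantiated with a memory-hard function to raise the per-guess cost:

```python
import hashlib

def expensive_hash_m(data: bytes, salt: bytes) -> bytes:
    """A deliberately memory-hard stand-in for hash_m, using scrypt.
    These parameters (~16 MiB of memory per evaluation) multiply the
    roadblocker's per-guess cost while remaining a one-time cost for
    the legitimate uploader."""
    return hashlib.scrypt(data, salt=salt, n=2**14, r=8, p=1, dklen=32)
```

The tension noted above remains: any factor slow enough to deter 2**50 guesses also taxes every honest upload by the same factor.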

The frustrating thing is that there's an objective way to decide
which shares are correct (attempt to read them, which provides an
additional n bits of integrity), but that can't be used to prove to
the server that you have the correct shares, without giving them
the read cap.

Thinking off the top of my head: suppose that there were an arbiter
who you trusted not to disclose the read cap or the file contents,
and that the server trusted to decide which shares are the correct
ones. Then you could give the read cap to the arbiter, and they could
sign a declaration that you could give to the server along with the
shares.

I wonder whether there's a way to do the same thing without an arbiter
by using some fancy zero knowledge proof. Effectively you have to prove
that S = f(R || T), where f is currently a hash function but might be
replaceable with some other kind of deterministic one-way function,
without giving away R.

Here's a limited version of that idea that provides some of the benefit:
It is possible to require that the initial uploader prove knowledge
of the read cap, by making it effectively a private key, and the S
part of the storage index the corresponding public key. This wouldn't
prevent a roadblock in the repair case, because an attacker might already
know the read cap by then. However, it would completely prevent roadblocks
in the initial upload, when only the uploader knows the read cap, and
it would also foil attackers who never learn the read cap.

A simple way to do this is to hash R || T to give a discrete-log private
key for a signature scheme, and sign the other fields of the storage
record (no need to include the ciphertext since that is covered by V)
using that key. In other words, the server would use S as a verification
public key to check the other fields of the record (in addition to
checking T via the hash).
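Here is a toy sketch of that scheme, using a small Schnorr group in place of the elliptic curve a real deployment would use (group parameters, names, and field contents are all illustrative):

```python
import hashlib

# Toy Schnorr group: p = 2q + 1 with p, q prime; G generates the
# order-q subgroup.  Illustration only -- far too small to be secure.
P, Q, G = 2039, 1019, 4

def _hash_to_int(*parts: bytes) -> int:
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return int.from_bytes(h.digest(), "big")

def derive_keys(R: bytes, T: bytes):
    """Hash R || T to a discrete-log private key x; the S part of the
    storage index is the corresponding public key g^x."""
    x = _hash_to_int(R, T) % Q or 1
    return x, pow(G, x, P)

def sign_record(x: int, record: bytes):
    """Sign the other storage-record fields with the key derived from
    R || T, proving knowledge of the read cap without revealing it.
    The nonce k is derived deterministically (RFC 6979 style)."""
    k = _hash_to_int(x.to_bytes(2, "big"), record) % Q or 1
    e = _hash_to_int(pow(G, k, P).to_bytes(2, "big"), record) % Q
    s = (k + x * e) % Q
    return e, s

def server_verify(S: int, record: bytes, sig) -> bool:
    """The server uses S from the storage index as a verification key;
    a roadblocker who never learns the read cap cannot forge this."""
    e, s = sig
    r = (pow(G, s, P) * pow(S, -e, P)) % P  # g^s * S^-e == g^k
    return _hash_to_int(r.to_bytes(2, "big"), record) % Q == e
```

Note that the signature is a static object, which is what makes the record-copying property in the next paragraph work.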

Because this scheme uses a signature rather than an interactive proof,
servers can still copy records among themselves without knowing the
read cap: the signature travels with the record. (It is harmless to
allow such copying even during the initial upload.)

This would make storage indices and verify caps longer since S is now
an encoding of a group element, but if we use an elliptic curve then
it can be encoded in 2k + 1 bits for security level k, so that's still
perfectly reasonable.

> For mutable files:
> 
> * The destroy-cap is a neat bit of feature, but I'd want to think about
>   it further before actually building it in.  It makes a lot more sense
>   in the context of add-only files.. if you don't have those, then
>   there's not a lot of difference between filling the mutable slot with
>   an empty file and completely destroying the slot.

It makes sense for both add-only files and immutable files. Note that in
the case of immutable files it doesn't compromise collision resistance:
after a file has been destroyed it is possible for it to be recreated at
the same storage index (and made accessible by the same read cap), but in
that case it must have the same contents.

Of course there are limitations to how thoroughly a file can be deleted,
given that it can *necessarily* be recreated even by anyone who knows just
the "green" information, i.e. they don't need to have had the read cap.

I agree that we need to think about this feature further before deciding
whether to add it. As I see it, there are several classes of reasons why
you might want to delete a file:

a) you don't care about it any more and want the space to be reclaimable.
b) its contents are obsolete and you no longer want people to rely on
   them, even though it isn't important whether they know the contents.
c) you have a legal and/or moral obligation to no longer store the
   contents or otherwise contribute to providing access to them.
d) you screwed up and made information available that should have been
   confidential.

In the subcase of c) where the obligation is due to a timed data retention
policy, it is probably better to implement that by letting leases on the
file expire, rather than having to remember (or pass on the obligation to
your successors to remember) to actively delete it. Case a) can also
be handled by leases, in most situations short of a critical lack of
storage space. However, some subcases of b) and c) arguably require
support for active deletion. In case d) (which may overlap with c)),
deleting the file gives no guarantee at all that the information hasn't
leaked. You can always hope that no-one read it, but this feature isn't
really designed to support d) (and its use may actually tip off attackers
that the file is interesting).

In a conventional filesystem, another reason to delete files is that
they are cluttering up the namespace. In Tahoe, this reason doesn't
really apply because destroying a file would not delete any directory
entries that referred to it.

>   The existing URI:SSK
>   files grant this ability to writecap holders: they can set the
>   container length to zero, which will cause the server to delete the
>   share altogether (the idea being to help with rebalancing/GC).
> 
> * I'd rename "master cap" to "writecap" or "full-writecap".. I spent a
>   bit of time trying to figure out how to reattain offline
>   writecap-to-readcap attenuation before realizing that the red "KCsign"
>   value is actually the add-only cap, and the rainbow-colored box is the
>   writecap.

Actually the KC_sign value acts either as an add-only cap, or as a
write-only cap when it is used for mutable files. A write-only cap does
not allow reading any of the previous plaintexts or future plaintexts.
I don't know how useful this is, but it was trivial to support.

Also note that "full-writecap" doesn't make sense in the add-only case,
since there is no writing, only adding and destroying. (It might have
been a mistake to try to explain the mutable and add-only cases in the
same diagram, but the crypto was identical.)

>   Also, I'd probably use "extra-verification readcap" or
>   something similar instead of "read/verify cap".. the latter makes me
>   think that the "readcap" doesn't provide any verification properties
>   (when in fact, with your parameters, provides something like
>   128+50=178 bits of integrity checking, which corresponds to a 178/2=
>   2^89 work factor to find a single collision?).

Correct. But if you were to attenuate it to a verify cap without having
the U field, you'd only have 50 bits of integrity checking. I just
couldn't think of a better name that wasn't ridiculously verbose.

Note that because I drew the immutable diagram based on the mutable one,
the widths of the boxes in the former may be a bit misleading.

> * I like the variable-length cap thing. You'd store R+T+U everywhere,
>   but you'd let users pass just R+T when they want to share. I think we
>   could make this non-quantized: they could pass R+T+U[:extra] when they
>   want to share, setting 'extra' according to how much they value
>   brevity over integrity.

Yes, this protocol generalises quite naturally to variable-length caps
(which would be between n+t and n+m in length). I'll probably include
that explicitly in the next version.
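A small sketch of how such caps might be formed and checked (field widths and names are placeholders):

```python
# Placeholder field widths: R (read key), T (storage-index tag), U.
R, T, U = b"\x01" * 16, b"\x02" * 7, b"\x03" * 16

def make_read_cap(extra: int) -> bytes:
    """R and T are always present; the sharer trades brevity against
    integrity by choosing how many extra bytes of U to append, from 0
    (n + t bits of integrity) up to len(U) (n + m bits)."""
    return R + T + U[:extra]

def check_integrity(cap: bytes, recomputed_U: bytes) -> bool:
    """Whatever prefix of U the cap carries must match the U value
    recomputed from the downloaded shares."""
    carried = cap[len(R) + len(T):]
    return recomputed_U.startswith(carried)
```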

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
