[tahoe-dev] simple authority taxonomy versus a kind of privacy

Brian Warner warner at lothar.com
Sat Jul 18 14:27:09 PDT 2009


Zooko Wilcox-O'Hearn wrote:

> I was wondering about the word "traversal cap", thinking "Isn't that
> what one might call a 'deep verify cap'?".

Yeah, "deep-verify" is a reasonable term. Depending upon how we
implement lease-renewal-caps and repair-caps, the "traversal cap" might
be combinable with some other secret to generate those other caps for
all reachable files and directories, so we might also create a
"deep-repair" or "deep-renew" cap. But unless we go crazy and expand
the dirnode structure to half a dozen columns, these other deep- caps
will be based upon the deep-verify cap.
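The following shows one way a "traversal" key could be combined with a second secret to yield other per-node caps. It is a purely hypothetical sketch: the purpose tags, the use of HMAC-SHA256, and the function names are assumptions for illustration, not Tahoe's real cap formats.

```python
import hashlib
import hmac

# Hypothetical derivation: the purpose tags and the HMAC construction are
# illustrative assumptions, not Tahoe's actual cap formats.


def derive_node_key(traversal_key: bytes, extra_secret: bytes, purpose: bytes) -> bytes:
    """Combine one node's traversal (deep-verify) key with a second secret."""
    message = purpose + b":" + traversal_key
    return hmac.new(extra_secret, message, hashlib.sha256).digest()


def deep_repair_key(traversal_key: bytes, repair_secret: bytes) -> bytes:
    # Hypothetical per-node repair key derived from the traversal key.
    return derive_node_key(traversal_key, repair_secret, b"repair")


def deep_renew_key(traversal_key: bytes, lease_secret: bytes) -> bytes:
    # Hypothetical per-node lease-renewal key derived from the traversal key.
    return derive_node_key(traversal_key, lease_secret, b"renew")
```

In this model, whoever holds the deep-verify cap plus the repair secret can derive a repair key for every node they can reach, while the secret alone yields nothing. That is why such "deep-repair" or "deep-renew" caps would sit on top of the deep-verify cap rather than needing extra dirnode columns.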

Since directories are really just specialized ways of interpreting the
contents of a file (mutable for now; we plan to add immutable ones in the
future), every directory will also have the same caps you'd get on the
underlying file object. There will always be a "shallow verify cap" for
each directory (i.e. the regular verifycap for the underlying file),
regardless of whether there is a good use for it or not. So I don't
think it makes much sense to try to get rid of them. If you're
concerned about people misusing them, just don't expose them in the
interface, or don't give them a provocative name.

> Now the reason why it could be useful to have a Shallow Verify Cap --
> to give someone the ability to verify the integrity of a directory
> without also giving them the ability to get the verify-caps of the
> children -- is for a kind of data-privacy.

Lease management and file-repair are the two things I can imagine
wanting to use lists of shallow verify caps for. A "manifest" is a list
of shallow caps (using as weak a cap as you can get, perhaps just a
storage-index), with which you might:

 * compare against a previous manifest, to find out which
   files/directories have been added or removed, to cancel leases on
   the removals or add leases to the additions.
 * transform into a list of lease-renewal caps, to be given to a
   service that will keep your files alive while you and your computer
   are on vacation.
 * transform into a list of repaircaps, for a similar repairer service.
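The first manifest use above (comparing against a previous manifest) amounts to a set difference. This is a minimal sketch, assuming each manifest is just a collection of storage-index strings; the "add-lease"/"cancel-lease" action names are hypothetical placeholders, not real Tahoe API calls.

```python
from typing import Iterable, List, Set, Tuple


def diff_manifests(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) storage indexes between two manifests."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


def plan_lease_actions(old: Iterable[str], new: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn a manifest diff into hypothetical lease operations."""
    added, removed = diff_manifests(old, new)
    actions = [("add-lease", si) for si in sorted(added)]
    actions += [("cancel-lease", si) for si in sorted(removed)]
    return actions


old_manifest = ["si-a", "si-b", "si-c"]
new_manifest = ["si-b", "si-c", "si-d"]
print(plan_lease_actions(old_manifest, new_manifest))
# prints [('add-lease', 'si-d'), ('cancel-lease', 'si-a')]
```

Only storage indexes are needed here, which is why a manifest can use the weakest available cap.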

The reason for having a deep-verify-cap is to provide (for those folks
who are willing to give up this information) an easier way to delegate
these repair and lease-maintenance tasks. Instead of building a whole
manifest and giving thousands of repaircaps to your hired repair
service (and rebuilding the list each time you change anything), you
give them a single deep-repair-cap instead (which remains valid for a
long time).
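What a single deep cap buys the repair service is the ability to re-walk the tree itself on every run. A minimal sketch, assuming a hypothetical in-memory graph where each node id maps to its verifycap and child ids (real dirnodes store encrypted child entries; this only models the reachability a deep-verify cap reveals):

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical model: node id -> (verifycap, child node ids). Files have no children.
Graph = Dict[str, Tuple[str, List[str]]]


def collect_verifycaps(graph: Graph, root: str) -> List[str]:
    """Breadth-first walk from root, returning each reachable node's verifycap once."""
    seen = {root}
    queue = deque([root])
    caps = []
    while queue:
        node = queue.popleft()
        verifycap, children = graph[node]
        caps.append(verifycap)
        for child in children:
            if child not in seen:  # directory graphs may contain cycles
                seen.add(child)
                queue.append(child)
    return caps


tree: Graph = {
    "root": ("V:root", ["docs", "a.txt"]),
    "docs": ("V:docs", ["b.txt", "root"]),  # link back to root forms a cycle
    "a.txt": ("V:a", []),
    "b.txt": ("V:b", []),
}
print(collect_verifycaps(tree, "root"))
# prints ['V:root', 'V:docs', 'V:a', 'V:b']
```

Because the walk starts from one long-lived cap, files added after delegation are picked up automatically, with no manifest to rebuild and resend.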

> The only problem is: *they can do that anyway*. Anybody who can
> observe your Tahoe storage service connections (even though they are
> encrypted) or who operates a storage server can easily detect the
> exact structure of your filesystem -- which directories are linked to
> which other directories and files, as well as the precise size of all
> of the files.

I disagree with the first claim. A passive observer will scarcely find
it "easy" to determine the shape of your filesystem. The foolscap
messages and storage-index arguments are hidden by the encrypted links,
so the only information available to the adversary is the number of
bytes sent in each direction. If the client is only doing one thing at
a time, they can probably match requests to responses, and distinguish
between dirnode reads and file reads. Then their only way to
distinguish directories is by the size of their packed representation,
and they can only learn the size of files that you actually download
completely (and they can't distinguish between multiple files of the
same size). You'd have to do a recursive download of all your files,
one at a time, to let them build up the shape of your directory
structure this way.

The adversary who is able to mount this attack must be able to observe
all of your traffic to nearly all storage servers. A more active
adversary (who runs a storage server) must still run several servers
(something like numservers/k) to have a good chance of seeing most of
your requests, and still only gets to learn anything about the files
that you actually download.
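The "numservers/k" estimate can be checked with a simple model. This sketch assumes each download fetches shares from exactly k servers chosen uniformly at random, and asks how likely it is that at least one of them belongs to an adversary running m of the N servers. Real server selection is per-file rather than uniform, so treat the numbers as rough.

```python
from math import comb


def p_sees_request(num_servers: int, adversary_servers: int, k: int) -> float:
    """Probability that at least one of k uniformly chosen servers is adversarial."""
    if adversary_servers > num_servers or k > num_servers:
        raise ValueError("adversary_servers and k must not exceed num_servers")
    honest = num_servers - adversary_servers
    # comb(honest, k) is 0 when honest < k, which correctly yields probability 1.
    return 1 - comb(honest, k) / comb(num_servers, k)


# With N=100 servers and k=3, running about N/k = 33 servers sees roughly 70%
# of downloads under this model.
print(round(p_sees_request(100, 33, 3), 3))  # prints 0.704
```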

In addition, the information you get from a deep-verify-cap is
significantly greater than what you can get from traffic analysis. With
the verifycap, you get the exact file identities (even without
downloading anything), which you can compare against other people's
manifests, or against information you retrieve from unrelated storage
servers. If convergence is enabled, knowing the storage-index lets you
mount partial-information-guessing attacks against those files. Also,
there are many Tahoe use-cases which involve trust relationships
between clients and server operators: the servers might be more trusted
than the outside world, so it may be ok that the client is more
vulnerable to server behavior than to third party observers.
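To make the convergence point concrete, here is a simplified, hypothetical model of convergent encryption: the key is a hash of the plaintext and the storage index is a hash of the key. Tahoe's real derivation uses tagged hashes, a convergence secret, and encoding parameters, so the tags below are assumptions. The sketch only shows why a known storage index lets anyone who can compute this mapping test guesses about a file's contents.

```python
import hashlib
from typing import Iterable, Optional

# Simplified, hypothetical convergent-encryption model (not Tahoe's exact scheme).


def storage_index_for(plaintext: bytes) -> bytes:
    key = hashlib.sha256(b"convergence-key:" + plaintext).digest()[:16]
    return hashlib.sha256(b"storage-index:" + key).digest()[:16]


def guess_plaintext(target_si: bytes, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the candidate whose storage index matches target_si, if any."""
    for candidate in candidates:
        if storage_index_for(candidate) == target_si:
            return candidate
    return None


# The attacker knows the file is a form letter; only a 4-digit PIN is unknown.
target = storage_index_for(b"Your PIN is 4821")
guesses = (f"Your PIN is {n:04d}".encode() for n in range(10000))
print(guess_plaintext(target, guesses))  # prints b'Your PIN is 4821'
```

Traffic analysis alone never hands the attacker `target`; a deep-verify cap does.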

My general feeling is this: the existence of traffic-analysis attacks
(which allow adversaries in certain positions to learn "X") is not a
strong argument to simply publish "X" to the entire world. The
traffic-analysis attacks we've identified are not trivial to mount: why
make the bad guy's job easier? In addition, other people may find
reasonable ways to address these problems in the future (someone might
use PIR to retrieve shares): we'd be thwarting their noble efforts by
publishing the X that they're trying to hide.

That feeling may or may not be appropriate here, but in general I'm
suspicious of arguments to explicitly abandon privacy properties in
response to the threat of an attack.


Anyways, I don't understand what sort of code change you're advocating:
how could you "get rid of" a dirnode's shallow-verify cap? Let's just
keep calling it "the verifycap of the underlying mutable file" instead
of "shallow-verify" and not encourage the user to use it or believe
they can get any special privacy-preserving properties from it.

cheers,
 -Brian

