Opened at 2008-02-08T22:26:00Z
Last modified at 2021-03-30T18:40:46Z
#308 new enhancement
add directory traversal / deep-verify capability?
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | code-dirnodes | Version: | 0.7.0 |
Keywords: | vdrive newcaps verify repair privacy anonymity research | Cc: | |
Launchpad Bug: |
Description
We might split our current three-level directory capability structure (write, read, verify) into four levels: write, read, traverse, verify. The 'traverse cap' would be able to access the traverse cap of child directories, and the verify cap of child files. It would not be able to read child names.
The issue is that, at present, the verifier manifest (a set of verifier caps for all files and directories reachable from some root) can only be generated by someone who holds a readcap for the root. This manifest generation cannot be safely delegated to some other party (such as a central allmydata server). So we're forced to choose between having customers expose their files to us (by giving us their root readcap) and requiring them to create their manifest (for file checking/repair) on their own.
If we had traversal caps, then customers could give us the traversal cap instead of their read cap. We could still see the shape of their filesystem (and probably the length of their filenames, and the size of their files), but perhaps that would be little enough exposure that customers would be comfortable with revealing it. In exchange, we could provide the service of keeping all their files checked and repaired with less effort on their part, even when they leave their node offline for months at a time.
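As a rough illustration of what a traverse-cap holder (such as the allmydata checker) could do, here is a minimal sketch of manifest generation. It assumes a hypothetical in-memory `store` that maps each directory's traverse cap to that directory's own verify cap plus its child entries. Each entry is either a child directory's traverse cap or a child file's verify cap, matching the powers described above. No child names appear anywhere. The function name and data layout are illustrative, not Tahoe-LAFS code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical view available to a traverse-cap holder:
#   traversecap -> (this directory's verifycap, [(kind, cap), ...])
# where kind is "dir" (cap is a child traversecap) or "file" (cap is a child verifycap).
Entry = Tuple[str, str]
Store = Dict[str, Tuple[str, List[Entry]]]


def build_manifest(root_traversecap: str, store: Store) -> Set[str]:
    """Collect verify caps for every directory and file reachable from the root."""
    manifest: Set[str] = set()
    visited: Set[str] = set()
    stack = [root_traversecap]
    while stack:
        tcap = stack.pop()
        if tcap in visited:
            continue  # skip directories already walked (cycles, shared subtrees)
        visited.add(tcap)
        dir_verifycap, entries = store[tcap]
        manifest.add(dir_verifycap)
        for kind, cap in entries:
            if kind == "dir":
                stack.append(cap)  # descend using the child's traverse cap
            else:
                manifest.add(cap)  # a file contributes only its verify cap
    return manifest


store = {
    "T-root": ("V-root", [("file", "V-a"), ("dir", "T-sub")]),
    "T-sub": ("V-sub", [("file", "V-b"), ("dir", "T-root")]),  # links back to root
}
print(sorted(build_manifest("T-root", store)))
```

The visited set matters because a Tahoe directory graph can link back to an ancestor, as `T-sub` does here; without it the walk would never terminate.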
The implementation would require a couple of pieces:
- dirnode capabilities would need to have a four-level structure. Writecaps beget readcaps. Readcaps beget traversecaps. Traversecaps beget verifycaps.
- I think this means that mutable file caps need an extra intermediate layer as well: this is tricky, and will require some staring at the DSA mutable file diagram to find a place that could accommodate it.
- Each edge entry contains five child items: (name, writecap, readcap, traversecap, metadata)
- 'name', 'readcap', and 'metadata' are encrypted with a key derived from the dirnode's readcap.
- 'writecap' is encrypted with the dirnode's writecap.
- 'traversecap' is encrypted with the dirnode's traversecap.
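A minimal sketch of the four-level derivation and per-field encryption listed above, under several assumptions. Each weaker cap is derived from the stronger one by a one-way SHA-256 step. Each field key is derived from the guarding cap plus a per-edge nonce. A toy SHA-256 keystream stands in for a real cipher. None of this is the actual Tahoe-LAFS or DSA dirnode format, and `caps_from`, `seal_edge`, and `open_edge` are hypothetical names.

```python
import hashlib

LEVELS = ["write", "read", "traverse", "verify"]

# Which dirnode cap guards each of the five edge fields, per the list above.
FIELD_LEVEL = {
    "name": "read",
    "readcap": "read",
    "metadata": "read",
    "writecap": "write",
    "traversecap": "traverse",
}


def _derive(tag: bytes, secret: bytes) -> bytes:
    """One-way step: whoever holds `secret` can compute this, not the reverse."""
    return hashlib.sha256(tag + b":" + secret).digest()


def caps_from(level: str, cap: bytes) -> dict:
    """Starting from one cap, derive every weaker cap below it."""
    caps = {level: cap}
    for parent, child in zip(LEVELS, LEVELS[1:]):
        if parent in caps:
            caps[child] = _derive(child.encode(), caps[parent])
    return caps


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode stream cipher (illustration only, not AES)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def seal_edge(dir_writecap: bytes, fields: dict, nonce: bytes) -> dict:
    """Encrypt each edge field under a key derived from the cap that guards it."""
    caps = caps_from("write", dir_writecap)
    edge = {"nonce": nonce}
    for name, value in fields.items():
        key = _derive(name.encode() + nonce, caps[FIELD_LEVEL[name]])
        edge[name] = _xor_stream(key, value)
    return edge


def open_edge(level: str, cap: bytes, edge: dict) -> dict:
    """Decrypt only the fields that the given dirnode cap is entitled to see."""
    caps = caps_from(level, cap)
    out = {}
    for name, needed in FIELD_LEVEL.items():
        if needed in caps:
            key = _derive(name.encode() + edge["nonce"], caps[needed])
            out[name] = _xor_stream(key, edge[name])
    return out


w = b"W" * 32
fields = {"name": b"photos", "writecap": b"child-w", "readcap": b"child-r",
          "traversecap": b"child-t", "metadata": b"{}"}
edge = seal_edge(w, fields, nonce=b"\x01" * 16)
caps = caps_from("write", w)
print(open_edge("traverse", caps["traverse"], edge))
```

Opening the same edge with successively weaker caps shows the intended visibility: the writecap sees all five fields, the readcap sees everything except the child writecap, the traversecap sees only the child traversecap, and the verifycap sees nothing. The nonce must be unique per edge, since reusing a keystream would leak the XOR of two plaintexts.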
When we do DSA dirnodes, we should take advantage of the compatibility break to implement something like this. I suspect it will require changes to the DSA scheme as well.
(there is probably a better name for this concept... "walk cap"? "manifest cap"? "deep verify cap"?)
Attachments (1)
Change History (17)
comment:1 Changed at 2008-02-08T22:41:22Z by zooko
comment:2 Changed at 2008-02-08T23:56:07Z by warner
True. The requirement would be that they produce and deliver (reliably) a manifest some time after they stop changing things, and before they shut down their machine or go offline for a month. One concern is that we can't predict their behavior, so we might have to be fairly aggressive about pushing these manifests (say, after a minute of inactivity), and they're relatively expensive to build (since building one requires a traversal of their whole directory tree). My original hope was to produce a manifest once per day, but I'm not sure how realistic that is for a laptop which goes offline unexpectedly.
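Pushing a manifest "after a minute of inactivity" amounts to debouncing: each change restarts a timer, and the expensive traversal runs only once things go quiet. A minimal sketch, using a hypothetical `ManifestPusher` class with an injected clock so it can be driven deterministically:

```python
from typing import Callable, List, Optional


class ManifestPusher:
    """Hypothetical client-side helper: push a fresh manifest once the
    directory tree has been quiet for `idle_seconds` after a change."""

    def __init__(self, build_and_push: Callable[[], None], idle_seconds: float = 60.0):
        self.build_and_push = build_and_push  # expensive: walks the whole tree
        self.idle_seconds = idle_seconds
        self.last_change: Optional[float] = None  # None means nothing left to push

    def on_change(self, now: float) -> None:
        """Record a modification; each change restarts the idle timer."""
        self.last_change = now

    def tick(self, now: float) -> bool:
        """Call periodically; returns True if a manifest was pushed."""
        if self.last_change is None or now - self.last_change < self.idle_seconds:
            return False
        self.build_and_push()
        self.last_change = None
        return True


pushes: List[int] = []
pusher = ManifestPusher(lambda: pushes.append(1), idle_seconds=60)
pusher.on_change(0)
pusher.on_change(50)  # a second edit restarts the timer
print(pusher.tick(100), pusher.tick(110))
```

A laptop that goes offline during the idle window would still miss the push, which is the weakness this comment points out.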
Another factor to keep in mind is directory sharing. We haven't talked much about who "owns" shared directories: one reasonable answer is that everybody does: if you can read the file, you share responsibility for keeping it alive (by maintaining a lease on it along with everyone else). Another reasonable policy is that we only add leases to files in writeable directories, declaring "ownership" to be equal to mutability. This approach would work better for read-only directories which are shared among many people, but would fail if the write-capable "owner" of that directory got tired of maintaining it.
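The two lease policies differ only in which manifest entries get a lease. A minimal sketch, assuming each (hypothetical) manifest entry carries a flag saying whether the file was reached through at least one writeable directory:

```python
from typing import Iterable, NamedTuple, Set


class ManifestEntry(NamedTuple):
    verifycap: str
    # Hypothetical flag: True if any path to this file passes through a
    # directory the manifest owner can write.
    in_writeable_dir: bool


def files_to_lease(entries: Iterable[ManifestEntry], policy: str) -> Set[str]:
    """Pick which files this user renews leases on under a given policy."""
    if policy == "readers":  # everyone who can read a file shares responsibility
        return {e.verifycap for e in entries}
    if policy == "writers":  # ownership equals mutability
        return {e.verifycap for e in entries if e.in_writeable_dir}
    raise ValueError(f"unknown policy: {policy!r}")


entries = [ManifestEntry("V-mine", True), ManifestEntry("V-shared-ro", False)]
print(sorted(files_to_lease(entries, "writers")))
```

Under the "writers" policy, a file reachable only through read-only shares depends entirely on its writer's leases, which is the failure mode described above.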
In any case, if the set of files in your manifest can change without your involvement (because somebody else made additions to a shared directory), then we might want the manifest to be updated in a more offline fashion, and to do this we'd need some sort of traversal cap. On the other hand, we might make the argument that the manifest of the person who added that new child may contain the new file, and their manifest would be good enough to keep the file alive. Or, we could just state that you have to give us a new manifest at least once a month if you want to take advantage of our file keepalive services.
On the other other hand, the file keepalive service might also be the first-line quota enforcement service, and we might require that you submit your traversal cap as a fairly cheap way to estimate the amount of space you're consuming. In this world, the rule would be that we'll only do keepalives for the 1G or 10G or whatever you've contracted with us to store, and the quota is primarily enforced by adding up the sizes of all files in the manifest (which we calculate ourselves, using the traversal cap). In this case, we'd only check with the storage servers rarely, either randomly or if we suspect that the client is storing large files outside the directory graph that they've given us traversal authority over. If the storage servers tell us that this user is storing more data than the manifest contains, we might get suspicious.
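The quota rule in this paragraph can be sketched as a small check. The names are hypothetical: `file_sizes` would come from walking the customer's tree with their traversal cap, and `server_reported_bytes` from the occasional audit of storage servers. Because servers hold erasure-coded shares, their totals would need scaling by N/k (for example 10/3 for 3-of-10 encoding) before comparison with plaintext sizes. The `expansion` and `slack` parameters are assumptions of this sketch.

```python
from typing import Dict, NamedTuple, Optional

GIB = 1024 ** 3


class QuotaReport(NamedTuple):
    used_bytes: int   # sum of file sizes found in the manifest
    over_quota: bool  # manifest total exceeds the contracted quota
    suspicious: bool  # servers report noticeably more than the manifest explains


def check_quota(file_sizes: Dict[str, int], quota_bytes: int,
                server_reported_bytes: Optional[int] = None,
                expansion: float = 1.0, slack: float = 0.05) -> QuotaReport:
    """file_sizes maps each file verify cap in the manifest to its size in bytes.

    `expansion` converts plaintext size into bytes the servers would hold
    (N/k for k-of-N erasure coding); `slack` tolerates small discrepancies.
    """
    used = sum(file_sizes.values())
    expected_on_servers = used * expansion
    suspicious = (server_reported_bytes is not None
                  and server_reported_bytes > expected_on_servers * (1 + slack))
    return QuotaReport(used, used > quota_bytes, suspicious)


report = check_quota({"V-a": 600 * 1024**2, "V-b": 300 * 1024**2}, quota_bytes=GIB)
print(report.over_quota)
```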
comment:3 Changed at 2008-02-12T04:15:00Z by warner
- Summary changed from directory traversal capability to add directory traversal / deep-verify capability?
comment:4 Changed at 2008-04-24T23:51:00Z by warner
- Component changed from code to code-dirnodes
- Owner somebody deleted
comment:5 Changed at 2008-06-01T20:43:33Z by warner
- Milestone changed from eventually to undecided
comment:6 Changed at 2009-10-28T04:11:57Z by davidsarah
- Keywords newcaps added
Tagging issues relevant to new cap protocol design.
comment:7 Changed at 2009-11-24T17:54:02Z by davidsarah
- Milestone changed from undecided to eventually
I'm pretty sure we want this, and I see how to do it for the Elk Point design.
The name deep-verify seems preferable because it would allow you to verify, not just traverse.
comment:8 Changed at 2009-12-20T15:49:54Z by davidsarah
- Keywords verify repair added
comment:9 Changed at 2010-02-23T03:08:39Z by zooko
- Milestone changed from eventually to 2.0.0
comment:10 Changed at 2011-02-20T04:34:55Z by davidsarah
In http://tahoe-lafs.org/pipermail/tahoe-dev/2009-July/002302.html , zooko asks whether we should make all verify caps deep (in the same way that all directory read caps are deep). He also points out this counterargument:
Now the reason why it could be useful to have a Shallow Verify Cap -- to give someone the ability to verify the integrity of a directory without also giving them the ability to get the verify-caps of the children -- is for a kind of data-privacy. You might want to give lots of people the ability to verify the integrity of your directories without also giving them the ability to trace your directory structure -- the sizes and link structure of your directories and files. As we've recently been discussing, it might be nice for every storage server to have a verify cap to go with every share that it holds. We generally agree that "verify caps are not secret" -- everyone in the world can see everyone else's verify caps. You might not want everyone to be able to see the shape of your filesystem though!
For the next version of the Elk Point protocol I'm working on (v4), I plan to make shallow verify caps the same as storage indices. So, it would be automatic that every storage server has a shallow verify cap for each share that it holds.
In this context I agree with the counterargument. A server shouldn't automatically get deep verify authority for the shares it holds.
zooko argued for a different conclusion:
The only problem is: *they can do that anyway*. Anybody who can observe your Tahoe storage service connections (even though they are encrypted) or who operates a storage server can easily detect the exact structure of your filesystem -- which directories are linked to which other directories and files, as well as the precise size of all of the files. To defend against this sort of traffic analysis or pattern detection is somewhere between "hard" and "impossible". Our comrades over at the GNUnet project, the Freenet project, and others have been trying to develop such techniques for years (both Brian and I have contributed to such projects in the past, Brian more recently than I). Whether they're close to succeeding is not clear to me (perhaps some representative of such projects or someone whose expertise is more current than mine could speak up). But it is certain that TahoeLAFS will not offer such privacy in the next couple of releases.
I don't agree that the cap protocol should be designed in a way that precludes this privacy gain. It's certainly hard to achieve privacy of directory structure against storage servers (even when running Tahoe-LAFS over Tor, I2P, etc.). However, if we move to an unencrypted storage protocol (or make encryption optional for that protocol), then making all verify caps deep would reveal the whole directory structure even to passive observers.
comment:11 Changed at 2011-02-20T04:35:14Z by davidsarah
- Keywords privacy anonymity added
comment:12 Changed at 2011-02-20T05:36:10Z by davidsarah
If you can't see the SVG attachment, try http://jacaranda.org/tahoe/immutable-elkpoint-4.png
Changed at 2011-02-20T05:47:06Z by davidsarah
Immutable file protocol "Elk Point 4" (Scalable Vector Graphics format). [corrected errors in text]
comment:13 follow-up: ↓ 14 Changed at 2011-02-20T05:59:16Z by davidsarah
A known weakness in Elk Point 4 is that the holder of a read cap can't verify that the value of Ctext_X in the share is correct (and hence that the decryption Plain_X, which would hold the verify caps of a directory's children, is correct). This is OK if Plain_K holds read/verify caps for the directory's children, because a read cap holder can use those and ignore Plain_X.
comment:14 in reply to: ↑ 13 Changed at 2011-02-20T06:25:47Z by davidsarah
Replying to davidsarah:
A known weakness in Elk Point 4 is that the holder of a read cap can't verify that the value of Ctext_X in the share is correct (and hence that the decryption Plain_X, which would hold the verify caps of a directory's children, is correct). This is OK if Plain_K holds read/verify caps for the directory's children, because a read cap holder can use those and ignore Plain_X.
Oh, there's a better solution. We can include hash(CS, Plain_K) in the share (incidentally fixing #453), and then compute K as a hash of that and Plain_X. Then the read cap holder can check the decrypted Plain_X against K, even though it doesn't in general know CS.
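The check can be sketched as follows. CS, Plain_K, Plain_X, and K are the names used in the comment; their exact roles in the Elk Point 4 diagram are not reproduced here. SHA-256 over length-prefixed inputs stands in for whatever hash the protocol would actually use, and the function names are hypothetical.

```python
import hashlib
import hmac


def H(*parts: bytes) -> bytes:
    """Hash a tuple of byte strings, length-prefixing each so boundaries are unambiguous."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()


def make_share_fields(cs: bytes, plain_k: bytes, plain_x: bytes):
    """Writer side: T goes into the share, and K is derived from T and Plain_X."""
    t = H(cs, plain_k)  # hash(CS, Plain_K), stored in the share
    k = H(t, plain_x)   # K = hash(that, Plain_X)
    return t, k


def reader_checks_x(k: bytes, t_from_share: bytes, decrypted_x: bytes) -> bool:
    """Read-cap holder side: knows K but not CS; recompute K from the share's T."""
    return hmac.compare_digest(H(t_from_share, decrypted_x), k)


t, k = make_share_fields(b"convergence-secret", b"plain-k", b"child verify caps")
print(reader_checks_x(k, t, b"child verify caps"))
```

A server that substitutes a different Ctext_X (so that it decrypts to a different Plain_X) or a different stored T causes the recomputed hash to disagree with K.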
comment:15 Changed at 2013-01-22T14:09:35Z by zooko
- Keywords research added
comment:16 Changed at 2021-03-30T18:40:46Z by meejah
- Milestone 2.0.0 deleted
Ticket retargeted after milestone closed (editing milestones)
I like "deep verify cap" as a name.
However, their manifest doesn't change while they are off-line, right? So it doesn't seem too onerous to require them to produce manifests for checkers whenever they change their tree.
Still, it is an interesting idea.