13 | | In the meantime, a good work-around is to remove all hyperlinks pointing to external servers from any HTML file that you upload to a Tahoe grid, if you want the contents of the file to remain private. |
| 20 | * If the file is HTML and contains active content such as JavaScript, |
| 21 | that JavaScript can read the URL (and consequently the secret URI) |
| 22 | as it runs. This JavaScript may then find some clever way to reveal |
| 23 | the URI to a third party (such as by changing the src= attribute of |
| 24 | an image tag). |
| 25 | |
| 26 | * We are thinking about ways to close off this leakage of authority while |
| 27 | preserving ease of use -- the ticket associated with this issue is ticket |
| 28 | #127. In the meantime, a good work-around is to remove all hyperlinks |
| 29 | pointing to external servers from any HTML file that you upload to a |
| 30 | Tahoe grid, and not to store HTML with embedded JavaScript, if you want |
| 31 | the contents of the file to remain private. Note that no other files or |
| 32 | directories are threatened, only the HREF/JS-bearing HTML file. |
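
The work-around above can be checked mechanically before an upload. The following is a minimal sketch, not part of Tahoe: `LeakAuditor` and `audit_html` are hypothetical names, and the audit only looks at `href`/`src` attributes, `<script>` tags, inline `on*` handlers, and `javascript:` URLs. Other leak paths (for example CSS `url()` references or form targets) are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LeakAuditor(HTMLParser):
    """Collect constructs that could reveal the page's own URL to a third party."""

    def __init__(self):
        super().__init__()
        self.external_refs = []   # (tag, attribute, url) for links that name a host
        self.has_script = False   # True if any active content was seen

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_script = True
        for name, value in attrs:
            # Inline event handlers such as onclick= count as active content.
            if name.startswith("on"):
                self.has_script = True
            if name in ("href", "src") and value:
                if value.strip().lower().startswith("javascript:"):
                    self.has_script = True
                elif urlparse(value).netloc:
                    # A non-empty network location means the link leaves the page's origin
                    # (this also catches scheme-relative URLs like //host/path).
                    self.external_refs.append((tag, name, value))


def audit_html(text):
    """Return (external_refs, has_script) for an HTML document given as a string."""
    auditor = LeakAuditor()
    auditor.feed(text)
    auditor.close()
    return auditor.external_refs, auditor.has_script
```

A file for which `audit_html` returns an empty list and `False` has none of the constructs this sketch knows about; anything else would need to be edited before upload if the file's contents are meant to stay private.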
29 | | Each file has a unique and unguessable identifier, called a "CHK-URI", which is derived from the file contents. Possession of this identifier is necessary and sufficient to download, reconstruct, decrypt, and verify the integrity of the file. If a person is not given the CHK-URI, then they cannot see the contents of the file. |
| 50 | Each file has a unique and unguessable identifier, called a "CHK-URI", which |
| 51 | may be derived from the file contents. Possession of this identifier is |
| 52 | necessary and sufficient to download, reconstruct, decrypt, and verify the |
| 53 | integrity of the file. If a person is not given the CHK-URI, then they cannot |
| 54 | see the contents of the file. |
33 | | Files in the Tahoe grid are immutable. If you upload a file to the grid, and then change part of it and upload it again, then there are now two files in the grid -- the old one and the new one -- and each has a distinct, unique, CHK-URI. |
| 58 | Files in the Tahoe grid are immutable. If you upload a file to the grid, and |
| 59 | then change part of it and upload it again, then there are now two files in |
| 60 | the grid -- the old one and the new one -- and each has a distinct, unique, |
| 61 | CHK-URI. The directory to which the new file was uploaded will only contain a |
| 62 | reference to the new file. If no other directories still reference the old |
| 63 | file (and if no manual copies of the URI were retained), the old file will be |
| 64 | unreachable. |
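
The consequences of immutability described above can be illustrated with a toy model. This is only a sketch under a simplifying assumption: the identifier is taken to be a plain SHA-256 hash of the contents, which is not Tahoe's actual CHK-URI format. `toy_uri`, `ToyDirectory`, and `reachable` are hypothetical names introduced for the illustration.

```python
import hashlib


def toy_uri(contents: bytes) -> str:
    """Hypothetical stand-in for a CHK-URI: an identifier derived from the contents."""
    return "toy-chk:" + hashlib.sha256(contents).hexdigest()


class ToyDirectory:
    """A directory maps child names to file identifiers."""

    def __init__(self):
        self.children = {}

    def upload(self, name, contents, store):
        uri = toy_uri(contents)
        store[uri] = contents        # different contents always land under a different key
        self.children[name] = uri    # the directory now references only the new file
        return uri


def reachable(store, directories):
    """Identifiers in the store that at least one directory still references."""
    referenced = {uri for d in directories for uri in d.children.values()}
    return {uri for uri in store if uri in referenced}


store = {}
docs = ToyDirectory()
old_uri = docs.upload("notes.txt", b"first draft", store)
new_uri = docs.upload("notes.txt", b"second draft", store)
```

After the second upload the store holds both versions, but `reachable(store, [docs])` contains only `new_uri`. The old file can still be fetched by anyone who kept a copy of `old_uri`, which matches the "manual copies" caveat above.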
| 65 | |
| 66 | A future extension will provide mutable files. For these, a given URI will |
| 67 | not necessarily refer to a specific sequence of bytes, but rather to just the |
| 68 | most recent contents that were uploaded to that URI. Like dirnode URIs, these |
| 69 | URIs will come in read-write and read-only forms, and the file can only be |
| 70 | modified by someone who holds a read-write URI. |
37 | | ''To be filled in.'' Traffic analysis is subtle and powerful. For the moment, assume that if someone wants to, they can learn everything about your every act, including when, where, and which file, by its unique identifier and its length ''except'' that they can't learn the actual contents of the files, except that if the file happens to be a file whose contents they already know then they can. Make sense? I'll come back later. |
| 74 | ''To be filled in.'' Traffic analysis is subtle and powerful. The distributed |
| 75 | nature of Tahoe provides even more information to a passive observer than |
| 76 | usual. |
| 77 | |
| 78 | All traffic between Tahoe nodes uses transport-level encryption, so an |
| 79 | attacker must participate in a Tahoe network to obtain visibility into which |
| 80 | shares are being uploaded and downloaded. However, the promiscuous nature of |
| 81 | Tahoe's Introduction protocol makes this rather easy. |
| 82 | |
| 83 | In small networks, most servers see upload and download requests for all |
| 84 | files. In large networks, an attacker who can provide at least 10% of the |
| 85 | servers (for 3-of-10 encoding) will get to see upload/download requests for |
| 86 | all files. By seeing these requests, the attacker gets to know who is |
| 87 | interested in which files, although they cannot determine the contents of |
| 88 | those files unless they already have a copy (and convergence is being used). |
| 89 | |
| 90 | The directory nodes are encrypted, but all of the dirnodes are stored on the |
| 91 | same central server (the "vdrive server"). This server is in an excellent |
| 92 | position to see who accesses which dirnodes and when, and this information is |
| 93 | sufficient to build a dirnode graph that is equivalent to the user's |
| 94 | plaintext version. For example, if the server sees a get(dirnode#47, "34af") |
| 95 | followed by a get(dirnode#13, "8bb3"), it is safe to assume that dirnode#47 |
| 96 | contains dirnode#13 as a subdirectory, and that "34af" is the encrypted form |
| 97 | of the subdir's name. |
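
The inference in this example can be sketched in a few lines. The log format here is an assumption made for illustration (an ordered list of `(dirnode_id, encrypted_child_name)` pairs from a single client), and `infer_edges` is a hypothetical helper, not anything the vdrive server is known to implement. The heuristic is deliberately naive: it treats every pair of consecutive requests as a parent-to-child step.

```python
def infer_edges(requests):
    """Guess directory structure from one client's ordered get() requests.

    requests: list of (dirnode_id, encrypted_child_name) tuples, oldest first.
    Returns a list of (parent_dirnode, encrypted_name, child_dirnode) edges.
    """
    edges = []
    # Pair each request with the one that follows it: the name looked up in the
    # first dirnode is assumed to resolve to the dirnode fetched next.
    for (parent, enc_name), (child, _next_name) in zip(requests, requests[1:]):
        edges.append((parent, enc_name, child))
    return edges


# The two requests from the example above: get(dirnode#47, "34af"), get(dirnode#13, "8bb3").
observed = [(47, "34af"), (13, "8bb3")]
edges = infer_edges(observed)
```

Here `edges` holds the single guess that dirnode 47 contains dirnode 13 under the encrypted name "34af". A real observer would also have to separate interleaved traversals and use timing, but even this crude pairing shows why encrypted names alone do not hide the shape of the graph.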
| 98 | |
| 99 | This reconstructed graph has file/subdir names which are encrypted but the |
| 100 | same length as the real ones. The file URIs are not known, although if a file |
| 101 | is uploaded or downloaded shortly after a dirnode is accessed it is easy to |
| 102 | relate the two. Again, this points to the identity of the file, but not its |
| 103 | contents. However, it makes it fairly easy for the dirnode server to tell, |
| 104 | e.g., if a lot of users are all referencing the same file. |
| 105 | |
| 106 | A future design will include distributed directory nodes (to improve |
| 107 | availability and reliability). This will result in the same traffic-analysis |
| 108 | exposure as the centralized vdrive server, but will make the traffic visible |
| 109 | to even more servers (anyone who controls more than 10% of the servers will |
| 110 | be able to see all dirnode requests). |