Changes between Version 9 and Version 10 of Security


Timestamp: 2007-10-17T01:17:34Z
Author: warner
Comment: lots of updates

= Security Considerations =

This page exists so that there is a single place to go to learn about the
general security properties of Tahoe, as well as about any current known
issues that might have security consequences.

= Current Known Security Issues in Tahoe =

 * XSRF / Browser-based Attacks

  * #127: the URI of a file is embedded in the URL that is used to access it.
    This URI should not be unintentionally revealed to anyone else, because
    that would reveal the full contents of the file. There are two current
    ways this URI can be unintentionally revealed:

    * If the file is HTML and contains a hyperlink to an external web server,
      any user who follows that hyperlink may reveal the URI to that web
      server through the Referer header. Remember that browsers typically
      fetch IMG tags automatically, so making sure that nobody viewing the
      page clicks on its hyperlinks is not a complete defense.

    * If the file is HTML and contains active content such as JavaScript,
      that JavaScript can read the URL (and consequently the secret URI)
      as it runs. This JavaScript may then find some clever way to reveal
      the URI to a third party (such as by changing the src= attribute of
      an image tag).

  * We are thinking about ways to close off this leakage of authority while
    preserving ease of use -- the ticket associated with this issue is ticket
    #127. In the meantime, a good work-around is to remove all hyperlinks
    pointing to external servers from any HTML file that you upload to a
    Tahoe grid, and not to store HTML with embedded JavaScript, if you want
    the contents of the file to remain private. Note that no other files or
    directories are threatened, only the HREF/JS-bearing HTML file.
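The work-around above can be partly automated. The following is a minimal sketch, not part of Tahoe: the hypothetical `find_leak_risks` helper uses Python's standard `html.parser` module to list URLs in an HTML file that point at another host, and to report whether the file contains JavaScript. It is a heuristic check to run before uploading, not a sanitizer; for example, it ignores CSS `url(...)` references, and the set of URL-bearing attributes in `URL_ATTRS` is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Attributes that commonly cause a browser to fetch or navigate to a URL.
# This set is an assumption for the sketch, not an exhaustive list.
URL_ATTRS = {"href", "src", "action", "background", "poster", "data"}


class LeakScanner(HTMLParser):
    """Collect external URLs and signs of JavaScript in an HTML document."""

    def __init__(self):
        super().__init__()
        self.external_urls = []
        self.has_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_script = True
        for name, value in attrs:
            if name.startswith("on"):
                # Inline event handlers (onclick=, onload=, ...) run JavaScript.
                self.has_script = True
            elif name in URL_ATTRS and value:
                parts = urlsplit(value.strip())
                if parts.scheme.lower() == "javascript":
                    self.has_script = True
                elif parts.netloc:
                    # A network location means the browser would contact
                    # another server, which may send the page URL as Referer.
                    self.external_urls.append(value)


def find_leak_risks(html_text):
    """Return (external_urls, has_script) for the given HTML text."""
    scanner = LeakScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.external_urls, scanner.has_script


if __name__ == "__main__":
    page = '<p>See <a href="http://example.com/">this</a>.</p><img src="logo.png">'
    urls, scripted = find_leak_risks(page)
    print(urls, scripted)
```

Relative links such as `logo.png` are not reported, because the browser resolves them against the Tahoe URL itself rather than contacting another server.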

= General Security Properties of Tahoe =

'''The rest of this page, below, is not complete.''' However, you can view
[source:docs/architecture.txt@1432#L472 the detailed technical explanation]
of which this page is eventually intended to be a summary.

= The Distributed Filesystem =
     
==== read access ====

Each file has a unique and unguessable identifier, called a "CHK-URI", which
may be derived from the file contents. Possession of this identifier is
necessary and sufficient to download, reconstruct, decrypt, and verify the
integrity of the file. If a person is not given the CHK-URI, then they cannot
see the contents of the file.
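To make "derived from the file contents" concrete, here is a toy sketch in Python. It is not Tahoe's actual CHK-URI format or hashing scheme: the tag strings, the 16-byte key, and the `ToyCapability` fields are assumptions chosen for illustration. It shows the convergent case, where identical contents yield an identical identifier, and how an identifier that carries a verification hash lets its holder check downloaded bytes. A real design encrypts the file and checks the ciphertext before decrypting; encryption is omitted here to keep the sketch to the standard library.

```python
import hashlib
from typing import NamedTuple


class ToyCapability(NamedTuple):
    key: bytes       # would be used to decrypt the file's ciphertext
    verifier: bytes  # hash the holder uses to check downloaded data
    size: int        # length of the file in bytes


def derive_capability(plaintext: bytes) -> ToyCapability:
    # Convergent derivation: both fields are hashes of the contents, so two
    # uploads of the same bytes produce the same capability.
    key = hashlib.sha256(b"toy-key:" + plaintext).digest()[:16]
    verifier = hashlib.sha256(b"toy-verify:" + plaintext).digest()
    return ToyCapability(key, verifier, len(plaintext))


def verify(cap: ToyCapability, downloaded: bytes) -> bool:
    # Integrity check: reject data whose length or hash does not match.
    digest = hashlib.sha256(b"toy-verify:" + downloaded).digest()
    return len(downloaded) == cap.size and digest == cap.verifier


if __name__ == "__main__":
    cap = derive_capability(b"hello, grid")
    # Same contents give the same capability; a one-byte change fails verification.
    print(cap == derive_capability(b"hello, grid"))
    print(verify(cap, b"hello, grid"), verify(cap, b"hello, grip"))
```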

==== mutation ====

Files in the Tahoe grid are immutable. If you upload a file to the grid, and
then change part of it and upload it again, then there are now two files in
the grid -- the old one and the new one -- and each has its own distinct
CHK-URI. The directory to which the new file was uploaded will only contain a
reference to the new file. If no other directories still reference the old
file (and if no manual copies of the URI were retained), the old file will be
unreachable.
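The reachability argument above can be sketched as a small graph walk. In this toy model, which is an assumption rather than Tahoe's data structures, the grid is a dict from a content-derived identifier to file bytes, and each directory is a dict from child name to identifier; the hypothetical `chk_uri` stands in for a real CHK-URI. Uploading a modified file adds a second entry to the grid, and once the directory points at the new identifier, the old one is no longer reachable from the root even though its data is still stored.

```python
import hashlib


def chk_uri(data: bytes) -> str:
    # Toy stand-in for a CHK-URI: any change to the bytes changes the result.
    return "toy-chk:" + hashlib.sha256(data).hexdigest()[:16]


def reachable(root: str, directories: dict) -> set:
    """Return every directory and file identifier reachable from `root`.

    `directories` maps a directory identifier to {child name: identifier};
    an identifier that is a key of `directories` is a subdirectory, and any
    other identifier is treated as a file.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Files have no entry in `directories`, so they contribute no children.
        for child in directories.get(node, {}).values():
            stack.append(child)
    return seen


if __name__ == "__main__":
    grid = {}
    old, new = b"draft 1", b"draft 2"
    for blob in (old, new):
        grid[chk_uri(blob)] = blob  # both versions remain stored in the grid
    directories = {"root": {"notes.txt": chk_uri(old)}}
    directories["root"]["notes.txt"] = chk_uri(new)  # re-upload replaces the entry
    live = reachable("root", directories)
    print(len(grid), chk_uri(old) in live, chk_uri(new) in live)
```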

A future extension will provide mutable files. For these, a given URI will
not necessarily refer to a specific sequence of bytes, but rather to just the
most recent contents that were uploaded to that URI. Like dirnode URIs, these
URIs will come in read-write and read-only forms, and the file can only be
modified by someone who holds a read-write URI.
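Mutable files had not been built when this was written, so the following is only a hypothetical Python sketch of the read-write/read-only split. It assumes, as one plausible design, that the read-only URI is derived from the read-write URI by a one-way hash: a holder of the read-write URI can compute the read-only one, but not the reverse. The `ToyMutableStore` class trusts this hash check directly; a real system would need storage servers to verify writes cryptographically (for example with signatures), which this toy omits.

```python
import hashlib
import os


def derive_read_cap(write_cap: bytes) -> bytes:
    # One-way: cheap to compute from the write cap, infeasible to invert.
    return hashlib.sha256(b"toy-readcap:" + write_cap).digest()[:16]


class ToyMutableStore:
    """Each slot holds only the most recent contents written to it."""

    def __init__(self):
        self._slots = {}  # read cap -> latest contents

    def create(self) -> bytes:
        write_cap = os.urandom(16)
        self._slots[derive_read_cap(write_cap)] = b""
        return write_cap

    def write(self, write_cap: bytes, data: bytes) -> None:
        read_cap = derive_read_cap(write_cap)
        if read_cap not in self._slots:
            # A read cap (or any other value) hashes to an unknown slot,
            # with overwhelming probability.
            raise PermissionError("a read-write URI is required to modify a file")
        self._slots[read_cap] = data  # replaces the previous contents

    def read(self, read_cap: bytes) -> bytes:
        return self._slots[read_cap]


if __name__ == "__main__":
    store = ToyMutableStore()
    rw = store.create()
    ro = derive_read_cap(rw)
    store.write(rw, b"version 1")
    store.write(rw, b"version 2")
    print(store.read(ro))
```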

== Traffic Analysis ==

''To be filled in.'' Traffic analysis is subtle and powerful. The distributed
nature of Tahoe provides even more information to a passive observer than
usual.

All traffic between Tahoe nodes uses transport-level encryption, so an
attacker must participate in a Tahoe network to obtain visibility into which
shares are being uploaded and downloaded. However, the promiscuous nature of
Tahoe's Introduction protocol makes this rather easy.

In small networks, most servers see upload and download requests for all
files. In large networks, an attacker who can provide at least 10% of the
servers (for 3-of-10 encoding) will get to see upload/download requests for
all files. By seeing these requests, the attacker learns who is interested
in which files, although they cannot determine the contents of those files
unless they already have a copy (and convergence is being used).
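The 10% figure can be explored with a little arithmetic. The sketch below assumes, as a simplification that may not match Tahoe's actual peer selection, that each file's 10 shares are placed on 10 distinct servers chosen uniformly at random, and that a server sees a file's requests only if it holds one of its shares. Under that model, an attacker running a fraction p of the servers holds 10p shares of each file on average, which is one share at p = 10%, and in a 10-server network every server holds a share of every file. Any requests that reach servers other than the share holders (for example during peer selection) would add to this exposure.

```python
from math import comb


def expected_attacker_shares(servers: int, attacker: int, shares: int = 10) -> float:
    # Each share lands on an attacker server with probability attacker/servers.
    return shares * attacker / servers


def p_attacker_holds_a_share(servers: int, attacker: int, shares: int = 10) -> float:
    """P(at least one of a file's shares is on an attacker-controlled server).

    Assumes the shares go to `shares` distinct servers chosen uniformly at
    random from `servers`, of which `attacker` are controlled by the attacker.
    """
    honest = servers - attacker
    # Hypergeometric: P(all shares on honest servers) = C(honest, k) / C(servers, k)
    return 1 - comb(honest, shares) / comb(servers, shares)


if __name__ == "__main__":
    for servers in (10, 100, 1000):
        attacker = servers // 10  # the attacker runs 10% of the servers
        print(servers,
              expected_attacker_shares(servers, attacker),
              round(p_attacker_holds_a_share(servers, attacker), 3))
```

Under this model, with 1000 servers of which 100 belong to the attacker, the attacker holds at least one share of about 65% of files.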

The directory nodes are encrypted, but all of the dirnodes are stored on the
same central server (the "vdrive server"). This server is in an excellent
position to see who accesses which dirnodes and when, and this information is
sufficient to build a dirnode graph that is equivalent to the user's
plaintext version. For example, if the server sees a get(dirnode#47, "34af")
followed by a get(dirnode#13, "8bb3"), it is safe to assume that dirnode#47
contains dirnode#13 as a subdirectory, and that "34af" is the encrypted form
of the subdir's name.
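The inference in this example can be sketched as a simple heuristic over the vdrive server's request log. This is hypothetical Python: the `Access` record, the per-client grouping, and the one-second `window` are assumptions about what such a server could log, modeled on the get(dirnode, name) requests described above.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Access(NamedTuple):
    time: float    # when the request arrived, in seconds
    client: str    # who sent it, as seen by the server
    dirnode: str   # which dirnode was read, e.g. "dirnode#47"
    name: str      # encrypted child name looked up, e.g. "34af"


def infer_edges(log: Iterable[Access], window: float = 1.0) -> Set[Tuple[str, str, str]]:
    """Guess (parent, encrypted_name, child) edges of a user's directory tree.

    Heuristic: if a client looks up an encrypted name in dirnode P and then
    reads a different dirnode C within `window` seconds, assume P contains C
    under that name.
    """
    edges = set()
    last = {}  # client -> that client's previous request
    for entry in sorted(log, key=lambda a: a.time):
        prev = last.get(entry.client)
        if (prev is not None and prev.dirnode != entry.dirnode
                and entry.time - prev.time <= window):
            edges.add((prev.dirnode, prev.name, entry.dirnode))
        last[entry.client] = entry
    return edges


if __name__ == "__main__":
    log = [
        Access(0.0, "alice", "dirnode#47", "34af"),
        Access(0.2, "alice", "dirnode#13", "8bb3"),
    ]
    print(infer_edges(log))
```

Grouping by client keeps interleaved requests from different users from being mistaken for parent/child lookups.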

This reconstructed graph has file/subdir names which are encrypted but the
same length as the real ones. The file URIs are not known, although if a file
is uploaded or downloaded shortly after a dirnode is accessed it is easy to
relate the two. Again, this points to the identity of the file, but not its
contents. However, it makes it fairly easy for the dirnode server to tell,
e.g., whether a lot of users are all referencing the same file.
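Relating file transfers to dirnode lookups, and spotting files that many users reference, can be sketched the same way. This hypothetical Python assumes an observer that sees both dirnode requests and file uploads/downloads (for instance, a vdrive server operator who also runs storage servers), and uses made-up identifiers such as `SI-aaaa` for files. It only ever handles identifiers and timing, never file contents.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Event(NamedTuple):
    time: float     # seconds
    client: str     # who made the request
    kind: str       # "dirnode" for a directory lookup, "file" for a transfer
    target: str     # dirnode id, or an identifier for the file
    name: str = ""  # encrypted child name (dirnode lookups only)


def link_files(events: Iterable[Event], window: float = 1.0) -> Dict[Tuple[str, str], Set[str]]:
    """Map (dirnode, encrypted name) to files the same client fetched soon after."""
    links = defaultdict(set)
    last_lookup = {}  # client -> that client's most recent dirnode lookup
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "dirnode":
            last_lookup[ev.client] = ev
        elif ev.kind == "file":
            prev = last_lookup.get(ev.client)
            if prev is not None and ev.time - prev.time <= window:
                links[(prev.target, prev.name)].add(ev.target)
    return dict(links)


def clients_per_file(events: Iterable[Event]) -> Dict[str, Set[str]]:
    """Record which clients touched each file; large sets reveal widely shared files."""
    users = defaultdict(set)
    for ev in events:
        if ev.kind == "file":
            users[ev.target].add(ev.client)
    return dict(users)
```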

A future design will include distributed directory nodes (to improve
availability and reliability). This will result in the same traffic-analysis
exposure as the centralized vdrive server, but will make the traffic visible
to even more servers (anyone who controls more than 10% of the servers will
be able to see all dirnode requests).