= Grid Identifiers =

What makes up a Tahoe "grid"? The rough answer is a fairly stable set of
storage servers.

The read- and write-caps that point to files and directories are scoped to a
particular set of servers. The Tahoe peer-selection and erasure-coding
algorithms provide high availability as long as there is significant overlap
between the servers that were used for upload and the servers that are
available for subsequent download. When new peers are added, the shares get
spread out in the search space, so clients must work harder to download
their files. When peers are removed, shares are lost and file health is
threatened. Repair bandwidth must be spent to generate new shares, so the
cost of repair increases with the rate of server departure. If servers leave
the grid too quickly, repair may not be able to keep up, and files will be
lost.

So to get long-term stability, we need that peer set to remain fairly
stable: a peer which joins the grid needs to stick around for a while.

== Multiple Grids ==

The current Tahoe read-cap format doesn't admit the existence of multiple
grids. In fact, the "URI:" prefix implies that these cap strings are
universal: it suggests that this string (plus some protocol definition) is
completely sufficient to recover the file.

However, there are a variety of reasons why we may want to have more than
one Tahoe grid in the world:

 * scaling: a variety of problems are likely to be encountered as we
   attempt to grow a Tahoe grid from a few dozen servers to a few thousand,
   some of which are easier to deal with than others. Maintaining
   connections to servers and keeping up-to-date on the locations of
   servers is one issue. There are design improvements that can work around
   these, but they will take time, and we may not want to wait for that
   work to be done. Being able to deploy multiple grids may be the best way
   to get a large number of clients using Tahoe at once.

 * managing quality of storage, storage allocation: the members of a
   friendnet may want to restrict access to storage space to just each
   other, and may want to run their grid without involving any external
   coordination

 * commercial goals: a company using Tahoe may want to restrict access to
   storage space to just their customers

 * protocol upgrades, development: new and experimental versions of the
   Tahoe software may need to be deployed and analyzed in isolation from
   the grid that clients are using for active storage

So if we define a grid to be a set of storage servers, then two distinct
grids will have two distinct sets of storage servers. Clients are free to
use whichever grid they like (and have permission to use), but each time
they upload a file, they must choose a specific grid to put it in. Clients
can upload the same file to multiple grids in two separate upload
operations.

== Grid IDs in URIs ==

Each URI needs to be scoped to a specific grid, to avoid confusion ("I
looked for URI123 and it said File Not Found... oh, which grid did you
upload that into?"). To accomplish this, the URI will contain a "grid
identifier" that references a specific Tahoe grid. The grid ID is shorthand
for a relatively stable set of storage servers.

To make the URIs actually universal, there must be a way to get from the
grid ID to the actual grid. This document defines a protocol by which a
client that wants to download a file from a previously-unknown grid will be
able to locate and connect to that grid.

== Grid ID specification ==

The grid ID is a string, using a fairly limited character set: alphanumerics
plus possibly a few others. It can be very short: a grid ID of just "0" can
be used. The grid ID will be copied into the cap string for every file that
is uploaded to that grid, so there is pressure to keep them short.

The cap format needs to be able to distinguish the grid ID from the rest of
the cap. This could be expressed in DNS-style dot notation: for example, the
directory write-cap with a write-key of "0ZrD.." that lives on grid ID "foo"
could be expressed as "D0ZrDNAHuxs0XhYJNmkdicBUFxsgiHzMdm.foo" .

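With this dot notation, separating the grid ID from the cap body is a single
split on the last dot (which implies that cap bodies themselves must never
contain a dot). A minimal sketch, assuming the final format keeps the grid ID
as the last dot-separated component; `split_gridid` is a hypothetical helper,
not part of any current Tahoe code:

```python
def split_gridid(cap: str) -> tuple[str, str]:
    """Split a dotted cap string into (cap body, grid ID).

    The grid ID is everything after the last dot, so the cap body
    itself must never contain a dot.
    """
    body, dot, gridid = cap.rpartition(".")
    if not dot:
        raise ValueError("cap string has no grid ID suffix: %r" % cap)
    return body, gridid
```

For the example above, `split_gridid("D0ZrDNAHuxs0XhYJNmkdicBUFxsgiHzMdm.foo")`
returns the cap body and the grid ID `"foo"`.
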
 * design goals: non-word-breaking, double-click-pasteable, maybe
   human-readable (do humans need to know which grid is being used?
   probably not)
 * does not need to be secure (i.e. long and unguessable), but we must
   analyze the sorts of DoS attacks that can result if it is not (and even
   if it is)
 * does not need to be human-memorable, although that may assist debugging
   and discussion ("my file is on grid 4, where is yours?")
 * *does* need to be unique, but the total number of grids is fairly small
   (counted in the hundreds or thousands rather than millions or billions),
   and we can afford to coordinate the use of short names. Folks who don't
   like coordination can pick a largeish random string.

Each announcement that a storage server publishes (to introducers) will
include its grid ID. If a server participates in multiple grids, it will
make multiple announcements, each with a single grid ID. Clients will be
able to ask an introducer for information about all storage servers that
participate in a specific grid.

Clients are likely to have a default grid ID, to which they upload files.
If a client is adding a file to a directory that lives in a different grid,
they may upload the file to that other grid instead of their default.

== Getting from a Grid ID to a grid ==

When a client decides to download a file, it starts by unpacking the cap
and extracting the grid ID.

Then it attempts to connect to at least one introducer for that grid, by
leveraging DNS:

  hash $GRIDID (with some tag) to get a long base32-encoded string: $HASH

  GET http://tahoe-$HASH.com/introducer/gridid/$GRIDID

  the result should be a JSON-encoded list of introducer FURLs

For extra redundancy, if that query fails, perform the following additional
queries:

  GET http://tahoe-$HASH.net/introducer/gridid/$GRIDID
  GET http://tahoe-$HASH.org/introducer/gridid/$GRIDID
  GET http://tahoe-$HASH.tv/introducer/gridid/$GRIDID
  GET http://tahoe-$HASH.info/introducer/gridid/$GRIDID
  etc.
  GET http://grids.tahoe-lafs.org/introducer/gridid/$GRIDID

The first few introducers should be able to announce other introducers, via
the distributed gossip-based introduction scheme of #68.

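The lookup steps above can be sketched as follows. This is illustrative
only: the hash tag, the use of SHA-256, and the fixed TLD list are
assumptions, since the document leaves the hash function unspecified.

```python
import base64
import hashlib
import json
from urllib.request import urlopen

# TLDs to try in order, plus a service of last resort (from the list
# above).
TLDS = ["com", "net", "org", "tv", "info"]

def gridid_hash(gridid: str) -> str:
    # Tagged hash of the grid ID, base32-encoded. The tag string and
    # the choice of SHA-256 are assumptions, not fixed by this document.
    digest = hashlib.sha256(b"tahoe-gridid:" + gridid.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")

def introducer_urls(gridid: str) -> list:
    h = gridid_hash(gridid)
    urls = ["http://tahoe-%s.%s/introducer/gridid/%s" % (h, tld, gridid)
            for tld in TLDS]
    urls.append("http://grids.tahoe-lafs.org/introducer/gridid/%s" % gridid)
    return urls

def find_introducers(gridid: str) -> list:
    # Try each URL in turn until one returns a JSON-encoded list of
    # introducer FURLs. Merging results from several hosts, instead of
    # stopping at the first success, would mitigate the DoS attack
    # discussed under Properties.
    for url in introducer_urls(gridid):
        try:
            with urlopen(url, timeout=10) as resp:
                return json.loads(resp.read())
        except OSError:
            continue  # DNS failure, connection refused, HTTP error, etc.
    return []
```
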
Properties:

 * claiming a grid ID is cheap: a single domain name registration (in an
   uncontested namespace), and a simple web server. allmydata.com can
   publish introducer FURLs for grids that don't want to register their
   own domain.

 * lookup is at least as robust as DNS. By using benevolent public services
   like tahoe-grids.allmydata.com, reliability can be increased further.
   The HTTP fetch can return a list of every known server node, all of
   which can act as introducers.

 * not secure: anyone who can interfere with DNS lookups (or who claims
   tahoe-$HASH.com before you do) can cause clients to connect to their
   servers instead of yours. This admits a moderate DoS attack against
   download availability. Performing multiple queries (to .net, .org, etc.)
   and merging the results may mitigate this (you'll get their servers
   *and* your servers; the download search will be slower but is still
   likely to succeed). It may admit an upload DoS attack as well, or an
   upload file-reliability attack (tricking you into uploading to
   unreliable servers), depending upon how the "server selection policy"
   (see below) is implemented.

Once the client is connected to an introducer, it will see if there is a
Helper who is willing to assist with the upload or download. (For download,
this might reduce the number of connections that the grid's storage servers
must deal with.) If not, the client asks the introducers for storage
servers, and connects to them directly.

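That decision is a simple fallback, sketched here against a hypothetical
introducer client with a `get_announcements(service_name)` method (the real
introducer API may differ):

```python
def choose_upload_path(introducer):
    # Prefer a Helper if the grid advertises one; otherwise fall back
    # to connecting to the storage servers directly.
    helpers = introducer.get_announcements("helper")
    if helpers:
        return ("helper", helpers[0])
    return ("direct", introducer.get_announcements("storage"))
```
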
== Controlling Access ==

The introducers are not used to enforce access control. Instead, a system
of public keys is used.

There are a few kinds of access control that we might want to implement:

 * protect storage space: only let authorized clients upload/consume storage
 * protect download bandwidth: only give shares to authorized clients
 * protect share reliability: only upload shares to "good" servers

The first two are implemented by the server, to protect its resources. The
last is implemented by the client, to avoid uploading shares to unreliable
servers (specifically, to maximize the utility of the client's limited
upload bandwidth: there's no problem with putting shares on unreliable
peers per se, but it is a problem if doing so means the client won't put a
share on a more reliable peer).

The first limitation (protect storage space) will be implemented by public
keys and signed "storage authority" certificates. The client will present
some credentials to the storage server to convince it that the client
deserves the space. When storage servers are in this mode, they will have a
certificate that names a public key, and any credentials that can
demonstrate a path from that key will be accepted. This scheme is described
in docs/proposed/old-accounts-pubkey.txt .

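The "path from that key" check amounts to a chain walk. Everything here is
simplified for illustration: real storage-authority certificates (per
docs/proposed/old-accounts-pubkey.txt) carry actual public-key signatures,
while this sketch stubs signature verification out as a caller-supplied
function.

```python
def demonstrates_path(root_key, chain, verify_signature):
    """Return True if `chain` links root_key down to the client's key.

    `chain` is a list of (subject_key, issuer_key, signature) triples,
    ordered from the root's first delegate down to the client. Each
    link must be issued by the previous link's subject, starting from
    the key named in the server's certificate.
    """
    expected_issuer = root_key
    for subject, issuer, signature in chain:
        if issuer != expected_issuer:
            return False  # broken chain: wrong issuer at this link
        if not verify_signature(issuer, subject, signature):
            return False  # invalid signature on this link
        expected_issuer = subject
    return True
```
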
The second limitation is unexplored. The read-cap does not currently
contain any notion of who must pay for the bandwidth incurred.

The third limitation (only upload to "good" servers), when enabled, is
implemented by a "server selection policy" on the client side, which
defines which server credentials will be accepted. This is just like the
first limitation in reverse. Before clients consider including a server in
their peer-selection algorithm, they check its credentials, and ignore any
server that does not meet the policy.

This means that a client may not wish to upload anything to "foreign
grids", because it has no promise of their reliability. The reasons that a
client might want to upload to a foreign grid need to be examined:
reliability may not be important, or it might be good enough to upload the
file to the client's "home grid" instead.

The server selection policy is intended to be fairly open-ended: we can
imagine a policy that says "upload to any server that has a good reputation
among group X", or more complicated schemes that require less and less
centralized management. One important and simple scheme is to have a list
of acceptable keys: a friendnet with 5 members would include 5 such keys in
each policy, enabling every member to use the services of the others,
without having a single central manager with unilateral control over the
definition of the group.

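The key-list scheme is small enough to sketch directly. The announcement
format (a dict with a `signed_by` field) is hypothetical; the point is just
that the filter runs before peer selection ever sees the server.

```python
# A 5-member friendnet: each member's policy lists the same five keys,
# so no single member controls the definition of the group.
ACCEPTED_KEYS = {"alice-pub", "bob-pub", "carol-pub", "dave-pub", "erin-pub"}

def select_servers(announcements, accepted_keys=ACCEPTED_KEYS):
    """Drop any storage-server announcement whose credentials are not
    signed by an acceptable key; only the survivors are offered to the
    peer-selection algorithm."""
    return [ann for ann in announcements
            if ann.get("signed_by") in accepted_keys]
```
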
== Closed Grids ==

To implement these access controls, each client needs to be configured with
three things:

 * home grid ID (used to find introducers, helpers, storage servers)
 * storage authority (certificate to enable uploads)
 * server selection policy (identify good/reliable servers)

If the server selection policy indicates centralized control (i.e. there is
some single key X which is used to sign the credentials for all "good"
servers), then this could be built into the grid ID. By using the base32
hash of the pubkey as the grid ID, clients would only need to be configured
with two things: the grid ID and their storage authority. In this case, the
introducer would provide the pubkey, and the client would compare the
hashes to make sure they match. This is analogous to how a TubID is used in
a FURL.

Such grids would have significantly larger grid IDs, 24 characters or more.
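
A minimal sketch of that check, assuming SHA-256 and a 24-character
truncation (the document only requires 24 base32 characters or more, and
does not fix the hash function):

```python
import base64
import hashlib

def pubkey_gridid(pubkey: bytes, length: int = 24) -> str:
    # Base32-encode the hash of the pubkey and truncate; the result
    # serves as the grid ID for this centrally-managed grid.
    digest = hashlib.sha256(pubkey).digest()
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")[:length]

def introducer_pubkey_matches(gridid: str, announced_pubkey: bytes) -> bool:
    # The client recomputes the hash of the pubkey provided by the
    # introducer and compares it against its configured grid ID, much
    # like a Foolscap client checks a TubID against a FURL.
    return pubkey_gridid(announced_pubkey, len(gridid)) == gridid
```
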