.. -*- coding: utf-8-with-signature -*-

==========
Tahoe URIs
==========

1. `File URIs`_

   1. `CHK URIs`_
   2. `LIT URIs`_
   3. `Mutable File URIs`_

2. `Directory URIs`_
3. `Internal Usage of URIs`_

Each file and directory in a Tahoe-LAFS file store is described by a "URI".
There are different kinds of URIs for different kinds of objects, and there
are different kinds of URIs to provide different kinds of access to those
objects. Each URI is a string representation of a "capability" or "cap", and
there are read-caps, write-caps, verify-caps, and others.

Each URI provides both ``location`` and ``identification`` properties.
``location`` means that holding the URI is sufficient to locate the data it
represents (this means it contains a storage index or a lookup key, whatever
is necessary to find the place or places where the data is being kept).
``identification`` means that the URI also serves to validate the data: an
attacker who wants to trick you into using the wrong data will be
limited in their abilities by the identification properties of the URI.

Some URIs are subsets of others. In particular, if you know a URI which
allows you to modify some object, you can produce a weaker read-only URI and
give it to someone else, and they will be able to read that object but not
modify it. Directories, for example, have a read-cap which is derived from
the write-cap: anyone with read/write access to the directory can produce a
limited URI that grants read-only access, but not the other way around.

``src/allmydata/uri.py`` is the main place where URIs are processed. It is
the authoritative definition point for all the URI types described
herein.

File URIs
=========

The lowest layer of the Tahoe architecture (the "key-value store") is
responsible for mapping URIs to data. This is basically a distributed
hash table, in which the URI is the key, and some sequence of bytes is
the value.

There are two kinds of entries in this table: immutable and mutable. For
immutable entries, the URI represents a fixed chunk of data. The URI itself
is derived from the data when it is uploaded into the grid, and can be used
to locate and download that data from the grid at some time in the future.

For mutable entries, the URI identifies a "slot" or "container", which can be
filled with different pieces of data at different times.

It is important to note that the values referenced by these URIs are just
sequences of bytes, and that **no** filenames or other metadata are retained at
this layer. The file store layer (which sits above the key-value store layer)
is entirely responsible for directories and filenames and the like.

CHK URIs
--------

CHK (Content Hash Keyed) files are immutable sequences of bytes. They are
uploaded in a distributed fashion using a "storage index" (for the "location"
property), and encrypted using a "read key". A secure hash of the data is
computed to help validate the data afterwards (providing the "identification"
property). All of these pieces, plus information about the file's size and
the number of shares into which it has been distributed, are put into the
"CHK" URI. The storage index is derived by hashing the read key (using a
tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
physically present in the URI.

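
The storage-index derivation described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not the
code in ``src/allmydata``: the tag value ``EXAMPLE_TAG``, the netstring
framing of the tag, and the helper names are all made up for this sketch, so
its output will not match the storage indexes computed by a real node.

```python
import hashlib


def netstring(data: bytes) -> bytes:
    """Length-prefix a byte string so the tag cannot run into the value."""
    return b"%d:%s," % (len(data), data)


def tagged_sha256d(tag: bytes, value: bytes, truncate_to: int = 32) -> bytes:
    """SHA-256d (SHA-256 applied twice) over a tagged input, then truncated."""
    inner = hashlib.sha256(netstring(tag) + value).digest()
    return hashlib.sha256(inner).digest()[:truncate_to]


# Hypothetical tag, chosen for this sketch only.
EXAMPLE_TAG = b"example_read_key_to_storage_index"


def storage_index_from_read_key(read_key: bytes) -> bytes:
    """Derive a 128-bit (16-byte) storage index from a 16-byte read key."""
    return tagged_sha256d(EXAMPLE_TAG, read_key, truncate_to=16)
```

Because the hash is one-way, someone who learns only the storage index
cannot recover the read key, while anyone holding the read key can always
recompute the storage index.
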
The current format for CHK URIs is the concatenation of the following
strings::

    URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)

Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the
base32 encoding of the SHA-256 hash of the URI Extension Block,
(needed-shares) is an ascii decimal representation of the number of shares
required to reconstruct this file, (total-shares) is the same representation
of the total number of shares created, and (size) is an ascii decimal
representation of the size of the data represented by this URI. All base32
encodings are expressed in lower-case, with the trailing '=' signs removed.

For example, the following is a CHK URI, generated from a previous version of
the contents of :doc:`architecture.rst<../architecture>`::

    URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733

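
As an illustration of the field layout, the following sketch splits a CHK URI
into its five fields and decodes the two base32 values. The names
``CHKFields``, ``b32decode_unpadded`` and ``parse_chk_uri`` are hypothetical
helpers written for this example; they are not the API of
``src/allmydata/uri.py``. The only non-obvious step is restoring the ``=``
padding that the URI format strips, since Python's ``base64.b32decode``
expects padded, upper-case input.

```python
import base64
from typing import NamedTuple


class CHKFields(NamedTuple):
    key: bytes            # 16-byte AES read key
    ueb_hash: bytes       # 32-byte hash of the URI Extension Block
    needed_shares: int
    total_shares: int
    size: int


def b32decode_unpadded(text: str) -> bytes:
    """Decode lower-case base32 whose trailing '=' padding was removed."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text.upper() + padding)


def parse_chk_uri(uri: str) -> CHKFields:
    prefix = "URI:CHK:"
    if not uri.startswith(prefix):
        raise ValueError("not a CHK URI: %r" % (uri,))
    parts = uri[len(prefix):].split(":")
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    key_s, hash_s, needed_s, total_s, size_s = parts
    return CHKFields(
        key=b32decode_unpadded(key_s),
        ueb_hash=b32decode_unpadded(hash_s),
        needed_shares=int(needed_s),
        total_shares=int(total_s),
        size=int(size_s),
    )


fields = parse_chk_uri(
    "URI:CHK:ihrbeov7lbvoduupd4qblysj7a:"
    "bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733"
)
```

For the example URI above, ``fields`` describes a 28733-byte file of which
any 3 of the 10 shares are enough to reconstruct the contents.
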
Historical note: The name "CHK" is somewhat inaccurate and continues to be
used for historical reasons. "Content Hash Key" means that the encryption key
is derived by hashing the contents, which gives the useful property that
encoding the same file twice will result in the same URI. However, this is an
optional step: by passing a different flag to the appropriate API call, Tahoe
will generate a random encryption key instead of hashing the file. This gives
the useful property that the URI or storage index does not reveal anything
about the file's contents (except its size), which improves privacy. The
``URI:CHK:`` prefix really indicates that an immutable file is in use, without
saying anything about how the key was derived.


LIT URIs
--------

LITeral files are also immutable sequences of bytes, but they are so short
that the data is stored inside the URI itself. These are used for files of 55
bytes or shorter, which is the point at which the LIT URI is the same length
as a CHK URI would be.

LIT URIs do not require an upload or download phase, as their data is stored
directly in the URI.

The format of a LIT URI is simply a fixed prefix concatenated with the base32
encoding of the file's data::

    URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi

The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
file that contains the string "hello" is "URI:LIT:nbswy3dp".

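
Because a LIT URI is nothing more than a prefix plus unpadded lower-case
base32, the two examples above can be reproduced with the standard library.
The function names below are hypothetical and exist only for this sketch.

```python
import base64

LIT_PREFIX = "URI:LIT:"


def lit_uri_from_data(data: bytes) -> str:
    """Build a LIT URI: prefix + lower-case base32 without '=' padding."""
    encoded = base64.b32encode(data).decode("ascii")
    return LIT_PREFIX + encoded.rstrip("=").lower()


def data_from_lit_uri(uri: str) -> bytes:
    """Recover the file contents directly from a LIT URI."""
    if not uri.startswith(LIT_PREFIX):
        raise ValueError("not a LIT URI: %r" % (uri,))
    body = uri[len(LIT_PREFIX):]
    padding = "=" * (-len(body) % 8)  # restore the stripped padding
    return base64.b32decode(body.upper() + padding)


print(lit_uri_from_data(b"hello"))  # URI:LIT:nbswy3dp
print(lit_uri_from_data(b""))       # URI:LIT:
```

No network access is involved in either direction, which is why LIT files
need no upload or download phase.
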
Mutable File URIs
-----------------

The other kind of DHT entry is the "mutable slot", in which the URI names a
container into which data can be placed, and from which it can be retrieved,
without changing the identity of the container.

These slots have write-caps (which allow read/write access), read-caps (which
only allow read-access), and verify-caps (which allow a file checker/repairer
to confirm that the contents exist, but do not let it decrypt the
contents).

Mutable slots use public key technology to provide data integrity, and put a
hash of the public key in the URI. As a result, the data validation is
limited to confirming that the data retrieved matches *some* data that was
uploaded in the past, but not *which* version of that data.

The format of the write-cap for mutable files is::

    URI:SSK:(writekey):(fingerprint)

Where (writekey) is the base32 encoding of the 16-byte AES encryption key
that is used to encrypt the RSA private key, and (fingerprint) is the base32
encoded 32-byte SHA-256 hash of the RSA public key. For more details about
the way these keys are used, please see :doc:`mutable`.

The format for mutable read-caps is::

    URI:SSK-RO:(readkey):(fingerprint)

The read-cap is just like the write-cap except it contains the other AES
encryption key: the one used for encrypting the mutable file's contents. This
second key is derived by hashing the writekey, which allows the holder of a
write-cap to produce a read-cap, but not the other way around. The
fingerprint is the same in both caps.

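
The one-way relationship between the two caps can be sketched as follows.
The text above does not say which hash is used, so this example assumes a
tagged, doubled SHA-256 truncated to 16 bytes with a made-up tag; the tag,
the helper names, and therefore the resulting readkey are assumptions and
will not match a real node. What the sketch does show is the shape of the
operation: the writekey is hashed, the fingerprint is copied unchanged.

```python
import base64
import hashlib

# Hypothetical tag, used only in this sketch.
EXAMPLE_READKEY_TAG = b"example_writekey_to_readkey"


def b32(data: bytes) -> str:
    """Lower-case base32 without trailing '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def unb32(text: str) -> bytes:
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


def derive_readkey(writekey: bytes) -> bytes:
    """Assumed derivation: truncated double SHA-256 over tag + writekey."""
    tag = EXAMPLE_READKEY_TAG
    tagged = b"%d:%s," % (len(tag), tag) + writekey
    return hashlib.sha256(hashlib.sha256(tagged).digest()).digest()[:16]


def readcap_from_writecap(writecap: str) -> str:
    prefix = "URI:SSK:"
    if not writecap.startswith(prefix):
        raise ValueError("not a mutable write-cap: %r" % (writecap,))
    writekey_s, fingerprint_s = writecap[len(prefix):].split(":")
    readkey = derive_readkey(unb32(writekey_s))
    # The fingerprint is carried over unchanged.
    return "URI:SSK-RO:%s:%s" % (b32(readkey), fingerprint_s)
```

There is deliberately no inverse function: recovering the writekey from the
readkey would require inverting the hash.
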
Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
structure to provide mutable file access.


Directory URIs
==============

The key-value store layer provides a mapping from URI to data. To turn this
into a graph of directories and files, the "file store" layer (which sits on
top of the key-value store layer) needs to keep track of "directory nodes",
or "dirnodes" for short. :doc:`dirnodes` describes how these work.

Dirnodes are contained inside mutable files, and are thus simply a particular
way to interpret the contents of these files. As a result, a directory
write-cap looks a lot like a mutable-file write-cap::

    URI:DIR2:(writekey):(fingerprint)

Likewise directory read-caps (which provide read-only access to the
directory) look much like mutable-file read-caps::

    URI:DIR2-RO:(readkey):(fingerprint)

Historical note: the "DIR2" prefix is used because the non-distributed
dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.


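
The prefixes introduced in the sections above are enough to tell what kind
of object a URI names and whether it grants write access. The table and the
``classify_uri`` helper below are a hypothetical sketch covering only the six
prefixes described in this document; the real dispatch lives in
``src/allmydata/uri.py`` and handles more cases.

```python
# Prefix -> (kind of object, whether the cap allows writing).
# Only the prefixes described in this document are listed.
CAP_KINDS = {
    "URI:CHK:": ("immutable file", False),
    "URI:LIT:": ("literal file", False),
    "URI:SSK:": ("mutable file", True),
    "URI:SSK-RO:": ("mutable file", False),
    "URI:DIR2:": ("directory", True),
    "URI:DIR2-RO:": ("directory", False),
}


def classify_uri(uri: str):
    """Return (kind, writable) for a URI, based on its prefix alone."""
    for prefix, description in CAP_KINDS.items():
        # Each prefix ends with ':', so "URI:SSK:" cannot match "URI:SSK-RO:".
        if uri.startswith(prefix):
            return description
    raise ValueError("unrecognized URI prefix: %r" % (uri,))


print(classify_uri("URI:LIT:nbswy3dp"))  # ('literal file', False)
```
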
Internal Usage of URIs
======================

The classes in source:src/allmydata/uri.py are used to pack and unpack these
various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and
IDirnodeURI) which are implemented by these classes, and string-to-URI-class
conversion routines have been registered as adapters, so that code which
wants to extract e.g. the size of a CHK or LIT URI can do::

    print(IFileURI(uri).get_size())

If the URI does not represent a CHK or LIT URI (for example, if it was for a
directory instead), the adaptation will fail, raising a TypeError inside the
IFileURI() call.

Several utility methods are provided on these objects. The most important is
``to_string()``, which returns the string form of the URI. Therefore
``IURI(uri).to_string() == uri`` is true for any valid URI. See the IURI class
in source:src/allmydata/interfaces.py for more details.