source: trunk/docs/specifications/uri.rst

Last change on this file was eb1b51e, checked in by Brian Warner <warner@…>, at 2017-06-06T10:20:49Z

Clean up some remaining obsolete terminology. refs #2345

Signed-off-by: Daira Hopwood <daira@…>

.. -*- coding: utf-8-with-signature -*-

==========
Tahoe URIs
==========

1.  `File URIs`_

    1. `CHK URIs`_
    2. `LIT URIs`_
    3. `Mutable File URIs`_

2.  `Directory URIs`_
3.  `Internal Usage of URIs`_

Each file and directory in a Tahoe-LAFS file store is described by a "URI".
There are different kinds of URIs for different kinds of objects, and there
are different kinds of URIs to provide different kinds of access to those
objects. Each URI is a string representation of a "capability" or "cap", and
there are read-caps, write-caps, verify-caps, and others.

Each URI provides both ``location`` and ``identification`` properties.
``location`` means that holding the URI is sufficient to locate the data it
represents (this means it contains a storage index or a lookup key, whatever
is necessary to find the place or places where the data is being kept).
``identification`` means that the URI also serves to validate the data: an
attacker who wants to trick you into using the wrong data will be
limited in their abilities by the identification properties of the URI.

Some URIs are subsets of others. In particular, if you know a URI which
allows you to modify some object, you can produce a weaker read-only URI and
give it to someone else, and they will be able to read that object but not
modify it. Directories, for example, have a read-cap which is derived from
the write-cap: anyone with read/write access to the directory can produce a
limited URI that grants read-only access, but not the other way around.

``src/allmydata/uri.py`` is the main place where URIs are processed. It is
the authoritative definition point for all the URI types described
herein.

File URIs
=========

The lowest layer of the Tahoe architecture (the "key-value store") is
responsible for mapping URIs to data. This is basically a distributed
hash table, in which the URI is the key, and some sequence of bytes is
the value.

There are two kinds of entries in this table: immutable and mutable. For
immutable entries, the URI represents a fixed chunk of data. The URI itself
is derived from the data when it is uploaded into the grid, and can be used
to locate and download that data from the grid at some time in the future.

For mutable entries, the URI identifies a "slot" or "container", which can be
filled with different pieces of data at different times.

It is important to note that the values referenced by these URIs are just
sequences of bytes, and that **no** filenames or other metadata are retained
at this layer. The file store layer (which sits above the key-value store
layer) is entirely responsible for directories and filenames and the like.

CHK URIs
--------

CHK (Content Hash Keyed) files are immutable sequences of bytes. They are
uploaded in a distributed fashion using a "storage index" (for the "location"
property), and encrypted using a "read key". A secure hash of the data is
computed to help validate the data afterwards (providing the "identification"
property). All of these pieces, plus information about the file's size and
the number of shares into which it has been distributed, are put into the
"CHK" URI. The storage index is derived by hashing the read key (using a
tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
physically present in the URI.

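As a rough illustration of the derivation described above, the following
Python sketch computes a tagged SHA-256d hash of a read key and truncates it
to 128 bits. The tag string, the netstring framing of the tag, and the helper
names are assumptions made for this example only; the authoritative
definitions live in the Tahoe-LAFS source.

.. code-block:: python

    import hashlib

    # Hypothetical tag, chosen for illustration only. The real tag string is
    # defined in the Tahoe-LAFS source and is not given in this document.
    EXAMPLE_TAG = b"example_key_to_storage_index_tag"


    def sha256d(data: bytes) -> bytes:
        """SHA-256d: SHA-256 applied to the SHA-256 digest of the input."""
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()


    def netstring(data: bytes) -> bytes:
        """Length-prefixed framing, assumed here as the way to delimit the tag."""
        return b"%d:%s," % (len(data), data)


    def storage_index_from_key(read_key: bytes, tag: bytes = EXAMPLE_TAG) -> bytes:
        """Tagged SHA-256d of the read key, truncated to 128 bits (16 bytes)."""
        return sha256d(netstring(tag) + read_key)[:16]

Because the result depends only on the read key and the tag, anyone who holds
the read key can recompute the storage index, which is why the URI does not
need to carry it.
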
The current format for CHK URIs is the concatenation of the following
strings::

 URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)

Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the
base32 encoding of the SHA-256 hash of the URI Extension Block,
(needed-shares) is an ASCII decimal representation of the number of shares
required to reconstruct this file, (total-shares) is the same representation
of the total number of shares created, and (size) is an ASCII decimal
representation of the size of the data represented by this URI. All base32
encodings are expressed in lower-case, with the trailing '=' signs removed.

For example, the following is a CHK URI, generated from a previous version of
the contents of :doc:`architecture.rst<../architecture>`::

 URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733

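The field layout above can be unpacked with a few lines of Python. This is a
minimal sketch, not the parser in ``src/allmydata/uri.py``: the names
``ParsedCHK``, ``b32decode_unpadded`` and ``parse_chk_uri`` are hypothetical,
and the only facts relied on are the field order and the lower-case, unpadded
base32 convention stated above.

.. code-block:: python

    import base64
    from typing import NamedTuple


    class ParsedCHK(NamedTuple):
        key: bytes          # 16-byte AES read key
        ueb_hash: bytes     # hash of the URI Extension Block
        needed_shares: int  # shares required to reconstruct the file
        total_shares: int   # shares created at upload time
        size: int           # size of the data in bytes


    def b32decode_unpadded(text: str) -> bytes:
        """Decode lower-case base32 whose trailing '=' padding was removed."""
        padding = "=" * (-len(text) % 8)
        return base64.b32decode(text.upper() + padding)


    def parse_chk_uri(uri: str) -> ParsedCHK:
        """Split a CHK URI string into its five fields."""
        prefix = "URI:CHK:"
        if not uri.startswith(prefix):
            raise ValueError("not a CHK URI")
        key, ueb_hash, needed, total, size = uri[len(prefix):].split(":")
        return ParsedCHK(
            b32decode_unpadded(key),
            b32decode_unpadded(ueb_hash),
            int(needed),
            int(total),
            int(size),
        )

Applied to the example URI above, ``parse_chk_uri`` yields a 16-byte key, a
32-byte hash, and the integers 3, 10 and 28733.
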
Historical note: The name "CHK" is somewhat inaccurate and continues to be
used for historical reasons. "Content Hash Key" means that the encryption key
is derived by hashing the contents, which gives the useful property that
encoding the same file twice will result in the same URI. However, this is an
optional step: by passing a different flag to the appropriate API call, Tahoe
will generate a random encryption key instead of hashing the file: this gives
the useful property that the URI or storage index does not reveal anything
about the file's contents (except filesize), which improves privacy. The
URI:CHK: prefix really indicates that an immutable file is in use, without
saying anything about how the key was derived.


LIT URIs
--------

LITeral files are also immutable sequences of bytes, but they are so short
that the data is stored inside the URI itself. These are used for files of 55
bytes or shorter, which is the point at which the LIT URI is the same length
as a CHK URI would be.

LIT URIs do not require an upload or download phase, as their data is stored
directly in the URI.

The format of a LIT URI is simply a fixed prefix concatenated with the base32
encoding of the file's data::

 URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi

The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
file that contains the string "hello" is "URI:LIT:nbswy3dp".

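The encoding is simple enough to reproduce with the Python standard library.
The sketch below is illustrative only: ``make_lit_uri`` and ``read_lit_uri``
are hypothetical names, and the code assumes the same lower-case, unpadded
base32 convention that is stated for CHK URIs.

.. code-block:: python

    import base64

    LIT_PREFIX = "URI:LIT:"


    def make_lit_uri(data: bytes) -> str:
        """Build a LIT URI by appending the unpadded base32 form of the data."""
        encoded = base64.b32encode(data).decode("ascii")
        return LIT_PREFIX + encoded.rstrip("=").lower()


    def read_lit_uri(uri: str) -> bytes:
        """Recover the file contents stored inside a LIT URI."""
        if not uri.startswith(LIT_PREFIX):
            raise ValueError("not a LIT URI")
        body = uri[len(LIT_PREFIX):]
        padding = "=" * (-len(body) % 8)  # restore the stripped '=' signs
        return base64.b32decode(body.upper() + padding)

With these helpers, ``make_lit_uri(b"hello")`` returns ``URI:LIT:nbswy3dp``
and ``make_lit_uri(b"")`` returns ``URI:LIT:``, matching the two examples
given above.
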
Mutable File URIs
-----------------

The other kind of DHT entry is the "mutable slot", in which the URI names a
container into which data can be placed, and from which it can be retrieved,
without changing the identity of the container.

These slots have write-caps (which allow read/write access), read-caps (which
only allow read-access), and verify-caps (which allow a file checker/repairer
to confirm that the contents exist, but do not let it decrypt the
contents).

Mutable slots use public key technology to provide data integrity, and put a
hash of the public key in the URI. As a result, the data validation is
limited to confirming that the data retrieved matches *some* data that was
uploaded in the past, but not *which* version of that data.

The format of the write-cap for mutable files is::

 URI:SSK:(writekey):(fingerprint)

Where (writekey) is the base32 encoding of the 16-byte AES encryption key
that is used to encrypt the RSA private key, and (fingerprint) is the base32
encoded 32-byte SHA-256 hash of the RSA public key. For more details about
the way these keys are used, please see :doc:`mutable`.

The format for mutable read-caps is::

 URI:SSK-RO:(readkey):(fingerprint)

The read-cap is just like the write-cap except it contains the other AES
encryption key: the one used for encrypting the mutable file's contents. This
second key is derived by hashing the writekey, which allows the holder of a
write-cap to produce a read-cap, but not the other way around. The
fingerprint is the same in both caps.

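The one-way relationship between the two caps can be sketched as follows.
This is an illustration under stated assumptions, not the real derivation:
the tag string, the use of plain SHA-256 truncated to 16 bytes, and the
function names are all hypothetical. Only the cap layouts and the fact that
the readkey is a hash of the writekey come from the text above.

.. code-block:: python

    import base64
    import hashlib

    # Hypothetical tag for illustration; the real hash construction is
    # defined in the Tahoe-LAFS source, not in this document.
    EXAMPLE_TAG = b"example_writekey_to_readkey_tag"


    def b32encode_unpadded(data: bytes) -> str:
        return base64.b32encode(data).decode("ascii").rstrip("=").lower()


    def b32decode_unpadded(text: str) -> bytes:
        return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


    def derive_readkey(writekey: bytes) -> bytes:
        """Assumed derivation: hash the writekey and keep 16 bytes."""
        return hashlib.sha256(EXAMPLE_TAG + writekey).digest()[:16]


    def write_cap_to_read_cap(write_cap: str) -> str:
        """Turn URI:SSK:(writekey):(fingerprint) into the matching SSK-RO cap."""
        prefix = "URI:SSK:"
        if not write_cap.startswith(prefix):
            raise ValueError("not a mutable-file write-cap")
        writekey_b32, fingerprint_b32 = write_cap[len(prefix):].split(":")
        readkey = derive_readkey(b32decode_unpadded(writekey_b32))
        # The fingerprint is copied unchanged, as it is the same in both caps.
        return "URI:SSK-RO:" + b32encode_unpadded(readkey) + ":" + fingerprint_b32

Since a hash cannot be inverted, there is no corresponding function that
recovers the writekey from a read-cap, which is what makes the read-cap a
strictly weaker capability.
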
Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
structure to provide mutable file access.


Directory URIs
==============

The key-value store layer provides a mapping from URI to data. To turn this
into a graph of directories and files, the "file store" layer (which sits on
top of the key-value store layer) needs to keep track of "directory nodes",
or "dirnodes" for short. :doc:`dirnodes` describes how these work.

Dirnodes are contained inside mutable files, and are thus simply a particular
way to interpret the contents of these files. As a result, a directory
write-cap looks a lot like a mutable-file write-cap::

 URI:DIR2:(writekey):(fingerprint)

Likewise directory read-caps (which provide read-only access to the
directory) look much like mutable-file read-caps::

 URI:DIR2-RO:(readkey):(fingerprint)

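Taken together, the prefixes described in this document are enough to tell
the kinds of URI apart. The following sketch shows one way to do that; the
``classify_uri`` function and its return values are hypothetical and are not
part of ``src/allmydata/uri.py``.

.. code-block:: python

    # Prefix -> (kind of object, whether the cap grants write access).
    # Only the prefixes described in this document are listed.
    URI_PREFIXES = {
        "URI:CHK:": ("immutable file", False),
        "URI:LIT:": ("literal file", False),
        "URI:SSK:": ("mutable file", True),
        "URI:SSK-RO:": ("mutable file", False),
        "URI:DIR2:": ("directory", True),
        "URI:DIR2-RO:": ("directory", False),
    }


    def classify_uri(uri: str):
        """Return (kind, writable) for a URI string, based on its prefix."""
        for prefix, description in URI_PREFIXES.items():
            if uri.startswith(prefix):
                return description
        raise ValueError("unrecognized URI prefix")

Each prefix ends with a colon, so ``URI:SSK:`` never matches a string that
begins with ``URI:SSK-RO:``, and the order of the table does not matter.
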
Historical note: the "DIR2" prefix is used because the non-distributed
dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.


Internal Usage of URIs
======================

The classes in ``src/allmydata/uri.py`` are used to pack and unpack these
various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and
IDirnodeURI) which are implemented by these classes, and string-to-URI-class
conversion routines have been registered as adapters, so that code which
wants to extract e.g. the size of a CHK or LIT URI can do::

 print(IFileURI(uri).get_size())

If the URI does not represent a CHK or LIT URI (for example, if it was for a
directory instead), the adaptation will fail, raising a TypeError inside the
IFileURI() call.

Several utility methods are provided on these objects. The most important is
``to_string()``, which returns the string form of the URI. Therefore
``IURI(uri).to_string() == uri`` is true for any valid URI. See the IURI class
in ``src/allmydata/interfaces.py`` for more details.
