source: trunk/docs/specifications/http-storage-node-protocol.rst

Last change on this file was 2cfbfa8, checked in by blaisep <blaise@…>, at 2024-10-23T19:38:51Z

Fix 4116 and accept reviewer's feedback

1.. -*- coding: utf-8 -*-
2
3Storage Node Protocol ("Great Black Swamp", "GBS")
4==================================================
5
6The target audience for this document is developers working on Tahoe-LAFS or on an alternate implementation intended to be interoperable.
7After reading this document,
8one should expect to understand how Tahoe-LAFS clients interact over the network with Tahoe-LAFS storage nodes.
9
10The primary goal of the introduction of this protocol is to simplify the task of implementing a Tahoe-LAFS storage server.
11Specifically, it should be possible to implement a Tahoe-LAFS storage server without a Foolscap implementation
12(substituting a simpler GBS server implementation).
The Tahoe-LAFS client will also need to change but it is not expected that it will be noticeably simplified by this change
14(though this may be the first step towards simplifying it).
15
16Glossary
17--------
18
19    `Foolscap <https://github.com/warner/foolscap/>`_
20        an RPC/RMI (Remote Procedure Call / Remote Method Invocation) protocol for use with Twisted
21
22    storage server
23        a Tahoe-LAFS process configured to offer storage and reachable over the network for store and retrieve operations
24
25    storage service
26        a Python object held in memory in the storage server which provides the implementation of the storage protocol
27
28    introducer
29        a Tahoe-LAFS process at a known location configured to re-publish announcements about the location of storage servers
30
31    :ref:`fURLs <fURLs>`
32        a self-authenticating URL-like string which can be used to locate a remote object using the Foolscap protocol (the storage service is an example of such an object)
33
34    :ref:`NURLs <NURLs>`
35        a self-authenticating URL-like string almost exactly like a fURL but without being tied to Foolscap
36
37    swissnum
38        a short random string which is part of a fURL/NURL and which acts as a shared secret to authorize clients to use a storage service
39
40    lease
41        state associated with a share informing a storage server of the duration of storage desired by a client
42
43    share
44        a single unit of client-provided arbitrary data to be stored by a storage server (in practice, one of the outputs of applying ZFEC encoding to some ciphertext with some additional metadata attached)
45
46    bucket
47        a group of one or more immutable shares held by a storage server and having a common storage index
48
49    slot
50        a group of one or more mutable shares held by a storage server and having a common storage index (sometimes "slot" is considered a synonym for "storage index of a slot")
51
52    storage index
53        a 16 byte string which can address a slot or a bucket (in practice, derived by hashing the encryption key associated with contents of that slot or bucket)
54
55    write enabler
56        a short secret string which storage servers require to be presented before allowing mutation of any mutable share
57
58    lease renew secret
        a short secret string which storage servers require to be presented before allowing a particular lease to be renewed
60
Additional terms related to the Tahoe-LAFS project in general are defined in the :doc:`../glossary`.
62
63The key words
64"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL"
65in this document are to be interpreted as described in RFC 2119.
66
67Motivation
68----------
69
70Foolscap
71~~~~~~~~
72
73Foolscap is a remote method invocation protocol with several distinctive features.
At its core it allows separate processes to refer to each other's objects and methods using a capability-based model.
75This allows for extremely fine-grained access control in a system that remains highly securable without becoming overwhelmingly complicated.
76Supporting this is a flexible and extensible serialization system which allows data to be exchanged between processes in carefully controlled ways.
77
78Tahoe-LAFS avails itself of only a small portion of these features.
79A Tahoe-LAFS storage server typically only exposes one object with a fixed set of methods to clients.
80A Tahoe-LAFS introducer node does roughly the same.
81Tahoe-LAFS exchanges simple data structures that have many common, standard serialized representations.
82
83In exchange for this slight use of Foolscap's sophisticated mechanisms,
84Tahoe-LAFS pays a substantial price:
85
86* Foolscap is implemented only for Python.
87  Tahoe-LAFS is thus limited to being implemented only in Python.
88* There is only one Python implementation of Foolscap.
89  The implementation is therefore the de facto standard and understanding of the protocol often relies on understanding that implementation.
90* The Foolscap developer community is very small.
91  The implementation therefore advances very little and some non-trivial part of the maintenance cost falls on the Tahoe-LAFS project.
92* The extensible serialization system imposes substantial complexity compared to the simple data structures Tahoe-LAFS actually exchanges.
93
94HTTP
95~~~~
96
97HTTP is a request/response protocol that has become the lingua franca of the internet.
98Combined with the principles of Representational State Transfer (REST) it is widely employed to create, update, and delete data in collections on the internet.
99HTTP itself provides only modest functionality in comparison to Foolscap.
100However its simplicity and widespread use have led to a diverse and almost overwhelming ecosystem of libraries, frameworks, toolkits, and so on.
101
102By adopting HTTP in place of Foolscap Tahoe-LAFS can realize the following concrete benefits:
103
104* Practically every language or runtime has an HTTP protocol implementation (or a dozen of them) available.
105  This change paves the way for new Tahoe-LAFS implementations using tools better suited for certain situations
106  (mobile client implementations, high-performance server implementations, easily distributed desktop clients, etc).
107* The simplicity of and vast quantity of resources about HTTP make it a very easy protocol to learn and use.
108  This change reduces the barrier to entry for developers to contribute improvements to Tahoe-LAFS's network interactions.
109* For any given language there is very likely an HTTP implementation with a large and active developer community.
110  Tahoe-LAFS can therefore benefit from the large effort being put into making better libraries for using HTTP.
* One of the core features of HTTP is the mundane transfer of bulk data and implementations are often capable of doing this with extreme efficiency.
112  The alignment of this core feature with a core activity of Tahoe-LAFS of transferring bulk data means that a substantial barrier to improved Tahoe-LAFS runtime performance will be eliminated.
113
114TLS
115~~~
116
117The Foolscap-based protocol provides *some* of Tahoe-LAFS's confidentiality, integrity, and authentication properties by leveraging TLS.
118An HTTP-based protocol can make use of TLS in largely the same way to provide the same properties.
Provision of these properties *is* dependent on implementers following Great Black Swamp's rules for x509 certificate validation
120(rather than the standard "web" rules for validation).
121
122Design Requirements
123-------------------
124
125Security
126~~~~~~~~
127
128Summary
129!!!!!!!
130
131The storage node protocol should offer at minimum the security properties offered by the Foolscap-based protocol.
132The Foolscap-based protocol offers:
133
134* **Peer authentication** by way of checked x509 certificates
135* **Message authentication** by way of TLS
136* **Message confidentiality** by way of TLS
137
138  * A careful configuration of the TLS connection parameters *may* also offer **forward secrecy**.
139    However, Tahoe-LAFS' use of Foolscap takes no steps to ensure this is the case.
140
141* **Storage authorization** by way of a capability contained in the fURL addressing a storage service.
142
143Discussion
144!!!!!!!!!!
145
146A client node relies on a storage node to persist certain data until a future retrieval request is made.
147In this way, the client node is vulnerable to attacks which cause the data not to be persisted.
148Though this vulnerability can be (and typically is) mitigated by including redundancy in the share encoding parameters for stored data,
149it is still sensible to attempt to minimize unnecessary vulnerability to this attack.
150
151One way to do this is for the client to be confident the storage node with which it is communicating is really the expected node.
152That is, for the client to perform **peer authentication** of the storage node it connects to.
153This allows it to develop a notion of that node's reputation over time.
The more retrieval requests the node satisfies correctly, the more likely it is to satisfy future requests correctly.
Therefore, the protocol must include some means for verifying the identity of the storage node.
156The initialization of the client with the correct identity information is out of scope for this protocol
157(the system may be trust-on-first-use, there may be a third-party identity broker, etc).
158
159With confidence that communication is proceeding with the intended storage node,
160it must also be possible to trust that data is exchanged without modification.
161That is, the protocol must include some means to perform **message authentication**.
162This is most likely done using cryptographic MACs (such as those used in TLS).
163
164The messages which enable the mutable shares feature include secrets related to those shares.
165For example, the write enabler secret is used to restrict the parties with write access to mutable shares.
166It is exchanged over the network as part of a write operation.
167An attacker learning this secret can overwrite share data with garbage
168(lacking a separate encryption key,
169there is no way to write data which appears legitimate to a legitimate client).
170Therefore, **message confidentiality** is necessary when exchanging these secrets.
171**Forward secrecy** is preferred so that an attacker recording an exchange today cannot launch this attack at some future point after compromising the necessary keys.
172
173A storage service offers service only to some clients.
174A client proves their authorization to use the storage service by presenting a shared secret taken from the fURL.
175In this way **storage authorization** is performed to prevent disallowed parties from consuming any storage resources.
176
177Functionality
178-------------
179
180Tahoe-LAFS application-level information must be transferred using this protocol.
181This information is exchanged with a dozen or so request/response-oriented messages.
182Some of these messages carry large binary payloads.
183Others are small structured-data messages.
184Some facility for expansion to support new information exchanges should also be present.
185
186Solutions
187---------
188
189An HTTP-based protocol, dubbed "Great Black Swamp" (or "GBS"), is described below.
190This protocol aims to satisfy the above requirements at a lower level of complexity than the current Foolscap-based protocol.
191
192Summary (Non-normative)
193~~~~~~~~~~~~~~~~~~~~~~~
194
195Communication with the storage node will take place using TLS.
196The TLS version and configuration will be dictated by an ongoing understanding of best practices.
197The storage node will present an x509 certificate during the TLS handshake.
198Storage clients will require that the certificate have a valid signature.
199The Subject Public Key Information (SPKI) hash of the certificate will constitute the storage node's identity.
200The **tub id** portion of the storage node fURL will be replaced with the SPKI hash.
201
202When connecting to a storage node,
203the client will take the following steps to gain confidence it has reached the intended peer:
204
205* It will perform the usual cryptographic verification of the certificate presented by the storage server.
206  That is,
207  it will check that the certificate itself is well-formed,
208  that it is currently valid [#]_,
209  and that the signature it carries is valid.
210* It will compare the SPKI hash of the certificate to the expected value.
211  The specifics of the comparison are the same as for the comparison specified by `RFC 7469`_ with "sha256" [#]_.
212
213To further clarify, consider this example.
214Alice operates a storage node.
215Alice generates a key pair and secures it properly.
216Alice generates a self-signed storage node certificate with the key pair.
217Alice's storage node announces (to an introducer) a NURL containing (among other information) the SPKI hash.
218Imagine the SPKI hash is ``i5xb...``.
219This results in a NURL of ``pb://i5xb...@example.com:443/g3m5...#v=1``.
220Bob creates a client node pointed at the same introducer.
221Bob's client node receives the announcement from Alice's storage node
222(indirected through the introducer).
223
224Bob's client node recognizes the NURL as referring to an HTTP-dialect server due to the ``v=1`` fragment.
225Bob's client node can now perform a TLS handshake with a server at the address in the NURL location hints
226(``example.com:443`` in this example).
227Following the above described validation procedures,
228Bob's client node can determine whether it has reached Alice's storage node or not.
229If and only if the validation procedure is successful does Bob's client node conclude it has reached Alice's storage node.
230**Peer authentication** has been achieved.
231
232Additionally,
233by continuing to interact using TLS,
234Bob's client and Alice's storage node are assured of both **message authentication** and **message confidentiality**.
235
236Bob's client further inspects the NURL for the *swissnum*.
237When Bob's client issues HTTP requests to Alice's storage node it includes the *swissnum* in its requests.
238**Storage authorization** has been achieved.
239
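The validation and authorization steps above can be sketched in Python as follows.
This is a minimal illustration, not the Tahoe-LAFS implementation.
The helper names (``parse_nurl``, ``spki_hash``, ``is_expected_peer``) are hypothetical,
the NURL is assumed to carry a single location hint,
and the SPKI hash is assumed to be encoded as unpadded base64url.
Extracting the DER-encoded SubjectPublicKeyInfo from the presented certificate is left to the TLS library in use.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlsplit


def parse_nurl(nurl):
    """Split a GBS NURL into (spki_hash, host, port, swissnum).

    Assumes a single location hint; this layout is an illustration only.
    """
    parts = urlsplit(nurl)
    if parts.scheme != "pb" or parts.fragment != "v=1":
        raise ValueError("not a GBS (HTTP-dialect) NURL")
    return parts.username, parts.hostname, parts.port, parts.path.lstrip("/")


def spki_hash(spki_der):
    """SHA256 of the DER-encoded SPKI, as unpadded base64url text."""
    digest = hashlib.sha256(spki_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_expected_peer(expected_hash, spki_der):
    """Compare the presented key's hash with the hash taken from the NURL."""
    presented = spki_hash(spki_der)
    return hmac.compare_digest(expected_hash.encode("ascii"), presented.encode("ascii"))
```

A client would call ``is_expected_peer`` after the usual certificate checks and proceed only if it returns ``True``.
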
240.. note::
241
242   Foolscap TubIDs are 20 bytes (SHA1 digest of the certificate).
243   They are encoded with `Base32`_ for a length of 32 bytes.
244   SPKI information discussed here is 32 bytes (SHA256 digest).
245   They would be encoded in `Base32`_ for a length of 52 bytes.
246   `unpadded base64url`_ provides a more compact encoding of the information while remaining URL-compatible.
247   This would encode the SPKI information for a length of merely 43 bytes.
248   SHA1,
249   the current Foolscap hash function,
250   is not a practical choice at this time due to advances made in `attacking SHA1`_.
251   The selection of a safe hash function with output smaller than SHA256 could be the subject of future improvements.
252   A 224 bit hash function (SHA3-224, for example) might be suitable -
253   improving the encoded length to 38 bytes.
254
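The encoded lengths quoted in the note above can be checked with a few lines of Python.
This sketch uses only the standard library;
the function names are illustrative and not part of any Tahoe-LAFS API.

```python
import base64


def unpadded_base32_length(digest_size):
    """Length of the unpadded Base32 encoding of digest_size bytes."""
    return len(base64.b32encode(bytes(digest_size)).rstrip(b"="))


def unpadded_base64url_length(digest_size):
    """Length of the unpadded base64url encoding of digest_size bytes."""
    return len(base64.urlsafe_b64encode(bytes(digest_size)).rstrip(b"="))


# SHA1 is 20 bytes, SHA256 is 32 bytes, a 224 bit hash is 28 bytes.
for name, size in [("SHA1", 20), ("SHA256", 32), ("SHA3-224", 28)]:
    print(name, unpadded_base32_length(size), unpadded_base64url_length(size))
```
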
255
256Transition
257~~~~~~~~~~
258
259To provide a seamless user experience during this protocol transition,
260there should be a period during which both protocols are supported by storage nodes.
261The GBS announcement will be introduced in a way that *updated client* software can recognize.
262Its introduction will also be made in such a way that *non-updated client* software disregards the new information
263(of which it cannot make any use).
264
265Storage nodes will begin to operate a new GBS server.
266They may re-use their existing x509 certificate or generate a new one.
267Generation of a new certificate allows for certain non-optimal conditions to be addressed:
268
269* The ``commonName`` of ``newpb_thingy`` may be changed to a more descriptive value.
270* A ``notValidAfter`` field with a timestamp in the past may be updated.
271
272Storage nodes will announce a new NURL for this new HTTP-based server.
273This NURL will be announced alongside their existing Foolscap-based server's fURL.
274Such an announcement will resemble this::
275
276  {
277      "anonymous-storage-FURL": "pb://...",          # The old entry
278      "anonymous-storage-NURLs": ["pb://...#v=1"]    # The new, additional entry
279  }
280
281The transition process will proceed in three stages:
282
2831. The first stage represents the starting conditions in which clients and servers can speak only Foolscap.
284#. The intermediate stage represents a condition in which some clients and servers can both speak Foolscap and GBS.
285#. The final stage represents the desired condition in which all clients and servers speak only GBS.
286
287During the first stage only one client/server interaction is possible:
288the storage server announces only Foolscap and speaks only Foolscap.
289During the final stage there is only one supported interaction:
290the client and server are both updated and speak GBS to each other.
291
292During the intermediate stage there are four supported interactions:
293
2941. Both the client and server are non-updated.
295   The interaction is just as it would be during the first stage.
296#. The client is updated and the server is non-updated.
297   The client will see the Foolscap announcement and the lack of a GBS announcement.
298   It will speak to the server using Foolscap.
299#. The client is non-updated and the server is updated.
300   The client will see the Foolscap announcement.
301   It will speak Foolscap to the storage server.
302#. Both the client and server are updated.
303   The client will see the GBS announcement and disregard the Foolscap announcement.
304   It will speak GBS to the server.
305
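The four interactions above reduce to a simple selection rule on the client side.
The following Python sketch assumes the announcement is available as a dictionary shaped like the example earlier in this section;
the function name and the ``client_speaks_gbs`` flag are hypothetical.

```python
def choose_storage_endpoint(announcement, client_speaks_gbs):
    """Pick the protocol and address a client would use for one announcement."""
    nurls = [
        nurl
        for nurl in announcement.get("anonymous-storage-NURLs", [])
        if nurl.endswith("#v=1")
    ]
    if client_speaks_gbs and nurls:
        # Updated client, updated server: disregard the Foolscap announcement.
        return ("gbs", nurls[0])
    # Every other combination falls back to Foolscap.
    return ("foolscap", announcement["anonymous-storage-FURL"])
```

A non-updated client never inspects the new key at all, which is modelled here by ``client_speaks_gbs=False``.
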
306There is one further complication:
307the client maintains a cache of storage server information
308(to avoid continuing to rely on the introducer after it has been introduced).
The following sequence of events is likely:
310
3111. The client connects to an introducer.
312#. It receives an announcement for a non-updated storage server (Foolscap only).
313#. It caches this announcement.
314#. At some point, the storage server is updated.
315#. The client uses the information in its cache to open a Foolscap connection to the storage server.
316
317Ideally,
318the client would not rely on an update from the introducer to give it the GBS NURL for the updated storage server.
319In practice, we have decided not to implement this functionality.
320
321Server Details
322--------------
323
324The protocol primarily enables interaction with "resources" of two types:
325storage indexes
326and shares.
327A particular resource is addressed by the HTTP request path.
328Details about the interface are encoded in the HTTP message body.
329
330String Encoding
331~~~~~~~~~~~~~~~
332
333.. _Base32:
334
335Base32
336!!!!!!
337
338Where the specification refers to Base32 the meaning is *unpadded* Base32 encoding as specified by `RFC 4648`_ using a *lowercase variation* of the alphabet from Section 6.
339
340That is, the alphabet is:
341
342.. list-table:: Base32 Alphabet
343   :header-rows: 1
344
345   * - Value
346     - Encoding
347     - Value
348     - Encoding
349     - Value
350     - Encoding
351     - Value
352     - Encoding
353
354   * - 0
355     - a
356     - 9
357     - j
358     - 18
359     - s
360     - 27
361     - 3
362   * - 1
363     - b
364     - 10
365     - k
366     - 19
367     - t
368     - 28
369     - 4
370   * - 2
371     - c
372     - 11
373     - l
374     - 20
375     - u
376     - 29
377     - 5
378   * - 3
379     - d
380     - 12
381     - m
382     - 21
383     - v
384     - 30
385     - 6
386   * - 4
387     - e
388     - 13
389     - n
390     - 22
391     - w
392     - 31
393     - 7
394   * - 5
395     - f
396     - 14
397     - o
398     - 23
399     - x
400     -
401     -
402   * - 6
403     - g
404     - 15
405     - p
406     - 24
407     - y
408     -
409     -
410   * - 7
411     - h
412     - 16
413     - q
414     - 25
415     - z
416     -
417     -
418   * - 8
419     - i
420     - 17
421     - r
422     - 26
423     - 2
424     -
425     -
426
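As an illustration, the unpadded lowercase Base32 variant described above can be built on Python's standard ``base64`` module.
The function names are hypothetical;
the sketch relies on the table above being the RFC 4648 Section 6 alphabet in lowercase.

```python
import base64


def gbs_b32encode(data):
    """Encode bytes as unpadded, lowercase Base32 text."""
    return base64.b32encode(data).rstrip(b"=").decode("ascii").lower()


def gbs_b32decode(text):
    """Decode unpadded, lowercase Base32 text back to bytes."""
    # The standard library decoder expects uppercase input padded to a multiple of 8.
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text.upper() + padding)
```

For example, a 16 byte storage index encodes to 26 characters.
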
427Message Encoding
428~~~~~~~~~~~~~~~~
429
430Clients and servers MUST use the ``Content-Type`` and ``Accept`` header fields as specified in `RFC 9110`_ for message body negotiation.
431
432The encoding for HTTP message bodies SHOULD be `CBOR`_.
433Clients submitting requests using this encoding MUST include a ``Content-Type: application/cbor`` request header field.
434A request MAY be submitted using an alternate encoding by declaring this in the ``Content-Type`` header field.
435A request MAY indicate its preference for an alternate encoding in the response using the ``Accept`` header field.
A request which includes no ``Accept`` header field MUST be interpreted in the same way as a request including an ``Accept: application/cbor`` header field.
437
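A server might apply the default described above along the following lines.
This is a deliberately simplified sketch:
the function name is hypothetical, the set of supported encodings is an assumption,
and ``q`` values in the ``Accept`` header field are ignored rather than honoured as RFC 9110 describes.

```python
SUPPORTED_TYPES = ("application/cbor", "application/json")


def choose_response_type(accept_header):
    """Return the response media type, or None if nothing acceptable is supported."""
    if accept_header is None:
        # No Accept header field is treated like "Accept: application/cbor".
        return "application/cbor"
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in SUPPORTED_TYPES:
            return media_type
        if media_type == "*/*":
            return "application/cbor"
    return None
```
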
438Clients and servers MAY support additional request and response message body encodings.
439
440Clients and servers SHOULD support ``application/json`` request and response message body encoding.
441For HTTP messages carrying binary share data,
442this is expected to be a particularly poor encoding.
443However,
444for HTTP messages carrying small payloads of strings, numbers, and containers
445it is expected that JSON will be more convenient than CBOR for ad hoc testing and manual interaction.
446
447For this same reason,
448JSON is used throughout for the examples presented here.
449Because of the simple types used throughout
450and the equivalence described in `RFC 7049`_
451these examples should be representative regardless of which of these two encodings is chosen.
452
453There are two exceptions to this rule.
454
4551. Sets
456!!!!!!!
457
458For CBOR messages,
459any sequence that is semantically a set (i.e. no repeated values allowed, order doesn't matter, and elements are hashable in Python) should be sent as a set.
460Tag 6.258 is used to indicate sets in CBOR;
461see `the CBOR registry <https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>`_ for more details.
462The JSON encoding does not support sets.
463Sets MUST be represented as arrays in JSON-encoded messages.
464
4652. Bytes
466!!!!!!!!
467
468The CBOR encoding natively supports a bytes type while the JSON encoding does not.
469Bytes MUST be represented as strings giving the `Base64`_ representation of the original bytes value.
470
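The two exceptions can be handled by a small conversion step before JSON serialization.
The sketch below is one possible approach, with a hypothetical function name.
It assumes the standard (padded) Base64 alphabet
and sorts set members only to make the output deterministic, since order carries no meaning.

```python
import base64
import json


def to_json_compatible(value):
    """Convert sets to arrays and bytes to Base64 strings, recursively."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, (set, frozenset)):
        return sorted(to_json_compatible(member) for member in value)
    if isinstance(value, dict):
        return {key: to_json_compatible(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_compatible(item) for item in value]
    return value


message = {"share-numbers": {7, 1}, "data": b"\x00\x01"}
print(json.dumps(to_json_compatible(message), sort_keys=True))
```
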
471HTTP Design
472~~~~~~~~~~~
473
474The HTTP interface described here is informed by the ideas of REST
475(Representational State Transfer).
476For ``GET`` requests query parameters are preferred over values encoded in the request body.
477For other requests query parameters are encoded into the message body.
478
479Many branches of the resource tree are conceived as homogenous containers:
480one branch contains all of the share data;
481another branch contains all of the lease data;
482etc.
483
484Clients and servers MUST use the ``Authorization`` header field,
485as specified in `RFC 9110`_,
486for authorization of all requests to all endpoints specified here.
487The authentication *type* MUST be ``Tahoe-LAFS``.
488Clients MUST present the `Base64`_-encoded representation of the swissnum from the NURL used to locate the storage service as the *credentials*.
489
490If credentials are not presented or the swissnum is not associated with a storage service then the server MUST issue a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
491
Requests to certain endpoints MUST include additional secrets in ``X-Tahoe-Authorization`` header fields.
493The endpoints which require these secrets are:
494
495* ``PUT /storage/v1/lease/:storage_index``:
496  The secrets included MUST be ``lease-renew-secret`` and ``lease-cancel-secret``.
497
498* ``POST /storage/v1/immutable/:storage_index``:
499  The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``upload-secret``.
500
501* ``PATCH /storage/v1/immutable/:storage_index/:share_number``:
502  The secrets included MUST be ``upload-secret``.
503
504* ``PUT /storage/v1/immutable/:storage_index/:share_number/abort``:
505  The secrets included MUST be ``upload-secret``.
506
507* ``POST /storage/v1/mutable/:storage_index/read-test-write``:
508  The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``write-enabler``.
509
If these secrets are:

1. Missing.
2. The wrong length.
3. Not the expected kind of secret.
4. Otherwise unparseable before they are actually semantically used.

the server MUST respond with ``400 BAD REQUEST`` and perform no other processing of the message.
401 is not used because this is not an authorization problem; it is a "you sent garbage and should know better" bug.
519
520If authorization using the secret fails,
521then the server MUST send a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
522
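A client could assemble the header fields described in this section as in the following sketch.
The function name is hypothetical.
The swissnum and secrets are taken here as raw bytes;
how the swissnum text in the NURL maps to those bytes is an assumption left to the caller.

```python
import base64


def authorization_headers(swissnum, secrets):
    """Build (name, value) header pairs for one request.

    swissnum: bytes; secrets: mapping of secret kind to bytes.
    """
    def b64(value):
        return base64.b64encode(value).decode("ascii")

    headers = [("Authorization", "Tahoe-LAFS " + b64(swissnum))]
    for kind, value in secrets.items():
        # One X-Tahoe-Authorization header field per secret.
        headers.append(("X-Tahoe-Authorization", kind + " " + b64(value)))
    return headers
```

The result is a list of pairs rather than a dictionary because the same header field name can appear more than once.
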
523Encoding
524~~~~~~~~
525
526* ``storage_index`` MUST be `Base32`_ encoded in URLs.
* ``share_number`` MUST be a decimal representation.
528
529General
530~~~~~~~
531
532``GET /storage/v1/version``
533!!!!!!!!!!!!!!!!!!!!!!!!!!!
534
535This endpoint allows clients to retrieve some basic metadata about a storage server from the storage service.
536The response MUST validate against this CDDL schema::
537
538  {'http://allmydata.org/tahoe/protocols/storage/v1' => {
539      'maximum-immutable-share-size' => uint
540      'maximum-mutable-share-size' => uint
541      'available-space' => uint
542      }
543   'application-version' => bstr
544  }
545
546The server SHOULD populate as many fields as possible with accurate information about its behavior.
547
548For fields which relate to a specific API
549the semantics are documented below in the section for that API.
550For fields that are more general than a single API the semantics are as follows:
551
552* available-space:
553  The server SHOULD use this field to advertise the amount of space that it currently considers unused and is willing to allocate for client requests.
554  The value is a number of bytes.
555
556
557``PUT /storage/v1/lease/:storage_index``
558!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
559
560Either renew or create a new lease on the bucket addressed by ``storage_index``.
561
562The renew secret and cancellation secret should be included as ``X-Tahoe-Authorization`` headers.
563For example::
564
565    X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
566    X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
567
568If the ``lease-renew-secret`` value matches an existing lease
569then the expiration time of that lease will be changed to 31 days after the time of this operation.
570If it does not match an existing lease
571then a new lease will be created with this ``lease-renew-secret`` which expires 31 days after the time of this operation.
572
573``lease-renew-secret`` and ``lease-cancel-secret`` values must be 32 bytes long.
574The server treats them as opaque values.
575:ref:`Share Leases` gives details about how the Tahoe-LAFS storage client constructs these values.
576
577In these cases the response is ``NO CONTENT`` with an empty body.
578
579It is possible that the storage server will have no shares for the given ``storage_index`` because:
580
581* no such shares have ever been uploaded.
582* a previous lease expired and the storage server reclaimed the storage by deleting the shares.
583
584In these cases the server takes no action and returns ``NOT FOUND``.
585
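The renew-or-create behavior can be summarized by the following server-side sketch.
It is an illustration under stated assumptions, not the storage server's code:
the ``Lease`` class, the ``put_lease`` function, and the in-memory mapping are hypothetical,
and the mapping is assumed to contain an entry for a storage index only while the server holds shares for it.

```python
import hmac
from dataclasses import dataclass

LEASE_PERIOD_SECONDS = 31 * 24 * 60 * 60  # the hard-coded 31 day lease period
SECRET_LENGTH = 32


@dataclass
class Lease:
    renew_secret: bytes
    cancel_secret: bytes
    expiration: float


def put_lease(leases_by_index, storage_index, renew_secret, cancel_secret, now):
    """Return the HTTP status code for a lease request."""
    if len(renew_secret) != SECRET_LENGTH or len(cancel_secret) != SECRET_LENGTH:
        return 400  # BAD REQUEST: a secret has the wrong length
    leases = leases_by_index.get(storage_index)
    if leases is None:
        return 404  # NOT FOUND: no shares for this storage index
    for lease in leases:
        if hmac.compare_digest(lease.renew_secret, renew_secret):
            lease.expiration = now + LEASE_PERIOD_SECONDS  # renew the existing lease
            return 204
    # No matching lease: create a new one.
    leases.append(Lease(renew_secret, cancel_secret, now + LEASE_PERIOD_SECONDS))
    return 204  # NO CONTENT
```
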
586
587Discussion
588``````````
589
590We considered an alternative where ``lease-renew-secret`` and ``lease-cancel-secret`` are placed in query arguments on the request path.
591This increases chances of leaking secrets in logs.
592Putting the secrets in the body reduces the chances of leaking secrets,
593but eventually we chose headers as the least likely information to be logged.
594
595Several behaviors here are blindly copied from the Foolscap-based storage server protocol.
596
597* There is a cancel secret but there is no API to use it to cancel a lease (see ticket:3768).
598* The lease period is hard-coded at 31 days.
599
600These are not necessarily ideal behaviors
601but they are adopted to avoid any *semantic* changes between the Foolscap- and HTTP-based protocols.
602It is expected that some or all of these behaviors may change in a future revision of the HTTP-based protocol.
603
604Immutable
605---------
606
607Writing
608~~~~~~~
609
610``POST /storage/v1/immutable/:storage_index``
611!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
612
613Initialize an immutable storage index with some buckets.
614The server MUST allow share data to be written to the buckets at most one time.
615The server MAY create a lease for the buckets.
616Details of the buckets to create are encoded in the request body.
617The request body MUST validate against this CDDL schema::
618
619  {
620    share-numbers: #6.258([0*256 uint])
621    allocated-size: uint
622  }
623
624For example::
625
626  {"share-numbers": [1, 7, ...], "allocated-size": 12345}
627
628The server SHOULD accept a value for **allocated-size** that is less than or equal to the lesser of the values of the server's version message's **maximum-immutable-share-size** or **available-space** values.
629
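The size rule above amounts to comparing the request against the smaller of two advertised values.
A minimal sketch, assuming the version message has already been decoded into nested dictionaries;
the function name is hypothetical.

```python
V1_KEY = "http://allmydata.org/tahoe/protocols/storage/v1"


def should_accept_allocation(version_message, allocated_size):
    """True if allocated_size is within the limits the server advertises."""
    parameters = version_message[V1_KEY]
    limit = min(
        parameters["maximum-immutable-share-size"],
        parameters["available-space"],
    )
    return allocated_size <= limit
```
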
630The request MUST include ``X-Tahoe-Authorization`` HTTP headers that set the various secrets—upload, lease renewal, lease cancellation—that will be later used to authorize various operations.
631For example::
632
633   X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
634   X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
635   X-Tahoe-Authorization: upload-secret <base64-upload-secret>
636
637The response body MUST include encoded information about the created buckets.
638The response body MUST validate against this CDDL schema::
639
640  {
641    already-have: #6.258([0*256 uint])
642    allocated: #6.258([0*256 uint])
643  }
644
645For example::
646
647  {"already-have": [1, ...], "allocated": [7, ...]}
648
The upload secret is an opaque *byte* string.
650
651Handling repeat calls:
652
653* If the same API call is repeated with the same upload secret, the response is the same and no change is made to server state.
654  This is necessary to ensure retries work in the face of lost responses from the server.
* If the API call is made with a different upload secret, this implies a new client, perhaps because the old client died.
656  Or it may happen because the client wants to upload a different share number than a previous client.
657  New shares will be created, existing shares will be unchanged, regardless of whether the upload secret matches or not.
658
659Discussion
660``````````
661
662We considered making this ``POST /storage/v1/immutable`` instead.
663The motivation was to keep *storage index* out of the request URL.
664Request URLs have an elevated chance of being logged by something.
665We were concerned that having the *storage index* logged may increase some risks.
666However, we decided this does not matter because:
667
668* the *storage index* can only be used to retrieve (not decrypt) the ciphertext-bearing share.
* the *storage index* is already persistently present on the storage node in the form of directory names in the storage server's ``shares`` directory.
670* the request is made via HTTPS and so only Tahoe-LAFS can see the contents,
671  therefore no proxy servers can perform any extra logging.
672* Tahoe-LAFS itself does not currently log HTTP request URLs.
673
674The response includes ``already-have`` and ``allocated`` for two reasons:
675
676* If an upload is interrupted and the client loses its local state that lets it know it already uploaded some shares
677  then this allows it to discover this fact (by inspecting ``already-have``) and only upload the missing shares (indicated by ``allocated``).
678
* If an upload has completed, a client may still choose to re-balance storage by moving shares between servers.
680  This might be because a server has become unavailable and a remaining server needs to store more shares for the upload.
681  It could also just be that the client's preferred servers have changed.
682
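As a rough sketch of the first point, a client that has lost its local state could decide what to upload from the response alone.
The example assumes the response body has already been decoded into a Python dict;
the helper name ``plan_upload`` is hypothetical and not part of Tahoe-LAFS.

```python
def plan_upload(wanted, response):
    """Split the wanted share numbers into (skip, upload) using the server's reply.

    ``response`` is assumed to be the decoded response body, for example
    {"already-have": [1], "allocated": [7]}.
    """
    already_have = set(response["already-have"])
    allocated = set(response["allocated"])
    skip = sorted(set(wanted) & already_have)    # the server already holds these
    upload = sorted(set(wanted) & allocated)     # these still need their data sent
    return skip, upload


skip, upload = plan_upload([1, 7], {"already-have": [1], "allocated": [7]})
print("skip:", skip, "upload:", upload)
```

Only the share numbers reported under ``allocated`` still need their data uploaded.
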
683Regarding upload secrets,
684the goal is for uploading and aborting (see next sections) to be authenticated by more than just the storage index.
685In the future, we may want to generate them in a way that allows resuming/canceling when the client has issues.
686In the short term, they can just be a random byte string.
The primary security constraint is that each upload to each server has its own unique upload secret,
tied to uploading that particular storage index to this particular server.
689
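A minimal sketch of the short-term approach, generating a random byte string per upload and formatting it for the ``X-Tahoe-Authorization`` header.
The 32-byte length, the use of standard (padded) Base64, and both function names are assumptions made for illustration.

```python
import base64
import secrets


def new_upload_secret(num_bytes=32):
    """Return a fresh random upload secret (the 32-byte length is an assumption)."""
    return secrets.token_bytes(num_bytes)


def upload_secret_header(upload_secret):
    """Format the header line that carries the upload secret."""
    encoded = base64.b64encode(upload_secret).decode("ascii")
    return f"X-Tahoe-Authorization: upload-secret {encoded}"


# One secret per (storage index, server) pair, generated independently each time.
print(upload_secret_header(new_upload_secret()))
```
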
690Rejected designs for upload secrets:
691
692* Upload secret per share number.
693  In order to make the secret unguessable by attackers, which includes other servers,
694  it must contain randomness.
695  Randomness means there is no need to have a secret per share, since adding share-specific content to randomness doesn't actually make the secret any better.
696
697``PATCH /storage/v1/immutable/:storage_index/:share_number``
698!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
699
700Write data for the indicated share.
701The share number MUST belong to the storage index.
702The request body MUST be the raw share data (i.e., ``application/octet-stream``).
703The request MUST include a *Content-Range* header field;
704for large transfers this allows partially complete uploads to be resumed.
705
706For example,
a 1MiB share can be divided into eight separate 128KiB chunks.
708Each chunk can be uploaded in a separate request.
709Each request can include a *Content-Range* value indicating its placement within the complete share.
710If any one of these requests fails then at most 128KiB of upload work needs to be retried.
711
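The chunking arithmetic can be sketched as follows.
The function name ``content_ranges`` is hypothetical;
the header values follow the usual ``bytes first-last/total`` form, where ``last`` is inclusive.

```python
def content_ranges(share_size, chunk_size):
    """Yield (offset, Content-Range value) for each chunk of a share."""
    for begin in range(0, share_size, chunk_size):
        # Content-Range end positions are inclusive, hence the "- 1".
        last = min(begin + chunk_size, share_size) - 1
        yield begin, f"bytes {begin}-{last}/{share_size}"


MiB, KiB = 1024 * 1024, 1024
for offset, header_value in content_ranges(1 * MiB, 128 * KiB):
    print(offset, header_value)
```
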
712The server MUST recognize when all of the data has been received and mark the share as complete
713(which it can do because it was informed of the size when the storage index was initialized).
714
The request MUST include an ``X-Tahoe-Authorization`` header that includes the upload secret::
716
717    X-Tahoe-Authorization: upload-secret <base64-upload-secret>
718
719Responses:
720
721* When a chunk that does not complete the share is successfully uploaded the response MUST be ``OK``.
722  The response body MUST indicate the range of share data that has yet to be uploaded.
723  The response body MUST validate against this CDDL schema::
724
725    {
726      required: [0* {begin: uint, end: uint}]
727    }
728
729  For example::
730
731    { "required":
732      [ { "begin": <byte position, inclusive>
733        , "end":   <byte position, exclusive>
734        }
735      ,
736      ...
737      ]
738    }
739
740* When the chunk that completes the share is successfully uploaded the response MUST be ``CREATED``.
* If the *Content-Range* for a request covers part of the share that has already been uploaded,
  and the data does not match already written data,
743  the response MUST be ``CONFLICT``.
744  In this case the client MUST abort the upload.
745  The client MAY then restart the upload from scratch.
746
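The three response cases above can be modelled with a small in-memory tracker.
This is only a sketch under stated assumptions:
the class name ``ShareUpload`` is hypothetical,
statuses are returned as plain strings,
and a real server would persist data rather than hold it in memory.

```python
class ShareUpload:
    """Toy in-memory model of one in-progress immutable share upload."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.written = bytearray(size)  # 1 marks a byte that has been received

    def required(self):
        """List the still-missing ranges as {begin (inclusive), end (exclusive)}."""
        ranges, start = [], None
        for position, flag in enumerate(self.written):
            if not flag and start is None:
                start = position
            elif flag and start is not None:
                ranges.append({"begin": start, "end": position})
                start = None
        if start is not None:
            ranges.append({"begin": start, "end": len(self.written)})
        return ranges

    def write(self, begin, chunk):
        """Apply one chunk and return (status, remaining required ranges)."""
        end = begin + len(chunk)
        if end > len(self.data):
            # Rejecting oversized writes this way is an assumption of the sketch.
            raise ValueError("chunk extends past the allocated size")
        for position in range(begin, end):
            if self.written[position] and self.data[position] != chunk[position - begin]:
                return "CONFLICT", self.required()
        self.data[begin:end] = chunk
        self.written[begin:end] = b"\x01" * len(chunk)
        remaining = self.required()
        return ("OK" if remaining else "CREATED"), remaining
```

Re-sending a chunk whose bytes match what was already written is accepted here, which is what lets a client safely retry after a lost response.
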
747Discussion
748``````````
749
750``PUT`` verbs are only supposed to be used to replace the whole resource,
751thus the use of ``PATCH``.
752From RFC 7231::
753
754   An origin server that allows PUT on a given target resource MUST send
755   a 400 (Bad Request) response to a PUT request that contains a
756   Content-Range header field (Section 4.2 of [RFC7233]), since the
757   payload is likely to be partial content that has been mistakenly PUT
758   as a full representation.  Partial content updates are possible by
759   targeting a separately identified resource with state that overlaps a
760   portion of the larger resource, or by using a different method that
761   has been specifically defined for partial updates (for example, the
762   PATCH method defined in [RFC5789]).
763
764
765
766``PUT /storage/v1/immutable/:storage_index/:share_number/abort``
767!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
768
769This cancels an *in-progress* upload.
770
The request MUST include an ``X-Tahoe-Authorization`` header that includes the upload secret::
772
773    X-Tahoe-Authorization: upload-secret <base64-upload-secret>
774
775If there is an incomplete upload with a matching upload-secret then the server MUST consider the abort to have succeeded.
776In this case the response MUST be ``OK``.
777The server MUST respond to all future requests as if the operations related to this upload did not take place.
778
779If there is no incomplete upload with a matching upload-secret then the server MUST respond with ``Method Not Allowed`` (405).
780The server MUST make no client-visible changes to its state in this case.
781
782``POST /storage/v1/immutable/:storage_index/:share_number/corrupt``
783!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
784
Advise the server that the data read from the indicated share was corrupt.
The request body includes a human-meaningful text string with details about the corruption.
It also includes potentially important details about the share.
788The request body MUST validate against this CDDL schema::
789
790  {
791    reason: tstr .size (1..32765)
792  }
793
794For example::
795
796  {"reason": "expected hash abcd, got hash efgh"}
797
798The report pertains to the immutable share with a **storage index** and **share number** given in the request path.
If the identified **storage index** and **share number** are known to the server then the report SHOULD be accepted and made available to server administrators.
In this case the response SHOULD be ``OK``.
If the report is not accepted then the response SHOULD be ``Not Found`` (404).
802
803Discussion
804``````````
805
The seemingly odd length limit on ``reason`` is chosen so that the *encoded* representation of the message is limited to 32768 bytes.
807
808Reading
809~~~~~~~
810
811``GET /storage/v1/immutable/:storage_index/shares``
812!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
813
814Retrieve a list (semantically, a set) indicating all shares available for the indicated storage index.
815The response body MUST validate against this CDDL schema::
816
817  #6.258([0*256 uint])
818
819For example::
820
821  [1, 5]
822
823If the **storage index** in the request path is not known to the server then the response MUST include an empty list.
824
825``GET /storage/v1/immutable/:storage_index/:share_number``
826!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
827
828Read a contiguous sequence of bytes from one share in one bucket.
829The response body MUST be the raw share data (i.e., ``application/octet-stream``).
830The ``Range`` header MAY be used to request exactly one ``bytes`` range,
831in which case the response code MUST be ``Partial Content`` (206).
832Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
833Multiple ranges in a single request are *not* supported;
834open-ended ranges are also not supported.
835Clients MUST NOT send requests using these features.
836
If the requested range extends beyond the end of the data,
838the response MUST be shorter than the requested range.
839It MUST contain all data up to the end of the share and then end.
840The resulting ``Content-Range`` header MUST be consistent with the returned data.
841
842If the response to a query is an empty range,
843the server MUST send a ``No Content`` (204) response.
844
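The range rules above can be sketched for a share held in memory.
The function name ``read_share_range`` and the tuple it returns are assumptions for illustration;
a real server would also parse and validate the ``Range`` header itself.

```python
def read_share_range(share, first, last):
    """Serve ``Range: bytes=first-last`` (both ends inclusive) from ``share``.

    Returns (status code, Content-Range value or None, body).
    """
    body = share[first:last + 1]  # slicing stops at the end of the share
    if not body:
        # Nothing to return for this range.
        return 204, None, b""
    # Report the range actually returned, which may be shorter than requested.
    content_range = f"bytes {first}-{first + len(body) - 1}/{len(share)}"
    return 206, content_range, body


share = bytes(range(48))
print(read_share_range(share, 40, 100)[:2])
```
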
845Discussion
846``````````
847
848Multiple ``bytes`` ranges are not supported.
849HTTP requires that the ``Content-Type`` of the response in that case be ``multipart/...``.
850The ``multipart`` major type brings along string sentinel delimiting as a means to frame the different response parts.
851There are many drawbacks to this framing technique:
852
8531. It is resource-intensive to generate.
8542. It is resource-intensive to parse.
8553. It is complex to parse safely [#]_ [#]_ [#]_ [#]_.
856
857A previous revision of this specification allowed requesting one or more contiguous sequences from one or more shares.
858This *superficially* mirrored the Foolscap based interface somewhat closely.
859The interface was simplified to this version because this version is all that is required to let clients retrieve any desired information.
860It only requires that the client issue multiple requests.
861This can be done with pipelining or parallel requests to avoid an additional latency penalty.
862In the future,
863if there are performance goals,
864benchmarks can demonstrate whether they are achieved by a more complicated interface or some other change.
865
866Mutable
867-------
868
869Writing
870~~~~~~~
871
872``POST /storage/v1/mutable/:storage_index/read-test-write``
873!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
874
875General purpose read-test-and-write operation for mutable storage indexes.
876A mutable storage index is also called a "slot"
877(particularly by the existing Tahoe-LAFS codebase).
878The first write operation on a mutable storage index creates it
879(that is,
880there is no separate "create this storage index" operation as there is for the immutable storage index type).
881
882The request MUST include ``X-Tahoe-Authorization`` headers with write enabler and lease secrets::
883
884    X-Tahoe-Authorization: write-enabler <base64-write-enabler-secret>
885    X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
886    X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
887
888The request body MUST include test, read, and write vectors for the operation.
889The request body MUST validate against this CDDL schema::
890
891  {
892    "test-write-vectors": {
893      0*256 share_number : {
894        "test": [0*30 {"offset": uint, "size": uint, "specimen": bstr}]
895        "write": [* {"offset": uint, "data": bstr}]
896        "new-length": uint / null
897      }
898    }
899    "read-vector": [0*30 {"offset": uint, "size": uint}]
900  }
901  share_number = uint
902
903For example::
904
905   {
906       "test-write-vectors": {
907           0: {
908               "test": [{
909                   "offset": 3,
910                   "size": 5,
911                   "specimen": "hello"
912               }, ...],
913               "write": [{
914                   "offset": 9,
915                   "data": "world"
916               }, ...],
917               "new-length": 5
918           }
919       },
920       "read-vector": [{"offset": 3, "size": 12}, ...]
921   }
922
923The response body contains a boolean indicating whether the tests all succeed
924(and writes were applied) and a mapping giving read data (pre-write).
925The response body MUST validate against this CDDL schema::
926
927  {
928    "success": bool,
929    "data": {0*256 share_number: [0* bstr]}
930  }
931  share_number = uint
932
933For example::
934
935  {
936      "success": true,
937      "data": {
938          0: ["foo"],
939          5: ["bar"],
940          ...
941      }
942  }
943
A client MAY send a test vector or read vector that refers to bytes beyond the end of existing data.
In this case a server MUST behave as if the test or read vector referred to exactly as much data as exists.
946
947For example,
948consider the case where the server has 5 bytes of data for a particular share.
949If a client sends a read vector with an ``offset`` of 1 and a ``size`` of 4 then the server MUST respond with all of the data except the first byte.
950If a client sends a read vector with the same ``offset`` and a ``size`` of 5 (or any larger value) then the server MUST respond in the same way.
951
952Similarly,
953if there is no data at all,
954an empty byte string is returned no matter what the offset or length.
955
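The read, test, and write steps can be sketched against an in-memory slot.
Everything here is illustrative rather than the Tahoe-LAFS implementation:
the function name is hypothetical,
reads are returned for every share already in the slot,
gaps left by a write are zero-filled,
and ``new-length`` is treated purely as truncation.

```python
def read_test_write(shares, test_write_vectors, read_vector):
    """Apply one read-test-write request to ``shares`` (share number -> bytes)."""

    def span(content, offset, size):
        # Slicing never reads past the end, so short data yields short results.
        return bytes(content[offset:offset + size])

    # Reads happen first, so they observe the pre-write contents.
    data = {
        number: [span(content, r["offset"], r["size"]) for r in read_vector]
        for number, content in shares.items()
    }

    # Every test vector on every named share has to match its specimen.
    success = all(
        span(shares.get(number, b""), t["offset"], t["size"]) == t["specimen"]
        for number, vectors in test_write_vectors.items()
        for t in vectors["test"]
    )

    if success:
        for number, vectors in test_write_vectors.items():
            content = bytearray(shares.get(number, b""))
            for w in vectors["write"]:
                end = w["offset"] + len(w["data"])
                if len(content) < end:
                    # Zero-fill any gap (an assumption of this sketch).
                    content.extend(bytes(end - len(content)))
                content[w["offset"]:end] = w["data"]
            new_length = vectors.get("new-length")
            if new_length is not None:
                # Treated purely as truncation here (also an assumption).
                del content[new_length:]
            shares[number] = bytes(content)

    return {"success": success, "data": data}
```

With an empty slot, a test vector of size 1 and an empty specimen passes because nothing can be read; once the share exists, the same test reads one byte and fails.
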
956Reading
957~~~~~~~
958
959``GET /storage/v1/mutable/:storage_index/shares``
960!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
961
962Retrieve a set indicating all shares available for the indicated storage index.
963The response body MUST validate against this CDDL schema::
964
965  #6.258([0*256 uint])
966
967For example::
968
969  [1, 5]
970
971``GET /storage/v1/mutable/:storage_index/:share_number``
972!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
973
Read data from the indicated mutable share, just like ``GET /storage/v1/immutable/:storage_index/:share_number``.
975
976The response body MUST be the raw share data (i.e., ``application/octet-stream``).
977The ``Range`` header MAY be used to request exactly one ``bytes`` range,
978in which case the response code MUST be ``Partial Content`` (206).
Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
980Multiple ranges in a single request are *not* supported;
981open-ended ranges are also not supported.
982Clients MUST NOT send requests using these features.
983
If the requested range extends beyond the end of the data,
985the response MUST be shorter than the requested range.
986It MUST contain all data up to the end of the share and then end.
987The resulting ``Content-Range`` header MUST be consistent with the returned data.
988
989If the response to a query is an empty range,
990the server MUST send a ``No Content`` (204) response.
991
992
993``POST /storage/v1/mutable/:storage_index/:share_number/corrupt``
994!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
995
Advise the server that the data read from the indicated share was corrupt.
Just like the immutable version.
998
999Sample Interactions
1000-------------------
1001
1002This section contains examples of client/server interactions to help illuminate the above specification.
1003This section is non-normative.
1004
1005Immutable Data
1006~~~~~~~~~~~~~~
1007
10081. Create a bucket for storage index ``AAAAAAAAAAAAAAAA`` to hold two immutable shares, discovering that share ``1`` was already uploaded::
1009
1010     POST /storage/v1/immutable/AAAAAAAAAAAAAAAA
1011     Authorization: Tahoe-LAFS nurl-swissnum
1012     X-Tahoe-Authorization: lease-renew-secret efgh
1013     X-Tahoe-Authorization: lease-cancel-secret jjkl
1014     X-Tahoe-Authorization: upload-secret xyzf
1015
1016     {"share-numbers": [1, 7], "allocated-size": 48}
1017
1018     200 OK
1019     {"already-have": [1], "allocated": [7]}
1020
1021#. Upload the content for immutable share ``7``::
1022
1023     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1024     Authorization: Tahoe-LAFS nurl-swissnum
1025     Content-Range: bytes 0-15/48
1026     X-Tahoe-Authorization: upload-secret xyzf
1027     <first 16 bytes of share data>
1028
1029     200 OK
1030     { "required": [ {"begin": 16, "end": 48 } ] }
1031
1032     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1033     Authorization: Tahoe-LAFS nurl-swissnum
1034     Content-Range: bytes 16-31/48
1035     X-Tahoe-Authorization: upload-secret xyzf
1036     <second 16 bytes of share data>
1037
1038     200 OK
1039     { "required": [ {"begin": 32, "end": 48 } ] }
1040
1041     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1042     Authorization: Tahoe-LAFS nurl-swissnum
1043     Content-Range: bytes 32-47/48
1044     X-Tahoe-Authorization: upload-secret xyzf
1045     <final 16 bytes of share data>
1046
1047     201 CREATED
1048
1049#. Download the content of the previously uploaded immutable share ``7``::
1050
     GET /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
     Authorization: Tahoe-LAFS nurl-swissnum
     Range: bytes=0-47

     206 PARTIAL CONTENT
     Content-Range: bytes 0-47/48
     <complete 48 bytes of previously uploaded data>
1058
1059#. Renew the lease on all immutable shares in bucket ``AAAAAAAAAAAAAAAA``::
1060
1061     PUT /storage/v1/lease/AAAAAAAAAAAAAAAA
1062     Authorization: Tahoe-LAFS nurl-swissnum
1063     X-Tahoe-Authorization: lease-cancel-secret jjkl
1064     X-Tahoe-Authorization: lease-renew-secret efgh
1065
1066     204 NO CONTENT
1067
1068Mutable Data
1069~~~~~~~~~~~~
1070
1. Create mutable share number ``3`` with ``10`` bytes of data in slot ``BBBBBBBBBBBBBBBB``.
   The special test vector of size 1 but empty bytes will only pass
   if there is no existing share;
   otherwise it will read a byte which won't match ``b""``::
1075
1076     POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
1077     Authorization: Tahoe-LAFS nurl-swissnum
1078     X-Tahoe-Authorization: write-enabler abcd
1079     X-Tahoe-Authorization: lease-cancel-secret efgh
1080     X-Tahoe-Authorization: lease-renew-secret ijkl
1081
1082     {
1083         "test-write-vectors": {
1084             3: {
1085                 "test": [{
1086                     "offset": 0,
1087                     "size": 1,
1088                     "specimen": ""
1089                 }],
1090                 "write": [{
1091                     "offset": 0,
1092                     "data": "xxxxxxxxxx"
1093                 }],
1094                 "new-length": 10
1095             }
1096         },
1097         "read-vector": []
1098     }
1099
1100     200 OK
1101     {
1102         "success": true,
         "data": {}
1104     }
1105
1106#. Safely rewrite the contents of a known version of mutable share number ``3`` (or fail)::
1107
1108     POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
1109     Authorization: Tahoe-LAFS nurl-swissnum
1110     X-Tahoe-Authorization: write-enabler abcd
1111     X-Tahoe-Authorization: lease-cancel-secret efgh
1112     X-Tahoe-Authorization: lease-renew-secret ijkl
1113
1114     {
1115         "test-write-vectors": {
1116             3: {
1117                 "test": [{
1118                     "offset": 0,
1119                     "size": <length of checkstring>,
1120                     "specimen": "<checkstring>"
1121                 }],
1122                 "write": [{
1123                     "offset": 0,
1124                     "data": "yyyyyyyyyy"
1125                 }],
1126                 "new-length": 10
1127             }
1128         },
1129         "read-vector": []
1130     }
1131
1132     200 OK
1133     {
1134         "success": true,
         "data": {}
1136     }
1137
1138#. Download the contents of share number ``3``::
1139
     GET /storage/v1/mutable/BBBBBBBBBBBBBBBB/3
     Authorization: Tahoe-LAFS nurl-swissnum
     Range: bytes=0-16

     206 PARTIAL CONTENT
     Content-Range: bytes 0-9/10
     <complete 10 bytes of previously uploaded data>
1147
1148#. Renew the lease on previously uploaded mutable share in slot ``BBBBBBBBBBBBBBBB``::
1149
1150     PUT /storage/v1/lease/BBBBBBBBBBBBBBBB
1151     Authorization: Tahoe-LAFS nurl-swissnum
1152     X-Tahoe-Authorization: lease-cancel-secret efgh
1153     X-Tahoe-Authorization: lease-renew-secret ijkl
1154
1155     204 NO CONTENT
1156
1157.. _Base64: https://www.rfc-editor.org/rfc/rfc4648#section-4
1158
1159.. _RFC 4648: https://tools.ietf.org/html/rfc4648
1160
1161.. _RFC 7469: https://tools.ietf.org/html/rfc7469#section-2.4
1162
1163.. _RFC 7049: https://tools.ietf.org/html/rfc7049#section-4
1164
1165.. _RFC 9110: https://tools.ietf.org/html/rfc9110
1166
1167.. _CBOR: http://cbor.io/
1168
1169.. [#]
1170   The security value of checking ``notValidBefore`` and ``notValidAfter`` is not entirely clear.
1171   The arguments which apply to web-facing certificates do not seem to apply
1172   (due to the decision for Tahoe-LAFS to operate independently of the web-oriented CA system).
1173
1174   Arguably, complexity is reduced by allowing an existing TLS implementation which wants to make these checks make them
1175   (compared to including additional code to either bypass them or disregard their results).
1176   Reducing complexity, at least in general, is often good for security.
1177
1178   On the other hand, checking the validity time period forces certificate regeneration
1179   (which comes with its own set of complexity).
1180
1181   A possible compromise is to recommend certificates with validity periods of many years or decades.
1182   "Recommend" may be read as "provide software supporting the generation of".
1183
1184   What about key theft?
1185   If certificates are valid for years then a successful attacker can pretend to be a valid storage node for years.
1186   However, short-validity-period certificates are no help in this case.
1187   The attacker can generate new, valid certificates using the stolen keys.
1188
1189   Therefore, the only recourse to key theft
1190   (really *identity theft*)
1191   is to burn the identity and generate a new one.
1192   Burning the identity is a non-trivial task.
1193   It is worth solving but it is not solved here.
1194
1195.. [#]
1196   More simply::
1197
    from hashlib import sha256
    from cryptography.hazmat.primitives.serialization import (
      Encoding,
      PublicFormat,
    )
    from pybase64 import urlsafe_b64encode

    def check_tub_id(cert, tub_id):
        # ``cert`` is the peer's certificate; ``tub_id`` is the expected value as bytes.
        spki_bytes = cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)
        spki_sha256 = sha256(spki_bytes).digest()
        # Drop the "=" padding so the result is unpadded base64url.
        spki_encoded = urlsafe_b64encode(spki_sha256).rstrip(b"=")
        assert spki_encoded == tub_id
1210
1211   Note we use `unpadded base64url`_ rather than the Foolscap- and Tahoe-LAFS-preferred Base32.
1212
1213.. [#]
1214   https://www.cvedetails.com/cve/CVE-2017-5638/
1215.. [#]
1216   https://pivotal.io/security/cve-2018-1272
1217.. [#]
1218   https://nvd.nist.gov/vuln/detail/CVE-2017-5124
1219.. [#]
1220   https://efail.de/
1221
1222.. _unpadded base64url: https://tools.ietf.org/html/rfc7515#appendix-C
1223
1224.. _attacking SHA1: https://en.wikipedia.org/wiki/SHA-1#Attacks