source: trunk/docs/proposed/http-storage-node-protocol.rst

Last change on this file was ffe2e977, checked in by Itamar Turner-Trauring <itamar@…>, at 2023-08-01T14:54:46Z

Better phrasing

.. -*- coding: utf-8 -*-

Storage Node Protocol ("Great Black Swamp", "GBS")
==================================================

The target audience for this document is developers working on Tahoe-LAFS or on an alternate implementation intended to be interoperable.
After reading this document,
one should expect to understand how Tahoe-LAFS clients interact over the network with Tahoe-LAFS storage nodes.

The primary goal of the introduction of this protocol is to simplify the task of implementing a Tahoe-LAFS storage server.
Specifically, it should be possible to implement a Tahoe-LAFS storage server without a Foolscap implementation
(substituting a simpler GBS server implementation).
The Tahoe-LAFS client will also need to change but it is not expected that it will be noticeably simplified by this change
(though this may be the first step towards simplifying it).

Glossary
--------

.. glossary::

   `Foolscap <https://github.com/warner/foolscap/>`_
     an RPC/RMI (Remote Procedure Call / Remote Method Invocation) protocol for use with Twisted

   storage server
     a Tahoe-LAFS process configured to offer storage and reachable over the network for store and retrieve operations

   storage service
     a Python object held in memory in the storage server which provides the implementation of the storage protocol

   introducer
     a Tahoe-LAFS process at a known location configured to re-publish announcements about the location of storage servers

   :ref:`fURLs <fURLs>`
     a self-authenticating URL-like string which can be used to locate a remote object using the Foolscap protocol
     (the storage service is an example of such an object)

   :ref:`NURLs <NURLs>`
     a self-authenticating URL-like string almost exactly like a fURL but without being tied to Foolscap

   swissnum
     a short random string which is part of a fURL/NURL and which acts as a shared secret to authorize clients to use a storage service

   lease
     state associated with a share informing a storage server of the duration of storage desired by a client

   share
     a single unit of client-provided arbitrary data to be stored by a storage server
     (in practice, one of the outputs of applying ZFEC encoding to some ciphertext with some additional metadata attached)

   bucket
     a group of one or more immutable shares held by a storage server and having a common storage index

   slot
     a group of one or more mutable shares held by a storage server and having a common storage index
     (sometimes "slot" is considered a synonym for "storage index of a slot")

   storage index
     a 16 byte string which can address a slot or a bucket
     (in practice, derived by hashing the encryption key associated with contents of that slot or bucket)

   write enabler
     a short secret string which storage servers require to be presented before allowing mutation of any mutable share

   lease renew secret
     a short secret string which storage servers require to be presented before allowing a particular lease to be renewed

The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119.

Motivation
----------

Foolscap
~~~~~~~~

Foolscap is a remote method invocation protocol with several distinctive features.
At its core it allows separate processes to refer to each other's objects and methods using a capability-based model.
This allows for extremely fine-grained access control in a system that remains highly securable without becoming overwhelmingly complicated.
Supporting this is a flexible and extensible serialization system which allows data to be exchanged between processes in carefully controlled ways.

Tahoe-LAFS avails itself of only a small portion of these features.
A Tahoe-LAFS storage server typically only exposes one object with a fixed set of methods to clients.
A Tahoe-LAFS introducer node does roughly the same.
Tahoe-LAFS exchanges simple data structures that have many common, standard serialized representations.

In exchange for this slight use of Foolscap's sophisticated mechanisms,
Tahoe-LAFS pays a substantial price:

* Foolscap is implemented only for Python.
  Tahoe-LAFS is thus limited to being implemented only in Python.
* There is only one Python implementation of Foolscap.
  The implementation is therefore the de facto standard and understanding of the protocol often relies on understanding that implementation.
* The Foolscap developer community is very small.
  The implementation therefore advances very little and some non-trivial part of the maintenance cost falls on the Tahoe-LAFS project.
* The extensible serialization system imposes substantial complexity compared to the simple data structures Tahoe-LAFS actually exchanges.

HTTP
~~~~

HTTP is a request/response protocol that has become the lingua franca of the internet.
Combined with the principles of Representational State Transfer (REST) it is widely employed to create, update, and delete data in collections on the internet.
HTTP itself provides only modest functionality in comparison to Foolscap.
However its simplicity and widespread use have led to a diverse and almost overwhelming ecosystem of libraries, frameworks, toolkits, and so on.

By adopting HTTP in place of Foolscap Tahoe-LAFS can realize the following concrete benefits:

* Practically every language or runtime has an HTTP protocol implementation (or a dozen of them) available.
  This change paves the way for new Tahoe-LAFS implementations using tools better suited for certain situations
  (mobile client implementations, high-performance server implementations, easily distributed desktop clients, etc).
* The simplicity of and vast quantity of resources about HTTP make it a very easy protocol to learn and use.
  This change reduces the barrier to entry for developers to contribute improvements to Tahoe-LAFS's network interactions.
* For any given language there is very likely an HTTP implementation with a large and active developer community.
  Tahoe-LAFS can therefore benefit from the large effort being put into making better libraries for using HTTP.
* One of the core features of HTTP is the mundane transfer of bulk data and implementations are often capable of doing this with extreme efficiency.
  The alignment of this core feature with a core activity of Tahoe-LAFS of transferring bulk data means that a substantial barrier to improved Tahoe-LAFS runtime performance will be eliminated.

TLS
~~~

The Foolscap-based protocol provides *some* of Tahoe-LAFS's confidentiality, integrity, and authentication properties by leveraging TLS.
An HTTP-based protocol can make use of TLS in largely the same way to provide the same properties.
Provision of these properties *is* dependent on implementers following Great Black Swamp's rules for x509 certificate validation
(rather than the standard "web" rules for validation).

Design Requirements
-------------------

Security
~~~~~~~~

Summary
!!!!!!!

The storage node protocol should offer at minimum the security properties offered by the Foolscap-based protocol.
The Foolscap-based protocol offers:

* **Peer authentication** by way of checked x509 certificates
* **Message authentication** by way of TLS
* **Message confidentiality** by way of TLS

  * A careful configuration of the TLS connection parameters *may* also offer **forward secrecy**.
    However, Tahoe-LAFS' use of Foolscap takes no steps to ensure this is the case.

* **Storage authorization** by way of a capability contained in the fURL addressing a storage service.

Discussion
!!!!!!!!!!

A client node relies on a storage node to persist certain data until a future retrieval request is made.
In this way, the client node is vulnerable to attacks which cause the data not to be persisted.
Though this vulnerability can be (and typically is) mitigated by including redundancy in the share encoding parameters for stored data,
it is still sensible to attempt to minimize unnecessary vulnerability to this attack.

One way to do this is for the client to be confident the storage node with which it is communicating is really the expected node.
That is, for the client to perform **peer authentication** of the storage node it connects to.
This allows it to develop a notion of that node's reputation over time.
The more retrieval requests the node satisfies correctly, the more likely it is to satisfy future requests correctly.
Therefore, the protocol must include some means for verifying the identity of the storage node.
The initialization of the client with the correct identity information is out of scope for this protocol
(the system may be trust-on-first-use, there may be a third-party identity broker, etc).

With confidence that communication is proceeding with the intended storage node,
it must also be possible to trust that data is exchanged without modification.
That is, the protocol must include some means to perform **message authentication**.
This is most likely done using cryptographic MACs (such as those used in TLS).

The messages which enable the mutable shares feature include secrets related to those shares.
For example, the write enabler secret is used to restrict the parties with write access to mutable shares.
It is exchanged over the network as part of a write operation.
An attacker learning this secret can overwrite share data with garbage
(lacking a separate encryption key,
there is no way to write data which appears legitimate to a legitimate client).
Therefore, **message confidentiality** is necessary when exchanging these secrets.
**Forward secrecy** is preferred so that an attacker recording an exchange today cannot launch this attack at some future point after compromising the necessary keys.

A storage service offers service only to some clients.
A client proves their authorization to use the storage service by presenting a shared secret taken from the fURL.
In this way **storage authorization** is performed to prevent disallowed parties from consuming any storage resources.

Functionality
-------------

Tahoe-LAFS application-level information must be transferred using this protocol.
This information is exchanged with a dozen or so request/response-oriented messages.
Some of these messages carry large binary payloads.
Others are small structured-data messages.
Some facility for expansion to support new information exchanges should also be present.

Solutions
---------

An HTTP-based protocol, dubbed "Great Black Swamp" (or "GBS"), is described below.
This protocol aims to satisfy the above requirements at a lower level of complexity than the current Foolscap-based protocol.

Summary (Non-normative)
~~~~~~~~~~~~~~~~~~~~~~~

Communication with the storage node will take place using TLS.
The TLS version and configuration will be dictated by an ongoing understanding of best practices.
The storage node will present an x509 certificate during the TLS handshake.
Storage clients will require that the certificate have a valid signature.
The Subject Public Key Information (SPKI) hash of the certificate will constitute the storage node's identity.
The **tub id** portion of the storage node fURL will be replaced with the SPKI hash.

When connecting to a storage node,
the client will take the following steps to gain confidence it has reached the intended peer:

* It will perform the usual cryptographic verification of the certificate presented by the storage server.
  That is,
  it will check that the certificate itself is well-formed,
  that it is currently valid [#]_,
  and that the signature it carries is valid.
* It will compare the SPKI hash of the certificate to the expected value.
  The specifics of the comparison are the same as for the comparison specified by `RFC 7469`_ with "sha256" [#]_.
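
The hash comparison in the second step can be sketched with only the Python standard library.
This is an illustration, not the Tahoe-LAFS implementation:
the function names are hypothetical,
the DER-encoded SubjectPublicKeyInfo bytes are assumed to have been extracted from the certificate already (for example by an x509 library),
and the unpadded base64url text form is an assumption based on the encoding this document discusses for NURLs.

.. code-block:: python

   import base64
   import hashlib
   import hmac


   def spki_hash(spki_der: bytes) -> str:
       """Return the SHA-256 digest of a DER-encoded SPKI as unpadded base64url."""
       digest = hashlib.sha256(spki_der).digest()
       return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


   def matches_expected(spki_der: bytes, expected: str) -> bool:
       """Compare the presented SPKI hash to the expected value in constant time."""
       presented = spki_hash(spki_der).encode("ascii")
       return hmac.compare_digest(presented, expected.encode("utf-8"))

A client would call ``matches_expected`` with the SPKI bytes of the presented certificate and the hash taken from the NURL,
and abandon the connection if it returns ``False``.
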

To further clarify, consider this example.
Alice operates a storage node.
Alice generates a key pair and secures it properly.
Alice generates a self-signed storage node certificate with the key pair.
Alice's storage node announces (to an introducer) a NURL containing (among other information) the SPKI hash.
Imagine the SPKI hash is ``i5xb...``.
This results in a NURL of ``pb://i5xb...@example.com:443/g3m5...#v=1``.
Bob creates a client node pointed at the same introducer.
Bob's client node receives the announcement from Alice's storage node
(indirected through the introducer).

Bob's client node recognizes the NURL as referring to an HTTP-dialect server due to the ``v=1`` fragment.
Bob's client node can now perform a TLS handshake with a server at the address in the NURL location hints
(``example.com:443`` in this example).
Following the above described validation procedures,
Bob's client node can determine whether it has reached Alice's storage node or not.
If and only if the validation procedure is successful does Bob's client node conclude it has reached Alice's storage node.
**Peer authentication** has been achieved.

Additionally,
by continuing to interact using TLS,
Bob's client and Alice's storage node are assured of both **message authentication** and **message confidentiality**.

Bob's client further inspects the NURL for the *swissnum*.
When Bob's client issues HTTP requests to Alice's storage node it includes the *swissnum* in its requests.
**Storage authorization** has been achieved.
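
The pieces of the NURL that Bob's client relies on in this example can be pulled apart as sketched below.
This is a minimal illustration using ``urllib.parse`` from the Python standard library;
``parse_nurl`` is a hypothetical helper,
it assumes a NURL with a single location hint,
and the ``HASH`` and ``SWISSNUM`` values in the usage note are placeholders rather than real values.

.. code-block:: python

   from urllib.parse import urlsplit


   def parse_nurl(nurl: str) -> dict:
       """Split a version 1 NURL into the parts a client needs."""
       parts = urlsplit(nurl)
       # The "v=1" fragment is what marks the NURL as an HTTP-dialect (GBS) server.
       if parts.scheme != "pb" or parts.fragment != "v=1":
           raise ValueError("not a version 1 NURL")
       return {
           "spki_hash": parts.username,           # identity of the storage node
           "host": parts.hostname,                # location hint: host
           "port": parts.port,                    # location hint: port
           "swissnum": parts.path.lstrip("/"),    # shared secret for authorization
       }

For instance, ``parse_nurl("pb://HASH@example.com:443/SWISSNUM#v=1")`` yields the host ``example.com``, the port ``443``, and the two placeholder strings.
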

.. note::

   Foolscap TubIDs are 20 bytes (SHA1 digest of the certificate).
   They are encoded with `Base32`_ for a length of 32 bytes.
   SPKI information discussed here is 32 bytes (SHA256 digest).
   They would be encoded in `Base32`_ for a length of 52 bytes.
   `unpadded base64url`_ provides a more compact encoding of the information while remaining URL-compatible.
   This would encode the SPKI information for a length of merely 43 bytes.
   SHA1,
   the current Foolscap hash function,
   is not a practical choice at this time due to advances made in `attacking SHA1`_.
   The selection of a safe hash function with output smaller than SHA256 could be the subject of future improvements.
   A 224 bit hash function (SHA3-224, for example) might be suitable -
   improving the encoded length to 38 bytes.
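
The encoded lengths quoted in the note follow from the number of bits each output character carries
(5 for Base32, 6 for base64url) when no padding is added.
The small check below reproduces the arithmetic; the helper names are hypothetical.

.. code-block:: python

   def base32_length(n_bytes: int) -> int:
       """Characters needed for unpadded Base32 (5 bits per character)."""
       return -(-n_bytes * 8 // 5)  # ceiling division


   def base64url_length(n_bytes: int) -> int:
       """Characters needed for unpadded base64url (6 bits per character)."""
       return -(-n_bytes * 8 // 6)  # ceiling division


   print(base32_length(20))     # SHA1 digest, Base32: 32
   print(base32_length(32))     # SHA256 digest, Base32: 52
   print(base64url_length(32))  # SHA256 digest, base64url: 43
   print(base64url_length(28))  # 224 bit digest, base64url: 38
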


Transition
~~~~~~~~~~

To provide a seamless user experience during this protocol transition,
there should be a period during which both protocols are supported by storage nodes.
The GBS announcement will be introduced in a way that *updated client* software can recognize.
Its introduction will also be made in such a way that *non-updated client* software disregards the new information
(of which it cannot make any use).

Storage nodes will begin to operate a new GBS server.
They may re-use their existing x509 certificate or generate a new one.
Generation of a new certificate allows for certain non-optimal conditions to be addressed:

* The ``commonName`` of ``newpb_thingy`` may be changed to a more descriptive value.
* A ``notValidAfter`` field with a timestamp in the past may be updated.

Storage nodes will announce a new NURL for this new HTTP-based server.
This NURL will be announced alongside their existing Foolscap-based server's fURL.
Such an announcement will resemble this::

  {
      "anonymous-storage-FURL": "pb://...",          # The old entry
      "anonymous-storage-NURLs": ["pb://...#v=1"]    # The new, additional entry
  }

The transition process will proceed in three stages:

1. The first stage represents the starting conditions in which clients and servers can speak only Foolscap.
#. The intermediate stage represents a condition in which some clients and servers can both speak Foolscap and GBS.
#. The final stage represents the desired condition in which all clients and servers speak only GBS.

During the first stage only one client/server interaction is possible:
the storage server announces only Foolscap and speaks only Foolscap.
During the final stage there is only one supported interaction:
the client and server are both updated and speak GBS to each other.

During the intermediate stage there are four supported interactions:

1. Both the client and server are non-updated.
   The interaction is just as it would be during the first stage.
#. The client is updated and the server is non-updated.
   The client will see the Foolscap announcement and the lack of a GBS announcement.
   It will speak to the server using Foolscap.
#. The client is non-updated and the server is updated.
   The client will see the Foolscap announcement.
   It will speak Foolscap to the storage server.
#. Both the client and server are updated.
   The client will see the GBS announcement and disregard the Foolscap announcement.
   It will speak GBS to the server.
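
From the point of view of an *updated* client, the interactions above reduce to a single decision on each announcement.
The sketch below illustrates that decision using the announcement keys shown earlier;
``choose_connection`` is a hypothetical name,
and picking the first listed NURL is an assumption made only to keep the example short.

.. code-block:: python

   def choose_connection(announcement: dict) -> tuple:
       """Return (protocol, location) for an updated client."""
       nurls = announcement.get("anonymous-storage-NURLs") or []
       if nurls:
           # Updated server: use GBS and disregard the Foolscap entry.
           return ("gbs", nurls[0])
       # Non-updated server: only the Foolscap fURL was announced.
       return ("foolscap", announcement["anonymous-storage-FURL"])

A non-updated client never looks at ``anonymous-storage-NURLs`` and so always behaves as in the second branch.
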

There is one further complication:
the client maintains a cache of storage server information
(to avoid continuing to rely on the introducer after it has been introduced).
The following sequence of events is likely:

1. The client connects to an introducer.
#. It receives an announcement for a non-updated storage server (Foolscap only).
#. It caches this announcement.
#. At some point, the storage server is updated.
#. The client uses the information in its cache to open a Foolscap connection to the storage server.

Ideally,
the client would not rely on an update from the introducer to give it the GBS NURL for the updated storage server.
In practice, we have decided not to implement this functionality.

Server Details
--------------

The protocol primarily enables interaction with "resources" of two types:
storage indexes
and shares.
A particular resource is addressed by the HTTP request path.
Details about the interface are encoded in the HTTP message body.

String Encoding
~~~~~~~~~~~~~~~

.. _Base32:

Base32
!!!!!!

Where the specification refers to Base32 the meaning is *unpadded* Base32 encoding as specified by `RFC 4648`_ using a *lowercase variation* of the alphabet from Section 6.

That is, the alphabet is:

.. list-table:: Base32 Alphabet
   :header-rows: 1

   * - Value
     - Encoding
     - Value
     - Encoding
     - Value
     - Encoding
     - Value
     - Encoding

   * - 0
     - a
     - 9
     - j
     - 18
     - s
     - 27
     - 3
   * - 1
     - b
     - 10
     - k
     - 19
     - t
     - 28
     - 4
   * - 2
     - c
     - 11
     - l
     - 20
     - u
     - 29
     - 5
   * - 3
     - d
     - 12
     - m
     - 21
     - v
     - 30
     - 6
   * - 4
     - e
     - 13
     - n
     - 22
     - w
     - 31
     - 7
   * - 5
     - f
     - 14
     - o
     - 23
     - x
     -
     -
   * - 6
     - g
     - 15
     - p
     - 24
     - y
     -
     -
   * - 7
     - h
     - 16
     - q
     - 25
     - z
     -
     -
   * - 8
     - i
     - 17
     - r
     - 26
     - 2
     -
     -
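
Because this alphabet is the `RFC 4648`_ alphabet in lowercase,
the encoding can be sketched on top of the ``base64`` module in the Python standard library.
The function names below are hypothetical and this is only one possible way to implement it.

.. code-block:: python

   import base64


   def gbs_b32encode(data: bytes) -> str:
       """Encode bytes as unpadded, lowercase Base32."""
       return base64.b32encode(data).decode("ascii").lower().rstrip("=")


   def gbs_b32decode(text: str) -> bytes:
       """Decode unpadded, lowercase Base32 back into bytes."""
       # The standard library expects uppercase input padded to a multiple of 8.
       padding = "=" * (-len(text) % 8)
       return base64.b32decode(text.upper() + padding)

For example, a 16 byte storage index encodes to a 26 character string.
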

Message Encoding
~~~~~~~~~~~~~~~~

Clients and servers MUST use the ``Content-Type`` and ``Accept`` header fields as specified in `RFC 9110`_ for message body negotiation.

The encoding for HTTP message bodies SHOULD be `CBOR`_.
Clients submitting requests using this encoding MUST include a ``Content-Type: application/cbor`` request header field.
A request MAY be submitted using an alternate encoding by declaring this in the ``Content-Type`` header field.
A request MAY indicate its preference for an alternate encoding in the response using the ``Accept`` header field.
A request which includes no ``Accept`` header field MUST be interpreted in the same way as a request including an ``Accept: application/cbor`` header field.

Clients and servers MAY support additional request and response message body encodings.

Clients and servers SHOULD support ``application/json`` request and response message body encoding.
For HTTP messages carrying binary share data,
this is expected to be a particularly poor encoding.
However,
for HTTP messages carrying small payloads of strings, numbers, and containers
it is expected that JSON will be more convenient than CBOR for ad hoc testing and manual interaction.
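
The default described above (no ``Accept`` header field means CBOR) can be sketched as a small server-side helper.
This is a deliberately simplified illustration:
``response_media_type`` and ``SUPPORTED`` are hypothetical names,
and quality values and wildcards in the ``Accept`` header field are ignored,
which a complete `RFC 9110`_ implementation would not do.

.. code-block:: python

   SUPPORTED = ("application/cbor", "application/json")


   def response_media_type(accept):
       """Pick the response encoding from the Accept header value (or None)."""
       if accept is None:
           # A missing Accept header is treated like "Accept: application/cbor".
           return "application/cbor"
       for item in accept.split(","):
           # Drop any parameters such as ";q=0.5" and normalize the case.
           media_type = item.split(";", 1)[0].strip().lower()
           if media_type in SUPPORTED:
               return media_type
       return None  # nothing acceptable; the caller decides how to respond
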

For this same reason,
JSON is used throughout for the examples presented here.
Because of the simple types used throughout
and the equivalence described in `RFC 7049`_
these examples should be representative regardless of which of these two encodings is chosen.

There are two exceptions to this rule.

1. Sets
!!!!!!!

For CBOR messages,
any sequence that is semantically a set (i.e. no repeated values allowed, order doesn't matter, and elements are hashable in Python) should be sent as a set.
Tag 6.258 is used to indicate sets in CBOR;
see `the CBOR registry <https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>`_ for more details.
The JSON encoding does not support sets.
Sets MUST be represented as arrays in JSON-encoded messages.

2. Bytes
!!!!!!!!

The CBOR encoding natively supports a bytes type while the JSON encoding does not.
Bytes MUST be represented as strings giving the `Base64`_ representation of the original bytes value.
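
Both exceptions can be illustrated with a small conversion step applied before JSON serialization.
The helper ``to_jsonable`` is hypothetical;
it assumes the standard Base64 alphabet with padding,
and it sorts set members only to make the output deterministic (the protocol does not require any order).

.. code-block:: python

   import base64
   import json


   def to_jsonable(value):
       """Convert sets to arrays and bytes to Base64 strings, recursively."""
       if isinstance(value, (set, frozenset)):
           return sorted(to_jsonable(v) for v in value)
       if isinstance(value, bytes):
           return base64.b64encode(value).decode("ascii")
       if isinstance(value, dict):
           return {key: to_jsonable(v) for key, v in value.items()}
       if isinstance(value, (list, tuple)):
           return [to_jsonable(v) for v in value]
       return value


   message = {"share-numbers": {7, 1}, "data": b"hi"}
   print(json.dumps(to_jsonable(message)))
   # {"share-numbers": [1, 7], "data": "aGk="}
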

HTTP Design
~~~~~~~~~~~

The HTTP interface described here is informed by the ideas of REST
(Representational State Transfer).
For ``GET`` requests query parameters are preferred over values encoded in the request body.
For other requests query parameters are encoded into the message body.

Many branches of the resource tree are conceived as homogeneous containers:
one branch contains all of the share data;
another branch contains all of the lease data;
etc.

Clients and servers MUST use the ``Authorization`` header field,
as specified in `RFC 9110`_,
for authorization of all requests to all endpoints specified here.
The authentication *type* MUST be ``Tahoe-LAFS``.
Clients MUST present the `Base64`_-encoded representation of the swissnum from the NURL used to locate the storage service as the *credentials*.

If credentials are not presented or the swissnum is not associated with a storage service then the server MUST issue a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
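
A client-side sketch of building this header field follows.
``authorization_header`` is a hypothetical helper,
and it assumes the swissnum is Base64-encoded from the ASCII bytes of the text that appears in the NURL;
the value used in the usage note is a placeholder.

.. code-block:: python

   import base64


   def authorization_header(swissnum: str) -> dict:
       """Build the Authorization header field for a GBS request."""
       credentials = base64.b64encode(swissnum.encode("ascii")).decode("ascii")
       # Authentication type "Tahoe-LAFS", then the encoded swissnum as credentials.
       return {"Authorization": f"Tahoe-LAFS {credentials}"}

For a placeholder swissnum of ``abc`` this produces ``Authorization: Tahoe-LAFS YWJj``.
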

Requests to certain endpoints MUST include additional secrets in ``X-Tahoe-Authorization`` header fields.
The endpoints which require these secrets are:

* ``PUT /storage/v1/lease/:storage_index``:
  The secrets included MUST be ``lease-renew-secret`` and ``lease-cancel-secret``.

* ``POST /storage/v1/immutable/:storage_index``:
  The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``upload-secret``.

* ``PATCH /storage/v1/immutable/:storage_index/:share_number``:
  The secrets included MUST be ``upload-secret``.

* ``PUT /storage/v1/immutable/:storage_index/:share_number/abort``:
  The secrets included MUST be ``upload-secret``.

* ``POST /storage/v1/mutable/:storage_index/read-test-write``:
  The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``write-enabler``.

If these secrets are:

1. Missing.
2. The wrong length.
3. Not the expected kind of secret.
4. Otherwise unparseable before they are actually semantically used.

the server MUST respond with ``400 BAD REQUEST`` and perform no other processing of the message.
401 is not used because this is not an authorization problem; it is a "you sent garbage and should know better" bug.

If authorization using the secret fails,
then the server MUST send a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
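
The four rejection cases can be sketched as a validation step that runs before any secret is used.
Everything named here is hypothetical (``BadRequest``, ``parse_secrets``, ``SECRET_LENGTH``),
and a length of 32 bytes for every kind of secret is an assumption made for the example;
a server would translate ``BadRequest`` into a ``400 BAD REQUEST`` response.

.. code-block:: python

   import base64

   SECRET_LENGTH = 32  # assumption for this sketch


   class BadRequest(Exception):
       """The request should be answered with 400 BAD REQUEST."""


   def parse_secrets(header_values, required):
       """Parse X-Tahoe-Authorization values into {kind: secret bytes}."""
       secrets = {}
       for value in header_values:
           try:
               kind, encoded = value.strip().split(" ", 1)
               secret = base64.b64decode(encoded.strip(), validate=True)
           except ValueError:  # also covers binascii.Error
               raise BadRequest("unparseable secret")
           if kind not in required:
               raise BadRequest("unexpected kind of secret")
           if len(secret) != SECRET_LENGTH:
               raise BadRequest("wrong length")
           secrets[kind] = secret
       if set(secrets) != set(required):
           raise BadRequest("missing secret")
       return secrets

Checking whether a well-formed secret actually authorizes the operation (the ``401 UNAUTHORIZED`` case) is a separate, later step.
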

Encoding
~~~~~~~~

* ``storage_index`` MUST be `Base32`_ encoded in URLs.
* ``share_number`` MUST be a decimal representation.

General
~~~~~~~

``GET /storage/v1/version``
!!!!!!!!!!!!!!!!!!!!!!!!!!!

This endpoint allows clients to retrieve some basic metadata about a storage server from the storage service.
The response MUST validate against this CDDL schema::

  {'http://allmydata.org/tahoe/protocols/storage/v1' => {
      'maximum-immutable-share-size' => uint
      'maximum-mutable-share-size' => uint
      'available-space' => uint
      }
   'application-version' => bstr
  }

The server SHOULD populate as many fields as possible with accurate information about its behavior.

For fields which relate to a specific API
the semantics are documented below in the section for that API.
For fields that are more general than a single API the semantics are as follows:

* available-space:
  The server SHOULD use this field to advertise the amount of space that it currently considers unused and is willing to allocate for client requests.
  The value is a number of bytes.


``PUT /storage/v1/lease/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Either renew or create a new lease on the bucket addressed by ``storage_index``.

The renew secret and cancellation secret should be included as ``X-Tahoe-Authorization`` headers.
For example::

    X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
    X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>

If the ``lease-renew-secret`` value matches an existing lease
then the expiration time of that lease will be changed to 31 days after the time of this operation.
If it does not match an existing lease
then a new lease will be created with this ``lease-renew-secret`` which expires 31 days after the time of this operation.

``lease-renew-secret`` and ``lease-cancel-secret`` values must be 32 bytes long.
The server treats them as opaque values.
:ref:`Share Leases` gives details about how the Tahoe-LAFS storage client constructs these values.

In these cases the response is ``NO CONTENT`` with an empty body.

It is possible that the storage server will have no shares for the given ``storage_index`` because:

* no such shares have ever been uploaded.
* a previous lease expired and the storage server reclaimed the storage by deleting the shares.

In these cases the server takes no action and returns ``NOT FOUND``.
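
The renew-or-create behavior can be sketched with an in-memory table of leases.
This is an illustration only:
``renew_or_create_lease`` is a hypothetical function,
keying leases by renew secret is an assumption about storage layout,
and the return values are the HTTP status codes corresponding to ``NO CONTENT`` (204) and ``NOT FOUND`` (404).

.. code-block:: python

   from datetime import datetime, timedelta, timezone

   LEASE_PERIOD = timedelta(days=31)


   def renew_or_create_lease(leases, has_shares, renew_secret, cancel_secret, now):
       """Apply a lease request; `leases` maps renew secret -> lease record."""
       if not has_shares:
           # Never uploaded, or already reclaimed: take no action.
           return 404
       # Reuse the matching lease if there is one, otherwise create a new one.
       lease = leases.setdefault(renew_secret, {"cancel_secret": cancel_secret})
       lease["expires"] = now + LEASE_PERIOD
       return 204


   leases = {}
   now = datetime(2023, 8, 1, tzinfo=timezone.utc)
   print(renew_or_create_lease(leases, True, b"r" * 32, b"c" * 32, now))  # 204
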


Discussion
``````````

We considered an alternative where ``lease-renew-secret`` and ``lease-cancel-secret`` are placed in query arguments on the request path.
This increases chances of leaking secrets in logs.
Putting the secrets in the body reduces the chances of leaking secrets,
but eventually we chose headers as the least likely information to be logged.

Several behaviors here are blindly copied from the Foolscap-based storage server protocol.

* There is a cancel secret but there is no API to use it to cancel a lease (see ticket:3768).
* The lease period is hard-coded at 31 days.

These are not necessarily ideal behaviors
but they are adopted to avoid any *semantic* changes between the Foolscap- and HTTP-based protocols.
It is expected that some or all of these behaviors may change in a future revision of the HTTP-based protocol.

Immutable
---------

Writing
~~~~~~~

``POST /storage/v1/immutable/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Initialize an immutable storage index with some buckets.
The server MUST allow share data to be written to the buckets at most one time.
The server MAY create a lease for the buckets.
Details of the buckets to create are encoded in the request body.
The request body MUST validate against this CDDL schema::

  {
    share-numbers: #6.258([0*256 uint])
    allocated-size: uint
  }

For example::

  {"share-numbers": [1, 7, ...], "allocated-size": 12345}

The server SHOULD accept a value for **allocated-size** that is less than or equal to the lesser of the values of the server's version message's **maximum-immutable-share-size** or **available-space** values.
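
The size rule in the previous paragraph amounts to a comparison against the smaller of two advertised limits.
A minimal sketch, assuming ``version`` is the already-decoded response of ``GET /storage/v1/version`` and using the hypothetical name ``should_accept``:

.. code-block:: python

   V1_KEY = "http://allmydata.org/tahoe/protocols/storage/v1"


   def should_accept(allocated_size: int, version: dict) -> bool:
       """Check allocated-size against the server's advertised limits."""
       v1 = version[V1_KEY]
       limit = min(v1["maximum-immutable-share-size"], v1["available-space"])
       return allocated_size <= limit
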

The request MUST include ``X-Tahoe-Authorization`` HTTP headers that set the various secrets (upload, lease renewal, lease cancellation) that will be later used to authorize various operations.
For example::

   X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
   X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
   X-Tahoe-Authorization: upload-secret <base64-upload-secret>

The response body MUST include encoded information about the created buckets.
The response body MUST validate against this CDDL schema::

  {
    already-have: #6.258([0*256 uint])
    allocated: #6.258([0*256 uint])
  }

For example::

  {"already-have": [1, ...], "allocated": [7, ...]}

The upload secret is an opaque *byte* string.

Handling repeat calls:

* If the same API call is repeated with the same upload secret, the response is the same and no change is made to server state.
  This is necessary to ensure retries work in the face of lost responses from the server.
* If the API call is made with a different upload secret, this implies a new client, perhaps because the old client died.
  Or it may happen because the client wants to upload a different share number than a previous client.
  New shares will be created, existing shares will be unchanged, regardless of whether the upload secret matches or not.
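
How the two response fields relate to what the server already stores can be sketched with set arithmetic.
This is a simplified illustration under stated assumptions:
``allocate_buckets`` is a hypothetical name,
``stored`` stands for the share numbers whose data the server already holds for this storage index,
and the bookkeeping of upload secrets and of uploads still in progress is left out.

.. code-block:: python

   def allocate_buckets(stored: set, requested: set) -> dict:
       """Split the requested share numbers into already-have and allocated."""
       return {
           "already-have": requested & stored,  # existing shares stay unchanged
           "allocated": requested - stored,     # new shares the client may write
       }


   print(allocate_buckets({1}, {1, 7}))
   # {'already-have': {1}, 'allocated': {7}}

Because the function only reads ``stored``, repeating the same call yields the same response, which is the property retries depend on.
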
662
663Discussion
664``````````
665
666We considered making this ``POST /storage/v1/immutable`` instead.
667The motivation was to keep *storage index* out of the request URL.
668Request URLs have an elevated chance of being logged by something.
669We were concerned that having the *storage index* logged may increase some risks.
670However, we decided this does not matter because:
671
672* the *storage index* can only be used to retrieve (not decrypt) the ciphertext-bearing share.
673* the *storage index* is already persistently present on the storage node in the form of directory names in the storage servers ``shares`` directory.
674* the request is made via HTTPS and so only Tahoe-LAFS can see the contents,
675  therefore no proxy servers can perform any extra logging.
676* Tahoe-LAFS itself does not currently log HTTP request URLs.
677
678The response includes ``already-have`` and ``allocated`` for two reasons:
679
* If an upload is interrupted and the client loses the local state that tells it which shares it already uploaded,
  then this allows it to discover this fact (by inspecting ``already-have``) and only upload the missing shares (indicated by ``allocated``).

* If an upload has completed, a client may still choose to re-balance storage by moving shares between servers.
  This might be because a server has become unavailable and a remaining server needs to store more shares for the upload.
  It could also just be that the client's preferred servers have changed.
686
687Regarding upload secrets,
688the goal is for uploading and aborting (see next sections) to be authenticated by more than just the storage index.
689In the future, we may want to generate them in a way that allows resuming/canceling when the client has issues.
690In the short term, they can just be a random byte string.
The primary security constraint is that each upload to each server has its own unique upload secret,
tied to uploading that particular storage index to this particular server.
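
As a short-term illustration of "a random byte string",
a client could generate one secret per upload per server and encode it for the ``X-Tahoe-Authorization`` header.
The 32-byte length and the helper names below are assumptions for this sketch, not requirements of the protocol::

    import base64
    import secrets


    def new_upload_secret(length=32):
        """Return a fresh random secret; call once per upload per server."""
        return secrets.token_bytes(length)


    def upload_secret_header(secret):
        """Format the X-Tahoe-Authorization header value for an upload secret."""
        encoded = base64.b64encode(secret).decode("ascii")
        return f"upload-secret {encoded}"

Because the secret is drawn from the operating system's random source, neither another client nor another server can predict it.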
693
694Rejected designs for upload secrets:
695
696* Upload secret per share number.
697  In order to make the secret unguessable by attackers, which includes other servers,
698  it must contain randomness.
699  Randomness means there is no need to have a secret per share, since adding share-specific content to randomness doesn't actually make the secret any better.
700
701``PATCH /storage/v1/immutable/:storage_index/:share_number``
702!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
703
704Write data for the indicated share.
705The share number MUST belong to the storage index.
706The request body MUST be the raw share data (i.e., ``application/octet-stream``).
707The request MUST include a *Content-Range* header field;
708for large transfers this allows partially complete uploads to be resumed.
709
For example,
a 1MiB share can be divided into eight separate 128KiB chunks.
Each chunk can be uploaded in a separate request.
Each request can include a *Content-Range* value indicating its placement within the complete share.
If any one of these requests fails then at most 128KiB of upload work needs to be retried.
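
The chunking arithmetic can be sketched as below.
The function name is hypothetical; the only protocol detail relied on is that HTTP byte ranges are inclusive at both ends::

    def content_ranges(total_size, chunk_size):
        """Yield a Content-Range value for each chunk of a share."""
        for start in range(0, total_size, chunk_size):
            # The last byte position is inclusive, hence the "- 1".
            last = min(start + chunk_size, total_size) - 1
            yield f"bytes {start}-{last}/{total_size}"

For the 1MiB example, ``content_ranges(1024 * 1024, 128 * 1024)`` yields eight values, the first being ``bytes 0-131071/1048576``.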
715
716The server MUST recognize when all of the data has been received and mark the share as complete
717(which it can do because it was informed of the size when the storage index was initialized).
718
The request MUST include an ``X-Tahoe-Authorization`` header that includes the upload secret::
720
721    X-Tahoe-Authorization: upload-secret <base64-upload-secret>
722
723Responses:
724
* When a chunk that does not complete the share is successfully uploaded the response MUST be ``OK``.
  The response body MUST indicate the ranges of share data that have yet to be uploaded.
  The response body MUST validate against this CDDL schema::
728
729    {
730      required: [0* {begin: uint, end: uint}]
731    }
732
733  For example::
734
735    { "required":
736      [ { "begin": <byte position, inclusive>
737        , "end":   <byte position, exclusive>
738        }
739      ,
740      ...
741      ]
742    }
743
* When the chunk that completes the share is successfully uploaded the response MUST be ``CREATED``.
* If the *Content-Range* for a request covers part of the share that has already been uploaded,
  and the data does not match already written data,
  the response MUST be ``CONFLICT``.
  In this case the client MUST abort the upload.
  The client MAY then restart the upload from scratch.
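
A server needs two pieces of bookkeeping to produce these responses:
which byte ranges are still missing, and whether incoming bytes disagree with bytes already stored.
The sketch below shows one way to compute both.
The function names and the in-memory representations (a list of ``(begin, end)`` pairs and a position-to-byte mapping) are assumptions chosen for brevity, not how Tahoe-LAFS stores shares::

    def required_ranges(total_size, received):
        """Compute the ``required`` list from (begin, end) pairs already stored.

        ``end`` is exclusive, matching the response schema.
        """
        required = []
        position = 0
        for begin, end in sorted(received):
            if begin > position:
                required.append({"begin": position, "end": begin})
            position = max(position, end)
        if position < total_size:
            required.append({"begin": position, "end": total_size})
        return required


    def conflicts(stored, offset, data):
        """Return True if ``data`` at ``offset`` disagrees with stored bytes.

        ``stored`` maps byte position -> byte value for bytes written so far.
        """
        return any(
            stored.get(offset + i, byte) != byte for i, byte in enumerate(data)
        )

An empty ``required`` list corresponds to the ``CREATED`` case, a non-empty one to ``OK``, and a true result from ``conflicts`` to ``CONFLICT``.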
750
751Discussion
752``````````
753
754``PUT`` verbs are only supposed to be used to replace the whole resource,
755thus the use of ``PATCH``.
756From RFC 7231::
757
758   An origin server that allows PUT on a given target resource MUST send
759   a 400 (Bad Request) response to a PUT request that contains a
760   Content-Range header field (Section 4.2 of [RFC7233]), since the
761   payload is likely to be partial content that has been mistakenly PUT
762   as a full representation.  Partial content updates are possible by
763   targeting a separately identified resource with state that overlaps a
764   portion of the larger resource, or by using a different method that
765   has been specifically defined for partial updates (for example, the
766   PATCH method defined in [RFC5789]).
767
768
769
770``PUT /storage/v1/immutable/:storage_index/:share_number/abort``
771!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
772
773This cancels an *in-progress* upload.
774
The request MUST include an ``X-Tahoe-Authorization`` header that includes the upload secret::
776
777    X-Tahoe-Authorization: upload-secret <base64-upload-secret>
778
779If there is an incomplete upload with a matching upload-secret then the server MUST consider the abort to have succeeded.
780In this case the response MUST be ``OK``.
781The server MUST respond to all future requests as if the operations related to this upload did not take place.
782
783If there is no incomplete upload with a matching upload-secret then the server MUST respond with ``Method Not Allowed`` (405).
784The server MUST make no client-visible changes to its state in this case.
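
The two outcomes can be sketched as a small handler.
The ``in_progress`` mapping (share number to upload secret) and the function name are hypothetical,
and a real server would also discard any partially written share data when an abort succeeds::

    import hmac

    OK, METHOD_NOT_ALLOWED = 200, 405


    def abort_upload(in_progress, share_number, upload_secret):
        """Handle an abort request against ``in_progress`` (share number -> secret)."""
        expected = in_progress.get(share_number)
        if expected is None or not hmac.compare_digest(expected, upload_secret):
            # No matching incomplete upload: leave state untouched.
            return METHOD_NOT_ALLOWED
        # Forget the upload so later requests see no trace of it.
        del in_progress[share_number]
        return OK

Using ``hmac.compare_digest`` for the comparison is a design choice of this sketch; it avoids leaking how much of a guessed secret was correct through timing.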
785
786``POST /storage/v1/immutable/:storage_index/:share_number/corrupt``
787!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
788
Advise the server that the data read from the indicated share was corrupt.
The request body includes a human-meaningful text string with details about the corruption.
It also includes potentially important details about the share.
792The request body MUST validate against this CDDL schema::
793
794  {
795    reason: tstr .size (1..32765)
796  }
797
798For example::
799
800  {"reason": "expected hash abcd, got hash efgh"}
801
The report pertains to the immutable share with a **storage index** and **share number** given in the request path.
If the identified **storage index** and **share number** are known to the server then the report SHOULD be accepted and made available to server administrators.
In this case the response SHOULD be ``OK``.
If the report is not accepted then the response SHOULD be ``Not Found`` (404).
806
807Discussion
808``````````
809
The seemingly odd length limit on ``reason`` is chosen so that the *encoded* representation of the message is limited to 32768 bytes.
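
One reading consistent with these numbers is that a CBOR text string between 256 and 65535 bytes long is encoded with a 3-byte head
(one initial byte plus a 2-byte length),
so 32765 bytes of text occupy 32768 bytes once encoded.
The helper below is an illustrative calculation of that head size.
It is not code from Tahoe-LAFS, and the interpretation that the limit counts only the encoded ``reason`` string is an assumption::

    def cbor_text_encoded_size(payload_length):
        """Return the encoded size of a CBOR text string with this many UTF-8 bytes."""
        if payload_length < 24:
            head = 1          # length fits in the initial byte
        elif payload_length < 2**8:
            head = 2          # initial byte + 1-byte length
        elif payload_length < 2**16:
            head = 3          # initial byte + 2-byte length
        elif payload_length < 2**32:
            head = 5          # initial byte + 4-byte length
        else:
            head = 9          # initial byte + 8-byte length
        return head + payload_length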
811
812Reading
813~~~~~~~
814
815``GET /storage/v1/immutable/:storage_index/shares``
816!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
817
818Retrieve a list (semantically, a set) indicating all shares available for the indicated storage index.
819The response body MUST validate against this CDDL schema::
820
821  #6.258([0*256 uint])
822
823For example::
824
825  [1, 5]
826
827If the **storage index** in the request path is not known to the server then the response MUST include an empty list.
828
829``GET /storage/v1/immutable/:storage_index/:share_number``
830!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
831
832Read a contiguous sequence of bytes from one share in one bucket.
833The response body MUST be the raw share data (i.e., ``application/octet-stream``).
834The ``Range`` header MAY be used to request exactly one ``bytes`` range,
835in which case the response code MUST be ``Partial Content`` (206).
836Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
837Multiple ranges in a single request are *not* supported;
838open-ended ranges are also not supported.
839Clients MUST NOT send requests using these features.
840
If the requested range extends beyond the end of the data,
the response MUST be shorter than the requested range.
It MUST contain all data up to the end of the share and then end.
The resulting ``Content-Range`` header MUST be consistent with the returned data.
845
846If the response to a query is an empty range,
847the server MUST send a ``No Content`` (204) response.
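
On the client side, a single closed range is easy to build and its response easy to check.
The helpers below are a sketch with hypothetical names.
They handle only the one form this endpoint supports (one ``bytes`` range with both ends given) and do not parse other ``Content-Range`` forms such as an unknown total length::

    import re


    def range_header(offset, length):
        """Build a single closed ``bytes`` range; both ends are inclusive."""
        if length < 1:
            raise ValueError("length must be at least 1")
        return f"bytes={offset}-{offset + length - 1}"


    def parse_content_range(value):
        """Parse ``bytes first-last/total`` into (first, last, total)."""
        match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
        if match is None:
            raise ValueError(f"unsupported Content-Range: {value!r}")
        first, last, total = (int(group) for group in match.groups())
        return first, last, total

A client that asked for more bytes than the share holds can compare ``last - first + 1`` from the parsed header with the length it requested to detect the short read.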
848
849Discussion
850``````````
851
852Multiple ``bytes`` ranges are not supported.
853HTTP requires that the ``Content-Type`` of the response in that case be ``multipart/...``.
854The ``multipart`` major type brings along string sentinel delimiting as a means to frame the different response parts.
855There are many drawbacks to this framing technique:
856
8571. It is resource-intensive to generate.
8582. It is resource-intensive to parse.
8593. It is complex to parse safely [#]_ [#]_ [#]_ [#]_.
860
861A previous revision of this specification allowed requesting one or more contiguous sequences from one or more shares.
862This *superficially* mirrored the Foolscap based interface somewhat closely.
863The interface was simplified to this version because this version is all that is required to let clients retrieve any desired information.
864It only requires that the client issue multiple requests.
865This can be done with pipelining or parallel requests to avoid an additional latency penalty.
866In the future,
867if there are performance goals,
868benchmarks can demonstrate whether they are achieved by a more complicated interface or some other change.
869
870Mutable
871-------
872
873Writing
874~~~~~~~
875
876``POST /storage/v1/mutable/:storage_index/read-test-write``
877!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
878
879General purpose read-test-and-write operation for mutable storage indexes.
880A mutable storage index is also called a "slot"
881(particularly by the existing Tahoe-LAFS codebase).
882The first write operation on a mutable storage index creates it
883(that is,
884there is no separate "create this storage index" operation as there is for the immutable storage index type).
885
886The request MUST include ``X-Tahoe-Authorization`` headers with write enabler and lease secrets::
887
888    X-Tahoe-Authorization: write-enabler <base64-write-enabler-secret>
889    X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
890    X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
891
892The request body MUST include test, read, and write vectors for the operation.
893The request body MUST validate against this CDDL schema::
894
895  {
896    "test-write-vectors": {
897      0*256 share_number : {
898        "test": [0*30 {"offset": uint, "size": uint, "specimen": bstr}]
899        "write": [* {"offset": uint, "data": bstr}]
900        "new-length": uint / null
901      }
902    }
903    "read-vector": [0*30 {"offset": uint, "size": uint}]
904  }
905  share_number = uint
906
907For example::
908
909   {
910       "test-write-vectors": {
911           0: {
912               "test": [{
913                   "offset": 3,
914                   "size": 5,
915                   "specimen": "hello"
916               }, ...],
917               "write": [{
918                   "offset": 9,
919                   "data": "world"
920               }, ...],
921               "new-length": 5
922           }
923       },
924       "read-vector": [{"offset": 3, "size": 12}, ...]
925   }
926
927The response body contains a boolean indicating whether the tests all succeed
928(and writes were applied) and a mapping giving read data (pre-write).
929The response body MUST validate against this CDDL schema::
930
931  {
932    "success": bool,
933    "data": {0*256 share_number: [0* bstr]}
934  }
935  share_number = uint
936
937For example::
938
939  {
940      "success": true,
941      "data": {
942          0: ["foo"],
943          5: ["bar"],
944          ...
945      }
946  }
947
A client MAY send a test vector or read vector that refers to bytes beyond the end of existing data.
In this case a server MUST behave as if the test or read vector referred to exactly as much data as exists.
950
951For example,
952consider the case where the server has 5 bytes of data for a particular share.
953If a client sends a read vector with an ``offset`` of 1 and a ``size`` of 4 then the server MUST respond with all of the data except the first byte.
954If a client sends a read vector with the same ``offset`` and a ``size`` of 5 (or any larger value) then the server MUST respond in the same way.
955
956Similarly,
957if there is no data at all,
958an empty byte string is returned no matter what the offset or length.
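
The clamping rule and the test-then-write sequence can be sketched on plain byte strings.
The function names are hypothetical, ``new-length`` handling is deliberately left out,
and zero-filling any gap created by a write past the end of the data is an assumption of this sketch rather than something stated above::

    def read_vector(share_data, offset, size):
        """Return up to ``size`` bytes at ``offset``; reads past the end are clamped."""
        return share_data[offset:offset + size]


    def tests_pass(share_data, test_vectors):
        """Check every test vector's specimen against the (clamped) share data."""
        return all(
            read_vector(share_data, t["offset"], t["size"]) == t["specimen"]
            for t in test_vectors
        )


    def apply_writes(share_data, write_vectors):
        """Return new share data with each write applied in order."""
        data = bytearray(share_data)
        for w in write_vectors:
            end = w["offset"] + len(w["data"])
            if len(data) < end:
                # Assumption: gaps created by writing past the end are zero-filled.
                data.extend(bytes(end - len(data)))
            data[w["offset"]:end] = w["data"]
        return bytes(data)

With 5 bytes of data, ``read_vector(data, 1, 4)`` and ``read_vector(data, 1, 5)`` return the same 4 bytes, matching the example above.
A test vector of size 1 with an empty specimen passes only against empty data, because any existing byte would make the clamped read non-empty.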
959
960Reading
961~~~~~~~
962
963``GET /storage/v1/mutable/:storage_index/shares``
964!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
965
966Retrieve a set indicating all shares available for the indicated storage index.
967The response body MUST validate against this CDDL schema::
968
969  #6.258([0*256 uint])
970
971For example::
972
973  [1, 5]
974
975``GET /storage/v1/mutable/:storage_index/:share_number``
976!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
977
Read data from the indicated mutable share, just like ``GET /storage/v1/immutable/:storage_index/:share_number``.

The response body MUST be the raw share data (i.e., ``application/octet-stream``).
The ``Range`` header MAY be used to request exactly one ``bytes`` range,
in which case the response code MUST be ``Partial Content`` (206).
Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
Multiple ranges in a single request are *not* supported;
open-ended ranges are also not supported.
Clients MUST NOT send requests using these features.

If the requested range extends beyond the end of the data,
the response MUST be shorter than the requested range.
It MUST contain all data up to the end of the share and then end.
The resulting ``Content-Range`` header MUST be consistent with the returned data.

If the response to a query is an empty range,
the server MUST send a ``No Content`` (204) response.
995
996
997``POST /storage/v1/mutable/:storage_index/:share_number/corrupt``
998!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
999
Advise the server that the data read from the indicated share was corrupt.
This works just like the immutable version.
1002
1003Sample Interactions
1004-------------------
1005
1006This section contains examples of client/server interactions to help illuminate the above specification.
1007This section is non-normative.
1008
1009Immutable Data
1010~~~~~~~~~~~~~~
1011
10121. Create a bucket for storage index ``AAAAAAAAAAAAAAAA`` to hold two immutable shares, discovering that share ``1`` was already uploaded::
1013
1014     POST /storage/v1/immutable/AAAAAAAAAAAAAAAA
1015     Authorization: Tahoe-LAFS nurl-swissnum
1016     X-Tahoe-Authorization: lease-renew-secret efgh
1017     X-Tahoe-Authorization: lease-cancel-secret jjkl
1018     X-Tahoe-Authorization: upload-secret xyzf
1019
1020     {"share-numbers": [1, 7], "allocated-size": 48}
1021
1022     200 OK
1023     {"already-have": [1], "allocated": [7]}
1024
1025#. Upload the content for immutable share ``7``::
1026
1027     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1028     Authorization: Tahoe-LAFS nurl-swissnum
1029     Content-Range: bytes 0-15/48
1030     X-Tahoe-Authorization: upload-secret xyzf
1031     <first 16 bytes of share data>
1032
1033     200 OK
1034     { "required": [ {"begin": 16, "end": 48 } ] }
1035
1036     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1037     Authorization: Tahoe-LAFS nurl-swissnum
1038     Content-Range: bytes 16-31/48
1039     X-Tahoe-Authorization: upload-secret xyzf
1040     <second 16 bytes of share data>
1041
1042     200 OK
1043     { "required": [ {"begin": 32, "end": 48 } ] }
1044
1045     PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
1046     Authorization: Tahoe-LAFS nurl-swissnum
1047     Content-Range: bytes 32-47/48
1048     X-Tahoe-Authorization: upload-secret xyzf
1049     <final 16 bytes of share data>
1050
1051     201 CREATED
1052
#. Download the content of the previously uploaded immutable share ``7``::

     GET /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
     Authorization: Tahoe-LAFS nurl-swissnum
     Range: bytes=0-47

     206 PARTIAL CONTENT
     Content-Range: bytes 0-47/48
     <complete 48 bytes of previously uploaded data>
1062
1063#. Renew the lease on all immutable shares in bucket ``AAAAAAAAAAAAAAAA``::
1064
1065     PUT /storage/v1/lease/AAAAAAAAAAAAAAAA
1066     Authorization: Tahoe-LAFS nurl-swissnum
1067     X-Tahoe-Authorization: lease-cancel-secret jjkl
1068     X-Tahoe-Authorization: lease-renew-secret efgh
1069
1070     204 NO CONTENT
1071
1072Mutable Data
1073~~~~~~~~~~~~
1074
1. Create mutable share number ``3`` with ``10`` bytes of data in slot ``BBBBBBBBBBBBBBBB``.
   The special test vector of size 1 but empty bytes will only pass
   if there is no existing share,
   otherwise it will read a byte which won't match ``b""``::
1079
1080     POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
1081     Authorization: Tahoe-LAFS nurl-swissnum
1082     X-Tahoe-Authorization: write-enabler abcd
1083     X-Tahoe-Authorization: lease-cancel-secret efgh
1084     X-Tahoe-Authorization: lease-renew-secret ijkl
1085
1086     {
1087         "test-write-vectors": {
1088             3: {
1089                 "test": [{
1090                     "offset": 0,
1091                     "size": 1,
1092                     "specimen": ""
1093                 }],
1094                 "write": [{
1095                     "offset": 0,
1096                     "data": "xxxxxxxxxx"
1097                 }],
1098                 "new-length": 10
1099             }
1100         },
1101         "read-vector": []
1102     }
1103
     200 OK
     {
         "success": true,
         "data": {}
     }
1109
1110#. Safely rewrite the contents of a known version of mutable share number ``3`` (or fail)::
1111
1112     POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
1113     Authorization: Tahoe-LAFS nurl-swissnum
1114     X-Tahoe-Authorization: write-enabler abcd
1115     X-Tahoe-Authorization: lease-cancel-secret efgh
1116     X-Tahoe-Authorization: lease-renew-secret ijkl
1117
1118     {
1119         "test-write-vectors": {
1120             3: {
1121                 "test": [{
1122                     "offset": 0,
1123                     "size": <length of checkstring>,
1124                     "specimen": "<checkstring>"
1125                 }],
1126                 "write": [{
1127                     "offset": 0,
1128                     "data": "yyyyyyyyyy"
1129                 }],
1130                 "new-length": 10
1131             }
1132         },
1133         "read-vector": []
1134     }
1135
     200 OK
     {
         "success": true,
         "data": {}
     }
1141
#. Download the contents of share number ``3``::

     GET /storage/v1/mutable/BBBBBBBBBBBBBBBB/3
     Authorization: Tahoe-LAFS nurl-swissnum
     Range: bytes=0-16

     206 PARTIAL CONTENT
     Content-Range: bytes 0-9/10
     <complete 10 bytes of previously uploaded data>
1151
1152#. Renew the lease on previously uploaded mutable share in slot ``BBBBBBBBBBBBBBBB``::
1153
1154     PUT /storage/v1/lease/BBBBBBBBBBBBBBBB
1155     Authorization: Tahoe-LAFS nurl-swissnum
1156     X-Tahoe-Authorization: lease-cancel-secret efgh
1157     X-Tahoe-Authorization: lease-renew-secret ijkl
1158
1159     204 NO CONTENT
1160
1161.. _Base64: https://www.rfc-editor.org/rfc/rfc4648#section-4
1162
1163.. _RFC 4648: https://tools.ietf.org/html/rfc4648
1164
1165.. _RFC 7469: https://tools.ietf.org/html/rfc7469#section-2.4
1166
1167.. _RFC 7049: https://tools.ietf.org/html/rfc7049#section-4
1168
1169.. _RFC 9110: https://tools.ietf.org/html/rfc9110
1170
1171.. _CBOR: http://cbor.io/
1172
1173.. [#]
1174   The security value of checking ``notValidBefore`` and ``notValidAfter`` is not entirely clear.
1175   The arguments which apply to web-facing certificates do not seem to apply
1176   (due to the decision for Tahoe-LAFS to operate independently of the web-oriented CA system).
1177
1178   Arguably, complexity is reduced by allowing an existing TLS implementation which wants to make these checks make them
1179   (compared to including additional code to either bypass them or disregard their results).
1180   Reducing complexity, at least in general, is often good for security.
1181
1182   On the other hand, checking the validity time period forces certificate regeneration
1183   (which comes with its own set of complexity).
1184
1185   A possible compromise is to recommend certificates with validity periods of many years or decades.
1186   "Recommend" may be read as "provide software supporting the generation of".
1187
1188   What about key theft?
1189   If certificates are valid for years then a successful attacker can pretend to be a valid storage node for years.
1190   However, short-validity-period certificates are no help in this case.
1191   The attacker can generate new, valid certificates using the stolen keys.
1192
1193   Therefore, the only recourse to key theft
1194   (really *identity theft*)
1195   is to burn the identity and generate a new one.
1196   Burning the identity is a non-trivial task.
1197   It is worth solving but it is not solved here.
1198
.. [#]
   More simply::

    from hashlib import sha256
    from cryptography.hazmat.primitives.serialization import (
      Encoding,
      PublicFormat,
    )
    from pybase64 import urlsafe_b64encode

    def check_tub_id(cert, tub_id):
        # cert: the certificate object presented by the storage node.
        # tub_id: the expected value, as unpadded base64url bytes.
        spki_bytes = cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)
        spki_sha256 = sha256(spki_bytes).digest()
        # urlsafe_b64encode pads its output with "="; strip it before comparing.
        spki_encoded = urlsafe_b64encode(spki_sha256).rstrip(b"=")
        assert spki_encoded == tub_id

   Note we use `unpadded base64url`_ rather than the Foolscap- and Tahoe-LAFS-preferred Base32.
1216
1217.. [#]
1218   https://www.cvedetails.com/cve/CVE-2017-5638/
1219.. [#]
1220   https://pivotal.io/security/cve-2018-1272
1221.. [#]
1222   https://nvd.nist.gov/vuln/detail/CVE-2017-5124
1223.. [#]
1224   https://efail.de/
1225
1226.. _unpadded base64url: https://tools.ietf.org/html/rfc7515#appendix-C
1227
1228.. _attacking SHA1: https://en.wikipedia.org/wiki/SHA-1#Attacks