[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3777: Some potential issues with GET /v1/immutable/:storage_index

Mon Aug 23 15:27:41 UTC 2021

#3777: Some potential issues with GET /v1/immutable/:storage_index
--------------------------+-----------------------------------
     Reporter:  itamarst  |      Owner:  exarkun
         Type:  task      |     Status:  new
     Priority:  normal    |  Milestone:  HTTP Storage Protocol
    Component:  unknown   |    Version:  n/a
   Resolution:            |   Keywords:
Launchpad Bug:            |
--------------------------+-----------------------------------
Description changed by itamarst:

Old description:

> Downloading multiple shares is specified via the query string.
>
> The current spec gives an example:
>
> {{{
> GET
> /v1/immutable/:storage_index?share=:s0&share=:sN&offset=o1&size=z0&offset=oN&size=zN
> }}}
>
> First, that example is clearly wrong (should be "offset=o0", not
> "offset=o1".).
>
> Second, it's not made explicit how to match shares, offsets, and sizes.
> Implicitly it seems like the pattern is:
>
> 1. First all the shares.
> 2. Then, pairs of offset and size, in same order as shares.
>
> This pattern should be explicitly documented.
>
> Third, as part of documenting, it's worth thinking about the pattern in
> context of HTTP server implementations. HTTP server frameworks often
> don't preserve order _between_ parameters, only preserving order for
> multiple values of a single parameter. So e.g.
> `?share=1&share=2&offset=0&offset=100` will look the same as
> `?share=1&offset=0&share=2&offset=100` if you're using Twisted Web (or
> Flask), because you get back a mapping of argument key to list of values.
>
> Fourth, can I request multiple chunks from same share?
>
> Fifth, taking a step back and looking at the big picture, I am not
> certain that supporting reading from multiple shares (or more broadly,
> reading multiple chunks) in a single query is actually useful.
>
> 1. It's not clear to me it simplifies the client implementation.
> 2. There are API design difficulties of receive arbitrarily sized
> multiple chunks in a single stream.
> 3. Using single CBOR result streaming isn't really easy to support.
> Alternative is to concatenate multiple separately CBOR-encoded values
> (i.e. writing multiple results of `cbor2.dumpb` or whatever), which CBOR
> libraries can handle better.
> 4. Given support for parallel requests, it doesn't improve latency
> (though it does have minor reduction in bandwidth usage).

New description:

 Downloading multiple shares is specified via the query string.

 The current spec gives an example:

 {{{
 GET
 /v1/immutable/:storage_index?share=:s0&share=:sN&offset=o1&size=z0&offset=oN&size=zN
 }}}

 First, that example is clearly wrong: the first offset should be
 "offset=o0", not "offset=o1".

 Second, it's not made explicit how to match shares, offsets, and sizes.
 Implicitly it seems like the pattern is:

 1. First, all the shares.
 2. Then, pairs of offset and size, in the same order as the shares.

 This pattern should be explicitly documented.
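
To make the implied matching rule concrete, here is a minimal sketch of one possible reading: the i-th `share` goes with the i-th `offset` and the i-th `size`. The function name `pair_reads`, the integer share numbers, and the error handling are assumptions for illustration, not something the spec states.

```python
from urllib.parse import parse_qs


def pair_reads(query: str):
    """Pair the i-th share with the i-th offset and the i-th size.

    Hypothetical helper: this is one interpretation of the implicit
    pattern in the spec example, not the documented behaviour.
    """
    params = parse_qs(query, strict_parsing=True)
    shares = params.get("share", [])
    offsets = params.get("offset", [])
    sizes = params.get("size", [])
    if not (len(shares) == len(offsets) == len(sizes)):
        raise ValueError(
            "share, offset and size must appear the same number of times"
        )
    return [(int(s), int(o), int(z)) for s, o, z in zip(shares, offsets, sizes)]


reads = pair_reads("share=1&share=7&offset=0&size=100&offset=4096&size=512")
print(reads)  # [(1, 0, 100), (7, 4096, 512)]
```

Under this reading, a query with unequal numbers of `share`, `offset`, and `size` values is rejected rather than guessed at.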

 Third, as part of documenting it, it's worth thinking about the pattern in
 the context of HTTP server implementations. HTTP server frameworks often
 don't preserve order _between_ parameters, only preserving order among
 multiple values of a single parameter. So e.g.
 `?share=1&share=2&offset=0&offset=100` will look the same as
 `?share=1&offset=0&share=2&offset=100` if you're using Twisted Web (or
 Flask), because you get back a mapping of argument key to list of values.
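
The loss of ordering between parameters can be reproduced with the standard library alone. The sketch below uses `urllib.parse.parse_qs` as a stand-in for the key-to-list-of-values mapping described above; it is not the Twisted Web or Flask code path itself, only an illustration of the same data shape.

```python
from urllib.parse import parse_qs, parse_qsl

grouped = parse_qs("share=1&share=2&offset=0&offset=100")
interleaved = parse_qs("share=1&offset=0&share=2&offset=100")

# Both queries collapse to the same mapping of key -> list of values.
print(grouped)                 # {'share': ['1', '2'], 'offset': ['0', '100']}
print(grouped == interleaved)  # True

# An ordered list of (key, value) pairs does keep the interleaving, but a
# server only sees this if its framework exposes the raw query string.
print(parse_qsl("share=1&offset=0&share=2&offset=100"))
# [('share', '1'), ('offset', '0'), ('share', '2'), ('offset', '100')]
```

So any matching rule that depends on how `share` and `offset` are interleaved cannot be implemented on top of such a mapping; only per-parameter position survives.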

 Fourth, can I request multiple chunks from the same share?

 Fifth, taking a step back and looking at the big picture, I am not certain
 that supporting reading from multiple shares (or more broadly, reading
 multiple chunks) in a single query is actually useful.

 1. It's not clear to me it simplifies the client implementation. Clients
 can just send multiple HTTP requests in parallel.
 2. There are Python API design difficulties in receiving multiple
 arbitrarily sized chunks in a single stream.
 3. Streaming a single CBOR result isn't really easy to support. An
 alternative is to concatenate multiple separately CBOR-encoded values
 (i.e. writing multiple results of `cbor2.dumps` or whatever), which CBOR
 libraries can handle better.
 4. Given support for parallel requests, it doesn't improve latency (though
 it does give a minor reduction in bandwidth usage).
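
As a rough illustration of the "multiple HTTP requests in parallel" alternative, here is a sketch that issues one single-chunk read per (share, offset, size) triple from a thread pool. Everything named here is hypothetical: `fetch_all`, the `fetch_one` callable, and the in-memory `fake_fetch` that stands in for a real single-range GET. No actual Tahoe-LAFS client API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Read = Tuple[int, int, int]  # (share number, offset, size)


def fetch_all(
    reads: List[Read],
    fetch_one: Callable[[int, int, int], bytes],
    max_workers: int = 4,
) -> Dict[Read, bytes]:
    """Run one single-chunk read per entry in parallel and collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {read: pool.submit(fetch_one, *read) for read in reads}
        return {read: future.result() for read, future in futures.items()}


def fake_fetch(share: int, offset: int, size: int) -> bytes:
    """Stand-in for an HTTP GET of one range of one share (assumption)."""
    data = bytes([share]) * 1024
    return data[offset:offset + size]


# Two chunks from share 1 and one chunk from share 2.
results = fetch_all([(1, 0, 4), (1, 100, 2), (2, 10, 2)], fake_fetch)
print(results[(1, 0, 4)])  # b'\x01\x01\x01\x01'
```

With this shape, each response carries exactly one chunk, so the questions about matching shares to offsets and about multiple chunks from the same share do not arise on the wire.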

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3777#comment:3>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage