[tahoe-lafs-trac-stream] [Tahoe-LAFS] #1565: URL formats for HTTP-based storage server

Tahoe-LAFS trac at tahoe-lafs.org
Mon Jul 30 14:33:08 UTC 2018


#1565: URL formats for HTTP-based storage server
------------------------------+--------------------------------
     Reporter:  warner        |      Owner:
         Type:  task          |     Status:  closed
     Priority:  major         |  Milestone:  eventually
    Component:  code-storage  |    Version:  1.9.0b1
   Resolution:  fixed         |   Keywords:  newurls accounting
Launchpad Bug:                |
------------------------------+--------------------------------
Changes (by exarkun):

 * status:  new => closed
 * resolution:   => fixed


Old description:

> Ticket #510 is about speaking to storage servers with mostly-plain HTTP. One
> piece of this is deciding what the URLs should look like. Downloading a share
> from the storage server should be a simple HTTP "GET", using a {{{Range:}}}
> header to fetch less than the whole share. But we also need ways to discover
> which shares are available for download, and eventually ways to upload data
> to the server too.
>
> Here's the starting point that I implemented in my prototype (which still
> uses Foolscap and get_buckets() to discover shares):
>
> * {{{GET /storage/imm/SI/%(storage_index)s/share/%(shnum)d}}}: retrieves data
>   from the given share. Normal downloads use e.g. {{{Range:
>   bytes=87418-131108,422601-422664,423593-423656}}} to fetch a bunch of
>   spans.
> * {{{GET /storage}}}: this currently returns a human-readable page describing
>   the state of the storage server.
>
> The next steps:
>
> * {{{GET /storage/imm/SI/%(storage_index)s/shares}}}: return a JSON list of
>   share numbers
> * {{{GET /storage/imm/SI/%(storage_index)s/all_shares}}}: return a JSON
>   dictionary mapping share number to a read data vector. The same spans are
>   returned for all shares. This collapses the Do-You-Have-Block query with
>   the initial data fetch, allowing one-round-trip downloads.
>
> I put "imm" into the URL because the current storage server treats immutable
> and mutable shares very differently (they have different container formats).
> It's not trivial to take an SI and switch on the type of share that it points
> to. It might be cleaner to fix the server to handle this well, and then
> remove the "imm" from the URL. OTOH, it might be better to leave them
> distinct.
>
> We need similar URLs for reading from mutable shares; they can probably be
> the same but with "mut" instead of "imm".
>
> We'll need POST URLs for uploading files and modifying mutable shares, as
> well as adding/renewing leases and other storage server methods. The request
> bodies will be more complicated since they'll need authorization signatures
> or something. But the basic URL target could be:
>
> * {{{POST /storage/imm/SI/%(storage_index)s/shares/%(shnum)d}}}: start
>   uploading the given share. Return 302 FOUND if the share already exists.
>   The upload can be spread across multiple requests, with a "finished" flag
>   on the last request. This might involve returning an "upload token" which
>   subsequent requests must reference.
> * {{{POST /storage/mut/SI/%(storage_index)s/shares/%(shnum)d}}}: modify the
>   given mutable share. The body will probably be a signed serialized JSON
>   modification request, basically a write-vector, along with a test-vector or
>   other collision-avoidance scheme.
>
> All of this presumes that Accounting is not being enforced on read access. At
> least one of the designs I've drawn up offers {{{read=False}}} control, as a
> stick for the storage operator to use against a client who doesn't pay their
> bills (but still less drastic than {{{store=False}}}, which deletes all their
> data). To enforce {{{read=False}}}, the GETs would need to be authorized,
> which either involves adding an extra signature header, or implementing them
> with a POST instead (and putting the signature in the request body).

New description:

 Ticket #510 is about speaking to storage servers with mostly-plain HTTP. One
 piece of this is deciding what the URLs should look like. Downloading a share
 from the storage server should be a simple HTTP "GET", using a {{{Range:}}}
 header to fetch less than the whole share. But we also need ways to discover
 which shares are available for download, and eventually ways to upload data
 to the server too.

 Here's the starting point that I implemented in my prototype (which still
 uses Foolscap and get_buckets() to discover shares):

 * {{{GET /storage/imm/SI/%(storage_index)s/share/%(shnum)d}}}: retrieves data
   from the given share. Normal downloads use e.g. `Range:
   bytes=87418-131108,422601-422664,423593-423656` to fetch a bunch of
   spans.
 * {{{GET /storage}}}: this currently returns a human-readable page describing
   the state of the storage server.
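
 As a rough illustration of the share GET listed above, the following Python
 sketch builds the share path and a multi-span Range header value. The helper
 names and the (offset, length) span representation are assumptions made for
 this example; they are not taken from the prototype.

```python
def share_url(storage_index, shnum, kind="imm"):
    """Build the path for one share, following the URL layout listed above.

    `kind` is "imm" here; "mut" is only the possibility floated in the text.
    """
    return "/storage/%s/SI/%s/share/%d" % (kind, storage_index, shnum)


def range_header(spans):
    """Turn (offset, length) spans into an HTTP Range header value.

    HTTP byte ranges are inclusive on both ends, so a span that starts at
    `offset` and covers `length` bytes ends at offset + length - 1.
    """
    parts = ["%d-%d" % (offset, offset + length - 1)
             for (offset, length) in spans if length > 0]
    if not parts:
        raise ValueError("at least one non-empty span is required")
    return "bytes=" + ",".join(parts)
```

 With spans (87418, 43691), (422601, 64) and (423593, 64), range_header()
 yields the same value as the Range example in the list above.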

 The next steps:

 * {{{GET /storage/imm/SI/%(storage_index)s/shares}}}: return a JSON list of
   share numbers
 * {{{GET /storage/imm/SI/%(storage_index)s/all_shares}}}: return a JSON
   dictionary mapping share number to a read data vector. The same spans are
   returned for all shares. This collapses the Do-You-Have-Block query with
   the initial data fetch, allowing one-round-trip downloads.
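
 A client might decode those two responses roughly as follows. This is only a
 sketch: the ticket does not specify the wire format beyond "JSON list" and
 "JSON dictionary", so the function names and the base64 encoding of share
 data are assumptions. JSON object keys are always strings, so the share
 numbers have to be converted back to integers on the client side.

```python
import base64
import json


def parse_share_list(body):
    """Decode a /shares response: a JSON list of share numbers."""
    return sorted(int(shnum) for shnum in json.loads(body))


def parse_all_shares(body):
    """Decode an /all_shares response into {shnum: [bytes, ...]}.

    Assumption: each read data vector is a list of base64 strings, one per
    requested span, because raw bytes cannot be embedded in JSON directly.
    """
    decoded = json.loads(body)
    return {
        int(shnum): [base64.b64decode(chunk) for chunk in chunks]
        for shnum, chunks in decoded.items()
    }
```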

 I put "imm" into the URL because the current storage server treats immutable
 and mutable shares very differently (they have different container formats).
 It's not trivial to take an SI and switch on the type of share that it points
 to. It might be cleaner to fix the server to handle this well, and then
 remove the "imm" from the URL. OTOH, it might be better to leave them
 distinct.

 We need similar URLs for reading from mutable shares; they can probably be
 the same but with "mut" instead of "imm".

 We'll need POST URLs for uploading files and modifying mutable shares, as
 well as adding/renewing leases and other storage server methods. The request
 bodies will be more complicated since they'll need authorization signatures
 or something. But the basic URL target could be:

 * {{{POST /storage/imm/SI/%(storage_index)s/shares/%(shnum)d}}}: start
   uploading the given share. Return 302 FOUND if the share already exists.
   The upload can be spread across multiple requests, with a "finished" flag
   on the last request. This might involve returning an "upload token" which
   subsequent requests must reference.
 * {{{POST /storage/mut/SI/%(storage_index)s/shares/%(shnum)d}}}: modify the
   given mutable share. The body will probably be a signed serialized JSON
   modification request, basically a write-vector, along with a test-vector or
   other collision-avoidance scheme.
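
 The multi-request immutable upload described above could behave roughly like
 this in-memory model. It is a sketch under stated assumptions, not the
 server's implementation: the class name, the method signature, and the 200,
 201 and 403 status codes are invented for illustration. Only the 302
 response, the "finished" flag and the upload token come from the description.

```python
import secrets


class ImmutableUploads:
    """Toy model of POST /storage/imm/SI/<si>/shares/<shnum> (hypothetical)."""

    def __init__(self):
        self.shares = {}   # (storage_index, shnum) -> completed share bytes
        self.pending = {}  # upload token -> ((storage_index, shnum), bytearray)

    def post(self, storage_index, shnum, data, token=None, finished=False):
        """Handle one upload request and return (status, upload_token)."""
        key = (storage_index, shnum)
        if token is None:
            if key in self.shares:
                return 302, None  # share already exists
            # First request of an upload: hand out a fresh token.
            token = secrets.token_hex(16)
            self.pending[token] = (key, bytearray())
        elif token not in self.pending or self.pending[token][0] != key:
            return 403, None  # unknown token, or token for a different share
        self.pending[token][1].extend(data)
        if finished:
            # Last request: the share becomes visible to readers.
            _, buf = self.pending.pop(token)
            self.shares[key] = bytes(buf)
            return 201, None
        return 200, token
```

 A client would send the first chunk without a token, echo the returned token
 on each later request, and set finished=True on the last one.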

 All of this presumes that Accounting is not being enforced on read access. At
 least one of the designs I've drawn up offers {{{read=False}}} control, as a
 stick for the storage operator to use against a client who doesn't pay their
 bills (but still less drastic than {{{store=False}}}, which deletes all their
 data). To enforce {{{read=False}}}, the GETs would need to be authorized,
 which either involves adding an extra signature header, or implementing them
 with a POST instead (and putting the signature in the request body).
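
 The "extra signature header" option could look something like the sketch
 below. Everything specific here is an assumption: the header name is
 hypothetical, and a shared-secret HMAC is swapped in for whatever signature
 scheme an Accounting design would really use, simply because it is available
 in the Python standard library.

```python
import hashlib
import hmac

SIGNATURE_HEADER = "X-Storage-Signature"  # hypothetical header name


def sign_request(secret, method, path):
    """Return a hex HMAC-SHA256 tag over the request method and path."""
    message = ("%s %s" % (method, path)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authorized(secret, method, path, headers):
    """Check the tag a client supplied in the (hypothetical) header."""
    supplied = headers.get(SIGNATURE_HEADER, "")
    expected = sign_request(secret, method, path)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied, expected)
```

 This covers only the method and path; a real scheme would presumably also
 need something like a nonce or timestamp to prevent replay.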

--

Comment:

 This has been resolved as part of
 https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2925

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1565#comment:3>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

