[tahoe-lafs-trac-stream] [tahoe-lafs] #1288: support streaming uploads in uploader

tahoe-lafs trac at tahoe-lafs.org
Thu Apr 12 01:16:43 UTC 2012


#1288: support streaming uploads in uploader
-------------------------+-------------------------------------------------
     Reporter:           |      Owner:
  davidsarah             |     Status:  new
         Type:           |  Milestone:  undecided
  enhancement            |    Version:  1.8.1
     Priority:  major    |   Keywords:  streaming performance upload sftp
    Component:  code-    |  fuse reliability newcaps
  encoding               |
   Resolution:           |
Launchpad Bug:           |
-------------------------+-------------------------------------------------

Comment (by davidsarah):

 Zooko, Brian and I [http://fred.submusic.ch/irc/tahoe-lafs/2012-04-11#i_585815 discussed this again on #tahoe-lafs].

 Goals:
 a. The SI should act as a verify cap and be sufficient for servers to
 verify the whole contents of any share they hold.
 b. For converging files, the share with a given shnum should be
 recognizable as the same share, and never '''stored''' more than once on a
 given server.
 c. If the client knows a suitable hash of the file plaintext and
 convergence secret before the upload, then it should be able to avoid the
 cost of uploading shares to servers that already have them, and also avoid
 encryption and erasure coding costs if shares have already been uploaded
 to enough servers. (If the protocol involves revealing this hash to
 storage servers, then it must not allow performing a guessing attack
 against the plaintext without knowing the convergence secret.)
 d. If the client learns that hash during the upload, then it should be
 able to abort any uploads of shares to servers that already have them.
 e. Streaming uploads should be possible, i.e. a client should be able to
 start to upload a large file without knowing its whole contents or length.
 f. For each file, the ideal share placement should be a pseudorandom
 distribution among servers that is a deterministic function of that file
 and the convergence secret. It is not necessary for this to be a function
 that uses the whole file contents, or for it to be cryptographically
 random.
 g. When doing an upload, the ideal share placement should be computable
 quickly by the client (e.g. using only a prefix of the file contents).
 h. When doing a download, the ideal share placement must be computable
 from the SI.
 i. Each server should be able to account for the space used by a given
 accounting principal, including the space that it uses temporarily during
 an upload (even if the upload will fail).
 j. The sizes of read and write caps should be minimized. The size of a
 verify cap/SI is less important but should still be fairly small.
 k. A downloader should have sufficient information, given the read cap and
 downloaded shares, to be able to check the integrity of the plaintext even
 if its decryption and erasure decoding routines are incorrect.
 l. The verify cap for a file should be derivable off-line from the read
 cap.
 m. If deep-verify caps are supported, the deep-verify cap for a file
 should be derivable off-line from the read cap, and the verify cap from
 the deep-verify cap.
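
 The guessing-attack resistance required by goal c can be illustrated with
 a short sketch (not from the ticket; HMAC-SHA256 is used here purely as a
 stand-in for whatever keyed hash the actual scheme specifies):

 ```python
 import hashlib
 import hmac
 import os

 def keyed_plaintext_hash(convergence_secret: bytes, plaintext: bytes) -> bytes:
     """Hash the plaintext under the convergence secret. HMAC-SHA256 is an
     illustrative stand-in, not the hash Rainhill actually specifies."""
     return hmac.new(convergence_secret, plaintext, hashlib.sha256).digest()

 # A server that sees the hash but not the secret cannot confirm a guessed
 # plaintext: the same file hashes differently under different secrets.
 secret_a = os.urandom(32)
 secret_b = os.urandom(32)
 guess = b"a predictable low-entropy document"
 assert keyed_plaintext_hash(secret_a, guess) != keyed_plaintext_hash(secret_b, guess)
 # Under one fixed secret the hash is deterministic, so identical files
 # still converge (goal b).
 assert keyed_plaintext_hash(secret_a, guess) == keyed_plaintext_hash(secret_a, guess)
 ```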

 All of these goals can be achieved simultaneously by a variation on
 [https://tahoe-lafs.org/~davidsarah/immutable-rainhill-3.svg Rainhill 3],
 or the simpler
 [https://tahoe-lafs.org/~davidsarah/immutable-rainhill-3x.png Rainhill 3x]
 that does not support deep-verify caps. For simplicity, I'll just describe
 the variation on Rainhill 3x here:

 * EncP_R is used as the plaintext hash for goals c and d. The client can
 at any time ask whether a server has a share with a given EncP_R. (To do
 this lookup efficiently, the storage server must index shares by this as
 well as the SI, but that is easy if we're using a database.)
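
 A minimal in-memory sketch of that double index (hypothetical names; a
 real server would back this with a database, as the ticket notes):

 ```python
 class ShareStore:
     """Toy share store indexed by both SI and EncP_R, so the server can
     answer "do you have a share with this EncP_R?" (goals c and d) and
     refuse to store the same share twice (goal b)."""

     def __init__(self):
         self.by_si = {}       # SI -> {shnum: share bytes}
         self.by_encp_r = {}   # EncP_R -> set of (SI, shnum)

     def put(self, si, encp_r, shnum, share):
         if (si, shnum) in self.by_encp_r.get(encp_r, set()):
             return False  # duplicate: never stored more than once
         self.by_si.setdefault(si, {})[shnum] = share
         self.by_encp_r.setdefault(encp_r, set()).add((si, shnum))
         return True

     def has_encp_r(self, encp_r):
         # the client's pre-upload query: efficient because shares are
         # indexed by EncP_R as well as by SI
         return encp_r in self.by_encp_r
 ```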
 * the server selection index is computed as SSI = Enc[K_R](CS, first
 segment of Plain_R, Params), and included as an extra field in all cap
 types.
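
 The shape of that SSI derivation, sketched in Python (the proposal calls
 for Enc[K_R]; HMAC-SHA256 keyed with K_R is substituted here only to show
 the properties that matter: deterministic, pseudorandom, and computable
 from just the convergence secret, a plaintext prefix, and the encoding
 parameters, per goals f and g):

 ```python
 import hashlib
 import hmac

 def server_selection_index(k_r: bytes, cs: bytes, first_segment: bytes,
                            params: bytes) -> bytes:
     """Deterministic pseudorandom SSI from CS, the first plaintext
     segment, and Params. HMAC under K_R stands in for the proposal's
     Enc[K_R]; the real construction may differ."""
     return hmac.new(k_r, cs + first_segment + params, hashlib.sha256).digest()
 ```

 Because the SSI is carried as a field in every cap, a downloader can
 recompute the ideal share placement without the file contents (goal h).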
 * before uploading, a client reserves a certain amount of space for that
 upload with its accounting principal credentials (e.g. using a signed
 message). If it did not know the file size in advance and needs more
 space, it can increase the reservation.
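
 A rough sketch of such a reservation message (hypothetical field names; a
 real protocol would presumably use public-key signatures for the
 accounting principal, while HMAC here just illustrates the authenticated
 request and the replaceable reservation of goal i):

 ```python
 import hashlib
 import hmac
 import json

 def make_reservation(principal_key: bytes, principal_id: str, nbytes: int) -> dict:
     """Client side: sign a request to reserve nbytes of space."""
     body = {"principal": principal_id, "reserved_bytes": nbytes}
     mac = hmac.new(principal_key, json.dumps(body, sort_keys=True).encode(),
                    hashlib.sha256).hexdigest()
     return {"body": body, "mac": mac}

 def verify_and_reserve(principal_key: bytes, message: dict, ledger: dict) -> bool:
     """Server side: check the signature, then charge the reservation to
     the principal's account. A later message with a larger nbytes simply
     replaces the reservation (the client "increases" it)."""
     body = message["body"]
     expected = hmac.new(principal_key,
                         json.dumps(body, sort_keys=True).encode(),
                         hashlib.sha256).hexdigest()
     if not hmac.compare_digest(expected, message["mac"]):
         return False
     ledger[body["principal"]] = body["reserved_bytes"]
     return True
 ```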
 * during the upload both the client and server compute the SI. At the end,
 the server discards the share if it is a duplicate of one it already has;
 otherwise, it compares the SI it computed with the one the client tells
 it, and retains the share if they match.
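
 That last step can be sketched as a streaming server-side upload: the SI
 is accumulated incrementally so nothing forces the whole share into
 memory (goal e). Plain SHA-256 stands in for whatever SI derivation
 Rainhill actually uses; the names are hypothetical:

 ```python
 import hashlib

 class StreamingUpload:
     """Toy server-side upload handler: hash share data as it arrives,
     then at close compare against the client's claimed SI and discard
     duplicates."""

     def __init__(self, store):
         self.store = store          # maps SI -> share bytes
         self.hasher = hashlib.sha256()
         self.chunks = []

     def write(self, chunk: bytes):
         self.hasher.update(chunk)
         self.chunks.append(chunk)

     def close(self, client_si: bytes) -> bool:
         si = self.hasher.digest()
         if si != client_si:
             return False            # SIs disagree: reject the share
         if si in self.store:
             return True             # duplicate: discard, keep old copy
         self.store[si] = b"".join(self.chunks)
         return True
 ```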

-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1288#comment:6>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage
