[tahoe-lafs-trac-stream] [tahoe-lafs] #1288: support streaming uploads in uploader
tahoe-lafs
trac at tahoe-lafs.org
Thu Apr 12 01:16:43 UTC 2012
#1288: support streaming uploads in uploader
-------------------------------+-----------------------------------------------
      Reporter:  davidsarah    |        Owner:
          Type:  enhancement   |       Status:  new
      Priority:  major         |    Milestone:  undecided
     Component:  code-encoding |      Version:  1.8.1
    Resolution:                |     Keywords:  streaming performance upload
 Launchpad Bug:                |                sftp fuse reliability newcaps
-------------------------------+-----------------------------------------------
Comment (by davidsarah):
Zooko, Brian and I [http://fred.submusic.ch/irc/tahoe-lafs/2012-04-11#i_585815
discussed this again on #tahoe-lafs].
Goals:
a. The SI should act as a verify cap and be sufficient for servers to
verify the whole contents of any share they hold.
b. For converging files, the share with a given shnum should be
recognizable as the same share, and never '''stored''' more than once on a
given server.
c. If the client knows a suitable hash of the file plaintext and
convergence secret before the upload, then it should be able to avoid the
cost of uploading shares to servers that already have them, and also avoid
encryption and erasure coding costs if shares have already been uploaded
to enough servers. (If the protocol involves revealing this hash to
storage servers, then it must not allow performing a guessing attack
against the plaintext without knowing the convergence secret; see the
sketch after this list.)
d. If the client learns that hash during the upload, then it should be
able to abort any uploads of shares to servers that already have them.
e. Streaming uploads should be possible, i.e. a client should be able to
start to upload a large file without knowing its whole contents or length.
f. For each file, the ideal share placement should be a pseudorandom
distribution among servers that is a deterministic function of that file
and the convergence secret. It is not necessary for this to be a function
that uses the whole file contents, or for it to be cryptographically
random.
g. When doing an upload, the ideal share placement should be computable
quickly by the client (e.g. using only a prefix of the file contents).
h. When doing a download, the ideal share placement must be computable
from the SI.
i. Each server should be able to account for the space used by a given
accounting principal, including the space that it uses temporarily during
an upload (even if the upload will fail).
j. The sizes of read and write caps should be minimized. The size of a
verify cap/SI is less important but should still be fairly small.
k. A downloader should have sufficient information, given the read cap and
downloaded shares, to be able to check the integrity of the plaintext even
if its decryption and erasure decoding routines are incorrect.
l. The verify cap for a file should be derivable off-line from the read
cap.
m. If deep-verify caps are supported, the deep-verify cap for a file
should be derivable off-line from the read cap, and the verify cap from
the deep-verify cap.
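
To make goal c a little more concrete, here is a minimal sketch of the
kind of keyed plaintext hash that could safely be revealed to storage
servers. It is an illustration only, not the actual Rainhill construction
(EncP_R, below): the function name and domain-separation tag are made up,
and HMAC-SHA256 is just a stand-in for whatever keyed hash the real design
uses.

{{{
import hashlib
import hmac

def keyed_plaintext_hash(convergence_secret, plaintext):
    """A hash of the plaintext that is keyed by the convergence secret.

    A storage server that learns this value can recognize a file it has
    already seen (goals c and d), but cannot mount a guessing attack
    against a low-entropy plaintext without also knowing the convergence
    secret, because the secret is the HMAC key.
    """
    h = hmac.new(convergence_secret, digestmod=hashlib.sha256)
    h.update(b"plaintext-hash-v1:")   # hypothetical domain-separation tag
    h.update(plaintext)
    return h.digest()
}}}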
All of these goals can be achieved simultaneously by a variation on
[https://tahoe-lafs.org/~davidsarah/immutable-rainhill-3.svg Rainhill 3], or
the simpler [https://tahoe-lafs.org/~davidsarah/immutable-rainhill-3x.png
Rainhill 3x] that does not support deep-verify caps. For simplicity, I'll
just describe the variation on Rainhill 3x here:
* EncP_R is used as the plaintext hash for goals c and d. The client can
at any time ask whether a server has a share with a given EncP_R. (To do
this lookup efficiently, the storage server must index shares by this as
well as the SI, but that is easy if we're using a database.) A server-side
sketch of this lookup, and of the end-of-upload check in the last bullet,
follows this list.
* the server selection index is computed as SSI = Enc[K_R](CS, first
segment of Plain_R, Params), and included as an extra field in all cap
types (a sketch of this computation follows this list).
* before uploading, a client reserves a certain amount of space for that
upload with its accounting principal credentials (e.g. using a signed
message). If it did not know the file size in advance and needs more
space, it can increase the reservation (see the reservation sketch after
this list).
* during the upload both the client and server compute the SI. At the end,
the server discards the share if it is a duplicate of one it already has;
otherwise, it compares the SI it computed with the one the client tells
it, and retains the share if they match.
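
A rough sketch of the SSI computation from the second bullet. K_R, CS,
Plain_R and Params are the Rainhill names; everything else (the function
name, the segment size, and the use of HMAC under K_R as a stand-in for a
deterministic Enc[K_R](...)) is assumed for illustration, since goal f only
asks for a deterministic pseudorandom placement rather than any particular
cipher.

{{{
import hashlib
import hmac

SEGMENT_SIZE = 128 * 1024   # assumed; matches Tahoe's default segment size

def compute_ssi(k_r, cs, plain_r_prefix, params):
    """Server selection index, computable as soon as streaming begins.

    Goal g: only the first segment of Plain_R is needed, so a client
    that does not yet know the rest of the file (goal e) can still
    choose servers. Goal f: the result is a deterministic pseudorandom
    function of the file prefix and the convergence secret.
    """
    first_segment = plain_r_prefix[:SEGMENT_SIZE]
    return hmac.new(k_r, cs + first_segment + params,
                    hashlib.sha256).digest()[:16]
}}}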
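
The reservation step from the third bullet might look roughly like this on
the server side. Checking the signed reservation message is elided and all
names here are hypothetical; the point is just that temporary space is
charged to an accounting principal (goal i) and that the reservation can
grow, since a streaming client (goal e) may not know the file size up
front.

{{{
class OverQuotaError(Exception):
    pass


class Reservations(object):
    def __init__(self, quota_for):
        self._quota_for = quota_for   # principal -> total bytes allowed
        self._reserved = {}           # principal -> bytes currently reserved

    def reserve(self, principal, nbytes):
        """Reserve nbytes for an upload, charged to this principal."""
        used = self._reserved.get(principal, 0)
        if used + nbytes > self._quota_for(principal):
            raise OverQuotaError(principal)
        self._reserved[principal] = used + nbytes

    def increase(self, principal, extra_bytes):
        """Grow an existing reservation when the file turns out bigger."""
        self.reserve(principal, extra_bytes)

    def release(self, principal, nbytes):
        """Give back space when the upload completes or is aborted."""
        used = self._reserved.get(principal, 0)
        self._reserved[principal] = max(0, used - nbytes)
}}}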
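
Finally, a server-side sketch covering the first and last bullets: shares
are indexed by EncP_R as well as by SI, so the "does this server already
have a share for this plaintext?" query is cheap, and at the end of an
upload the server compares the SI it computed itself with the one the
client claims before retaining the share. The class and method names are
hypothetical, not the existing storage-server API.

{{{
class ShareStore(object):
    def __init__(self):
        self._by_si = {}        # (SI, shnum) -> share data
        self._by_encp_r = {}    # EncP_R -> set of (SI, shnum)

    def has_share_for(self, encp_r):
        """Goals c and d: report whether this server already holds a
        share with the given EncP_R, at any time before or during an
        upload."""
        return bool(self._by_encp_r.get(encp_r))

    def finish_upload(self, claimed_si, computed_si, shnum, encp_r, data):
        """Decide what to do with a share at the end of an upload.

        computed_si is the SI the server computed itself while the share
        streamed in; claimed_si is the one the client asserts.
        """
        if (computed_si, shnum) in self._by_si:
            # Goal b: never store the same share twice on this server.
            return "duplicate-discarded"
        if computed_si != claimed_si:
            # Goal a: the SI acts as a verify cap, so a mismatch means
            # the share does not verify and is not retained.
            return "rejected-si-mismatch"
        self._by_si[(computed_si, shnum)] = data
        self._by_encp_r.setdefault(encp_r, set()).add((computed_si, shnum))
        return "stored"
}}}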
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1288#comment:6>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage