[tahoe-lafs-trac-stream] [tahoe-lafs] #1288: support streaming uploads in uploader
tahoe-lafs
trac at tahoe-lafs.org
Thu Jul 28 13:02:07 PDT 2011
#1288: support streaming uploads in uploader
-----------------------------+---------------------------------------------
      Reporter:  davidsarah  |      Owner:
          Type:  enhancement |     Status:  new
      Priority:  major       |  Milestone:  undecided
     Component:  code-encoding |    Version:  1.8.1
    Resolution:              |   Keywords:  streaming performance upload
                             |              sftp fuse reliability
 Launchpad Bug:              |
-----------------------------+---------------------------------------------
Comment (by zooko):
Replying to [comment:2 davidsarah]:
>
> Well, another possibility is that the client starts to upload the file,
> but aborts the upload if it finishes making a pass over the data and
> detects that it was already stored. That might make sense if the client is
> receiving the file faster than it is able to upload it.
Hey, that is a very good idea. If you're streaming a file to a gateway for
it to encrypt, erasure-code, and distribute among servers, then the
gateway could dynamically balance two strategies: (a) reading the file
from you faster than it can upload it, storing it in temporary storage,
and precomputing its hash for deduplication purposes; and (b) reading the
file from you only as fast as it can upload it to storage servers.
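To make the quoted idea concrete, here is a minimal sketch of "upload while
hashing, abort if the finished hash shows the file is already stored". It is
not Tahoe-LAFS code: the function name, the `send_chunk` and `abort_upload`
callbacks, and the `already_stored` set are all hypothetical, and plain
SHA-256 stands in for whatever hash the real uploader would use.

```python
import hashlib


def stream_with_dedup_check(chunks, already_stored, send_chunk, abort_upload):
    """Upload chunks as they arrive while hashing them incrementally.

    chunks         -- iterable of bytes objects (the incoming stream)
    already_stored -- set of hex digests of files known to be stored
                      (hypothetical; a real client would ask the servers)
    send_chunk     -- callable invoked with each chunk as it is uploaded
    abort_upload   -- callable invoked if the file turns out to be a duplicate

    Returns (hex_digest, uploaded), where uploaded is False if aborted.
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)   # one pass over the data for the hash...
        send_chunk(chunk)      # ...while the same data is being uploaded
    digest = hasher.hexdigest()
    if digest in already_stored:
        # The hash is only known once the pass is complete, so the
        # duplicate check necessarily happens at the end.
        abort_upload()
        return digest, False
    return digest, True
```

This sketch only covers strategy (b), where upload keeps pace with the
incoming data; buffering ahead into temporary storage would need a separate
reader and queue.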
> A difficulty here is that without knowing the file's hash, the client
> can't determine the optimum set of servers to store shares on. But if the
> number of servers on the grid were not much greater than
> {{{shares.total}}}, then that might not matter, because it could start
> uploading shares to all servers. (Or there could be some cleverer way to
> work around this problem that I'm not seeing right now.)
Brian and I have discussed this. I think we should start by conceiving of
the "server selector" as a potentially different thing from the "file
identifier". The former is what you need to have to choose which servers
to contact first. The latter is what you send to a server to tell it
which file, out of all the files it knows about, you mean.
Only the "server selector" part has to be known before upload begins.
Another fact is that the server selector does not necessarily need to have
a lot of information in it. For example, what if it were a 2-byte random
value? That would define 65,536 ways to search any given set of servers
(e.g. permute the list of servers according to this 2-byte server
selector).
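As a rough illustration of the 2-byte server selector, the sketch below
draws a random 2-byte value and uses it to permute a list of server ids by
sorting on a hash of selector plus server id. The function names and the use
of SHA-256 are assumptions made for this example, not the actual Tahoe-LAFS
server-selection code.

```python
import hashlib
import os


def new_server_selector():
    """Return a fresh 2-byte random selector (65,536 possible values)."""
    return os.urandom(2)


def permute_servers(server_ids, selector):
    """Return server_ids in an order determined only by the selector.

    server_ids -- iterable of bytes objects identifying servers
    selector   -- 2-byte value chosen before the upload begins

    Each server is ranked by SHA-256(selector + server_id), so the order
    does not depend on the file's contents or hash.
    """
    if len(selector) != 2:
        raise ValueError("selector must be exactly 2 bytes")
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(selector + sid).digest(),
    )
```

Anyone who knows the selector and the server list can recompute the same
order, so the uploader can pick the selector before it has seen any of the
file, and the file identifier can be filled in once the hash is known.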
(The "file identifier" part does need to be collision-free: #753.)
There are some notes about these topics: #654, #482, wiki:ServerSelection,
#467, #872.
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1288#comment:3>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage