[tahoe-lafs-trac-stream] [tahoe-lafs] #1288: support streaming uploads in uploader

tahoe-lafs trac at tahoe-lafs.org
Thu Jul 28 13:02:07 PDT 2011


#1288: support streaming uploads in uploader
-------------------------------+-------------------------------------------
     Reporter:  davidsarah     |      Owner:
         Type:  enhancement    |     Status:  new
     Priority:  major          |  Milestone:  undecided
    Component:  code-encoding  |    Version:  1.8.1
   Resolution:                 |   Keywords:  streaming performance upload
Launchpad Bug:                 |              sftp fuse reliability
-------------------------------+-------------------------------------------

Comment (by zooko):

 Replying to [comment:2 davidsarah]:
 >
 > Well, another possibility is that the client starts to upload the file,
 > but aborts the upload if it finishes making a pass over the data and
 > detects that it was already stored. That might make sense if the client is
 > receiving the file faster than it is able to upload it.

 Hey, that is a very good idea. If you're streaming a file to a gateway for
 it to encrypt, erasure-code, and distribute among servers, then the
 gateway could dynamically choose how far to lean toward each of two
 behaviors: reading the file from you faster than it can upload it
 (storing it in temporary storage and precomputing its hash for
 deduplication purposes), or reading the file from you only as fast as it
 can upload it to storage servers.
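
 The trade-off above can be sketched roughly as follows. This is only an
 illustration, not Tahoe-LAFS code: `stream_with_dedup`, `already_stored`,
 `send`, and `reads_per_send` are hypothetical names, a SHA-256 digest stands
 in for whatever hash the deduplication check would really use, and
 encryption and erasure coding are left out entirely. The gateway reads
 `reads_per_send` chunks for every chunk it uploads, spools the backlog in
 memory, and abandons the upload if the finished hash is already known.

```python
import hashlib
import io
from collections import deque


def stream_with_dedup(source, already_stored, send, chunk_size=4, reads_per_send=2):
    """Read faster than we upload; abort if the completed hash is already stored.

    source          -- file-like object being streamed to the gateway
    already_stored  -- set of hex digests the grid is assumed to hold already
    send            -- callable that uploads one chunk (hypothetical)
    """
    hasher = hashlib.sha256()
    pending = deque()  # stands in for the gateway's temporary storage
    eof = False
    while not eof:
        # Read several chunks from the client for each chunk uploaded.
        for _ in range(reads_per_send):
            chunk = source.read(chunk_size)
            if not chunk:
                eof = True
                break
            hasher.update(chunk)
            pending.append(chunk)
        # Upload one buffered chunk per round while input is still arriving.
        if not eof and pending:
            send(pending.popleft())
    digest = hasher.hexdigest()
    if digest in already_stored:
        # The pass over the data finished first: drop the rest of the upload.
        return "aborted", digest
    while pending:
        send(pending.popleft())
    return "uploaded", digest


data = b"abcdefghijklmnop"

sent_first = []
status_first, digest_first = stream_with_dedup(io.BytesIO(data), set(), sent_first.append)

sent_second = []
status_second, _ = stream_with_dedup(io.BytesIO(data), {digest_first}, sent_second.append)
```

 Setting `reads_per_send` to 1 models the other extreme, where the gateway
 reads only as fast as it uploads and so never learns the hash early enough
 to save any work.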

 > A difficulty here is that without knowing the file's hash, the client
 > can't determine the optimum set of servers to store shares on. But if the
 > number of servers on the grid were not much greater than
 > {{{shares.total}}}, then that might not matter, because it could start
 > uploading shares to all servers. (Or there could be some cleverer way to
 > work around this problem that I'm not seeing right now.)

 Brian and I have discussed this. I think we should start by conceiving of
 the "server selector" as a potentially different thing from the "file
 identifier". The former is what you need to have to choose which servers
 to contact first. The latter is what you send to a server to tell it which
 file, out of all the files it knows about, you are referring to.

 Only the "server selector" part has to be known before upload begins.
 Another fact is that the server selector does not necessarily need to have
 a lot of information in it. For example, what if it were a 2-byte random
 value? That would define 65,536 ways to search any given set of servers
 (e.g. permute the list of servers according to this 2-byte server
 selector).
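
 As a rough sketch of the 2-byte idea (not the actual Tahoe-LAFS server
 selection code), the list of servers could be ordered by a hash of the
 selector combined with each server's identifier. The function name
 `permute_servers`, the example server ids, and the choice of SHA-256 are
 all assumptions made for illustration.

```python
import hashlib
import os


def permute_servers(server_ids, selector):
    """Return server_ids ordered by SHA-256(selector + server_id).

    selector is a 2-byte value, so there are 2**16 = 65,536 possible
    selectors and therefore at most 65,536 distinct search orders.
    """
    if len(selector) != 2:
        raise ValueError("selector must be exactly 2 bytes")
    return sorted(server_ids, key=lambda sid: hashlib.sha256(selector + sid).digest())


# Hypothetical server identifiers; a real grid would use its own server ids.
servers = [b"server-a", b"server-b", b"server-c", b"server-d"]

# The selector can be chosen before any file data has been read.
selector = os.urandom(2)
order = permute_servers(servers, selector)
```

 The selector is chosen at random before the upload begins, so it does not
 depend on the file's hash. Anyone who later learns the same 2 bytes can
 recompute the same order and contact servers in the same sequence.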

 (The "file identifier" part does need to be collision-free: #753.)

 There are some notes about these topics: #654, #482, wiki:ServerSelection,
 #467, #872.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1288#comment:3>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

