[tahoe-lafs-trac-stream] [tahoe-lafs] #320: add streaming (on-line) upload to HTTP interface

tahoe-lafs trac at tahoe-lafs.org
Sat Jul 20 20:25:41 UTC 2013


#320: add streaming (on-line) upload to HTTP interface
-------------------------+-------------------------------------------------
     Reporter:  warner   |      Owner:  zooko
         Type:           |     Status:  assigned
  enhancement            |  Milestone:  eventually
     Priority:  major    |    Version:  0.8.0
    Component:  code-    |   Keywords:  streaming performance upload fuse
  encoding               |  webdav twisted reliability http
   Resolution:           |
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Changes (by nejucomo):

 * cc: nejucomo@… (added)


New description:

 In 0.8.0, the upload interfaces visible to HTTP all require the file to be
 completely present on the tahoe node before any upload work can be
 accomplished. For a FUSE plugin (talking to a local tahoe node) that
 provides an open/write/close POSIX-like API to some application, this
 means that the write() calls all finish quickly, while the close() call
 takes a long time.

 Many applications cannot handle this. These apps enforce timeouts on the
 close() call on the order of 30-60 seconds. If these apps can handle
 network filesystems at all, my hunch is that they will be more tolerant of
 delays in the write() calls than in the close().

 This effectively imposes a maximum file size on uploads, determined by the
 link speed times the close() timeout. Using the helper can improve this by
 a factor of 'N/k' relative to non-assisted uploads. The current FUSE
 plugin has a number of unpleasant workarounds that involve lying to the
 close() call (pretending that the file has been uploaded when in fact it
 has not), which have a bunch of knock-on effects (like how to handle the
 subsequent open+read of the file that we've supposedly just written).
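 A rough back-of-envelope sketch of that size ceiling (the link speed,
 timeout, and erasure-coding parameters below are illustrative assumptions,
 not numbers from this ticket):

```python
# Back-of-envelope for the close()-timeout size ceiling described above.
# All concrete numbers are assumed examples, not from the ticket.

link_speed_bytes_per_sec = 128 * 1024   # assume a ~1 Mbps DSL uplink
close_timeout_sec = 30                  # assume a 30-second app timeout
N, k = 10, 3                            # assume default Tahoe encoding params

# Without a helper, expanded shares (N/k times the file size) must cross
# the slow link before close() returns.
max_size_no_helper = link_speed_bytes_per_sec * close_timeout_sec * k / N

# With a helper, only ciphertext (roughly 1x the file size) crosses the
# slow link -- the N/k improvement mentioned above.
max_size_with_helper = link_speed_bytes_per_sec * close_timeout_sec

print("without helper: %.2f MiB" % (max_size_no_helper / 2**20))
print("with helper:    %.2f MiB" % (max_size_with_helper / 2**20))
```

 Under these assumed numbers the ceiling is only a few megabytes either
 way, which is why moving the slow work into write() matters.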

 To accommodate this better, we need to move the slow part of upload from
 close() into write(). That means that whatever slow DSL link we're
 traversing (either ciphertext to the helper or shares to the grid) needs
 to get data during write().

 This requires a number of items:

   * an HTTP interface that will accept partial data.
     * twisted.web doesn't deliver the Request to the Resource until the
       body has been fully received, so to continue using twisted.web we
       must either hack it or add something application-visible (like
       "upload handles" which accept multiple PUTs or POSTs and then a
       final "close" action).
     * twisted.web2 offers streaming uploads, but 1) it isn't released
       yet, 2) all the Twisted folks I've spoken to say we shouldn't use
       it yet, and 3) it doesn't work with Nevow. To use it, we would
       probably need to include a copy of twisted.web2 with Tahoe, which
       either means renaming it to something that doesn't conflict with
       the twisted package, or including a copy of twisted as well.

   * some way to use randomly-generated encryption keys instead of
     CHK-based ones. At the very least we must make sure that we can start
     sending data over the slow link before we've read the entire file.
     The FUSE interface (with open/write/close) doesn't give the FUSE
     plugin knowledge of the full file before the close() call. Our
     current helper remote interface requires knowledge of the storage
     index (and thus the key) before the helper is contacted. This
     introduces a tension between de-duplication and streaming upload.
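
 The "upload handle" idea from the first bullet might look something like
 this server-side state sketch (the class and method names are
 hypothetical, not an existing Tahoe API):

```python
import secrets

class UploadHandle:
    """Hypothetical server-side state for the 'upload handle' idea:
    the client opens a handle, appends body data with successive partial
    PUT/POST requests, then issues an explicit 'close' action that lets
    the slow upload work overlap with the writes instead of happening
    all at once at the end."""

    def __init__(self):
        self.handle_id = secrets.token_hex(8)  # opaque id returned to client
        self.chunks = []
        self.closed = False

    def append(self, data: bytes) -> int:
        # Each partial PUT/POST lands here; encoding/forwarding work
        # could begin immediately rather than waiting for the full body.
        if self.closed:
            raise ValueError("handle already closed")
        self.chunks.append(data)
        return sum(len(c) for c in self.chunks)  # bytes received so far

    def close(self) -> bytes:
        # The final 'close' action; a real node would finish encoding
        # here and return the file's cap/URI instead of the raw bytes.
        self.closed = True
        return b"".join(self.chunks)
```

 This keeps plain twisted.web usable, at the cost of making the handle
 visible in the HTTP API.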

 I've got more notes on this stuff; I'll add them later.
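
 The de-duplication-vs-streaming tension in the last bullet can be shown
 in miniature (sha256 stands in for Tahoe's actual key derivation, which
 is an assumption of this sketch):

```python
import hashlib
import os

def chk_style_key(whole_file: bytes) -> bytes:
    # Content-hash-keyed (CHK) style: the key is derived from the full
    # plaintext, so nothing can cross the slow link until the entire
    # file has been read. Identical files yield identical keys (and
    # storage indexes), which is what enables de-duplication.
    return hashlib.sha256(whole_file).digest()[:16]

def streaming_key() -> bytes:
    # Randomly-generated key: available before any data arrives, so
    # encryption and upload can start on the first write() -- but two
    # uploads of the same file no longer converge on the same key,
    # giving up de-duplication.
    return os.urandom(16)
```

 Any streaming design has to pick a point on this trade-off, or offer
 both modes.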


-- 
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/320#comment:29>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list