[tahoe-lafs-trac-stream] [tahoe-lafs] #320: add streaming (on-line) upload to HTTP interface
tahoe-lafs
trac at tahoe-lafs.org
Sat Jul 20 20:25:41 UTC 2013
#320: add streaming (on-line) upload to HTTP interface
-------------------------+-------------------------------------------------
Reporter: warner         | Owner: zooko
Type: enhancement        | Status: assigned
Priority: major          | Milestone: eventually
Component: code-encoding | Version: 0.8.0
Resolution:              | Keywords: streaming performance upload fuse webdav twisted reliability http
Launchpad Bug:           |
-------------------------+-------------------------------------------------
Changes (by nejucomo):
* cc: nejucomo@… (added)
New description:
In 0.8.0, the upload interfaces visible to HTTP all require the file to be
completely present on the tahoe node before any upload work can be
accomplished. For a FUSE plugin (talking to a local tahoe node) that
provides an open/write/close POSIX-like API to some application, this
means that the write() calls all finish quickly, while the close() call
takes a long time.

Many applications cannot handle this. These apps enforce timeouts on the
close() call on the order of 30-60 seconds. If these apps can handle
network filesystems at all, my hunch is that they will be more tolerant
of delays in the write() calls than in the close().

This effectively imposes a maximum file size on uploads, determined by the
link speed times the close() timeout. Using the helper can improve this by
a factor of 'N/k' relative to non-assisted uploads, because the node then
sends only the ciphertext to the helper instead of all N shares (which
together are N/k times the size of the file). The current FUSE plugin has
a number of unpleasant workarounds that involve lying in the close() call
(pretending that the file has been uploaded when in fact it has not),
which have a bunch of knock-on effects (like how to handle the subsequent
open+read of the file that we've supposedly just written).
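
To make the bound concrete, here is a small worked calculation. It is a
sketch under stated assumptions: Tahoe's default 3-of-10 encoding (k=3,
N=10), a hypothetical 1 Mbit/s upstream link, and a 60-second close()
timeout. The function name max_upload_bytes is made up for illustration.
Without the helper the node must push all N shares (N/k bytes per byte of
file); with the helper it pushes only the ciphertext.

```python
def max_upload_bytes(link_bytes_per_sec, close_timeout_sec, n, k, use_helper):
    """Largest file whose upload can finish within the close() timeout.

    Hypothetical helper for illustration. Without a helper the client sends
    all N shares, i.e. N/k bytes per byte of file; with a helper it sends
    only the ciphertext (about 1 byte per byte of file).
    """
    budget = link_bytes_per_sec * close_timeout_sec  # bytes sent before the timeout
    if use_helper:
        return budget
    return budget * k // n  # integer math avoids float rounding


link = 1_000_000 // 8  # assumed 1 Mbit/s upstream, in bytes per second
timeout = 60           # assumed close() timeout, in seconds

direct = max_upload_bytes(link, timeout, n=10, k=3, use_helper=False)
helped = max_upload_bytes(link, timeout, n=10, k=3, use_helper=True)
print(direct, helped)
```

With these numbers a direct upload is capped at 2,250,000 bytes and a
helper-assisted one at 7,500,000 bytes, a ratio of N/k = 10/3.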

To accommodate this better, we need to move the slow part of upload from
close() into write(). That means that whatever slow DSL link we're
traversing (either ciphertext to the helper or shares to the grid) needs
to get data during write().
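
The shift from close() to write() can be sketched as a file-like wrapper
that forwards each full chunk to the slow link as soon as it is written,
so close() only has to flush the final partial chunk. This is a minimal
sketch, not Tahoe's uploader: the class name, the send_chunk callable and
the 128 KiB default chunk size are hypothetical stand-ins for whatever
actually carries ciphertext to the helper or shares to the grid.

```python
from typing import Callable, List


class StreamingUploadFile:
    """File-like object that pushes data over a slow link during write().

    `send_chunk` is a hypothetical callable that blocks until one chunk has
    been transmitted, so that cost is paid inside write(), not close().
    """

    def __init__(self, send_chunk: Callable[[bytes], None], chunk_size: int = 128 * 1024):
        self._send_chunk = send_chunk
        self._chunk_size = chunk_size
        self._buffer = bytearray()
        self.closed = False

    def write(self, data: bytes) -> int:
        if self.closed:
            raise ValueError("write to closed upload")
        self._buffer.extend(data)
        # Ship every complete chunk now, so the slow transfer happens here.
        while len(self._buffer) >= self._chunk_size:
            chunk = bytes(self._buffer[: self._chunk_size])
            del self._buffer[: self._chunk_size]
            self._send_chunk(chunk)
        return len(data)

    def close(self) -> None:
        if self.closed:
            return
        # Only the final, partial chunk is left to send.
        if self._buffer:
            self._send_chunk(bytes(self._buffer))
            self._buffer.clear()
        self.closed = True


sent: List[bytes] = []
f = StreamingUploadFile(sent.append, chunk_size=4)
f.write(b"hello ")  # sends b"hell" immediately
f.write(b"world")   # sends b"o wo"
f.close()           # sends the remainder, b"rld"
print(sent)
```

The backpressure comes for free: a write() that has to wait for the link
simply blocks, which is the behaviour applications tolerate better than a
long close().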

This requires a number of items:

 * an HTTP interface that will accept partial data.
   * twisted.web doesn't deliver the Request to the Resource until the
     body has been fully received, so to continue using twisted.web we
     must either hack it or add something application-visible (like
     "upload handles" which accept multiple PUTs or POSTs and then a
     final "close" action).
   * twisted.web2 offers streaming uploads, but 1) it isn't released
     yet, 2) all the Twisted folks I've spoken to say we shouldn't use
     it yet, and 3) it doesn't work with Nevow. To use it, we would
     probably need to include a copy of twisted.web2 with Tahoe, which
     either means renaming it to something that doesn't conflict with
     the twisted package, or including a copy of twisted as well.

 * some way to use randomly-generated encryption keys instead of
   CHK-based ones. At the very least we must make sure that we can
   start sending data over the slow link before we've read the entire
   file. The FUSE interface (with open/write/close) doesn't give the
   FUSE plugin knowledge of the full file before the close() call. Our
   current helper remote interface requires knowledge of the storage
   index (and thus the key) before the helper is contacted. This
   introduces a tension between de-duplication and streaming upload.
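
The "upload handles" alternative mentioned under twisted.web can be
sketched independently of any web framework: the client opens a handle,
appends body fragments with repeated PUT or POST requests, then issues a
final "close". Everything here (the UploadHandles class, its method names,
the per-handle consumer callback) is hypothetical; it only illustrates how
work could start on each fragment as it arrives rather than after the
whole body is present.

```python
import secrets
from typing import Callable, Dict


class UploadHandles:
    """Hypothetical registry of in-progress uploads keyed by an opaque handle.

    Each HTTP request carrying a fragment would call append(); a final
    "close" request would call close() to finish the upload.
    """

    def __init__(self, make_consumer: Callable[[], Callable[[bytes], None]]):
        self._make_consumer = make_consumer  # builds one consumer per upload
        self._open: Dict[str, Callable[[bytes], None]] = {}
        self._sizes: Dict[str, int] = {}

    def open(self) -> str:
        handle = secrets.token_urlsafe(16)  # unguessable handle id
        self._open[handle] = self._make_consumer()
        self._sizes[handle] = 0
        return handle

    def append(self, handle: str, body: bytes) -> None:
        consumer = self._open[handle]  # KeyError for unknown or closed handles
        consumer(body)                 # forward this fragment right away
        self._sizes[handle] += len(body)

    def close(self, handle: str) -> int:
        del self._open[handle]
        return self._sizes.pop(handle)  # total bytes received


received = []
handles = UploadHandles(lambda: received.append)
h = handles.open()
handles.append(h, b"first part, ")
handles.append(h, b"second part")
total = handles.close(h)
print(received, total)
```

A real implementation would also need handle expiry, since a client that
never sends "close" would otherwise leave its upload open forever.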

I've got more notes on this stuff; will add them later.
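
The tension in the last item comes from where the key originates. A
content-hash (CHK) key cannot be computed until every byte has been read,
whereas a random key exists before the first write(); because the storage
index is derived from the key, only the random key lets the helper be
contacted up front, at the cost of no longer de-duplicating identical
files. This sketch uses plain SHA-256 as a simplified stand-in: Tahoe's
real derivation uses tagged hashes and a convergence secret, and the
function names below are hypothetical.

```python
import hashlib
import os
from typing import Iterable


def chk_key(chunks: Iterable[bytes]) -> bytes:
    """Content-derived key: only available after the whole file has been read."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()[:16]  # 128-bit key


def random_key() -> bytes:
    """Random key: available immediately, before any data is seen."""
    return os.urandom(16)


def storage_index(key: bytes) -> bytes:
    """Storage index derived from the key (simplified, not Tahoe's tagged hash)."""
    return hashlib.sha256(b"storage-index:" + key).digest()[:16]


# Identical contents give identical CHK keys and storage indexes,
# regardless of how the data was split into writes: de-duplication works.
assert storage_index(chk_key([b"same ", b"file"])) == storage_index(chk_key([b"same file"]))
# Random keys differ per upload, so duplicates go undetected, but the
# storage index is known before the first byte is written.
assert storage_index(random_key()) != storage_index(random_key())
```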
--
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/320#comment:29>
tahoe-lafs <https://tahoe-lafs.org>
secure decentralized storage