Opened at 2008-02-27T20:12:02Z
Last modified at 2013-07-20T20:25:40Z
#320 assigned enhancement
add streaming upload to HTTP interface — at Initial Version
Reported by: | warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | eventually |
Component: | code-encoding | Version: | 0.8.0 |
Keywords: | streaming performance upload fuse webdav twisted reliability http | Cc: | jeremy@…, nejucomo@… |
Launchpad Bug: |
Description
In 0.8.0, the upload interfaces visible to HTTP all require the file to be completely present on the tahoe node before any upload work can be accomplished. For a FUSE plugin (talking to a local tahoe node) that provides an open/write/close POSIX-like API to some application, this means that the write() calls all finish quickly, while the close() call takes a long time.
Many applications cannot handle this. These apps enforce timeouts on the close() call on the order of 30-60 seconds. If these apps can handle network filesystems at all, my hunch is that they will be more tolerant of delays in the write() calls than in the close().
This effectively imposes a maximum file size on uploads, determined by the link speed times the close() timeout. Using the helper can improve this by a factor of N/k relative to non-assisted uploads, because only the ciphertext crosses the slow link rather than the N/k-times-larger set of shares. The current FUSE plugin has a number of unpleasant workarounds that involve lying to the close() call (pretending that the file has been uploaded when in fact it has not). These have a bunch of knock-on effects, such as how to handle a subsequent open+read of the file that we've supposedly just written.
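To put rough numbers on that limit, here is a minimal sketch of the arithmetic. The link speed, the timeout, and the 3-of-10 encoding are illustrative assumptions rather than measurements, and `max_upload_bytes` is a hypothetical name. It also ignores protocol and hashing overhead.

```python
def max_upload_bytes(link_bytes_per_sec, close_timeout_sec, k, n, use_helper):
    """Largest file whose upload fits inside one close() timeout.

    Without the helper, the shares (n/k times the file size) cross the
    slow link; with the helper, only the ciphertext (1x) does.
    """
    budget = link_bytes_per_sec * close_timeout_sec  # bytes the link can carry
    if use_helper:
        return budget
    return budget * k / n


# Assumed example values: 100 kB/s upstream, 60 s timeout, 3-of-10 encoding.
direct = max_upload_bytes(100_000, 60, k=3, n=10, use_helper=False)
assisted = max_upload_bytes(100_000, 60, k=3, n=10, use_helper=True)
print(direct, assisted, assisted / direct)
```

Under these assumptions the ceiling is a couple of megabytes without the helper and a few megabytes with it, which is small enough that ordinary files will hit it.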
To accommodate this better, we need to move the slow part of upload from close() into write(). That means that whatever slow DSL link we're traversing (either ciphertext to the helper or shares to the grid) needs to get data during write().
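The difference can be sketched with two toy file-like objects. This is only an illustration of where the time goes, not the FUSE plugin's code: the class names and the `send` callable (standing in for the slow link) are hypothetical.

```python
class BufferedUpload:
    """Behaviour described for 0.8.0: write() buffers, close() sends it all."""

    def __init__(self, send):
        self._send = send
        self._chunks = []

    def write(self, data):
        self._chunks.append(data)  # fast: nothing crosses the slow link yet
        return len(data)

    def close(self):
        for chunk in self._chunks:  # slow: the whole file crosses the link here
            self._send(chunk)
        self._chunks = []


class StreamingUpload:
    """Desired behaviour: each write() pushes its data over the slow link."""

    def __init__(self, send):
        self._send = send

    def write(self, data):
        self._send(data)  # the slow part now happens during write()
        return len(data)

    def close(self):
        pass  # only a short finalization step would remain here


def bytes_sent_before_close(upload_cls, chunks):
    """Return (bytes sent before close(), bytes sent after close())."""
    sent = []
    f = upload_cls(sent.append)
    for chunk in chunks:
        f.write(chunk)
    before = sum(len(c) for c in sent)
    f.close()
    after = sum(len(c) for c in sent)
    return before, after


print(bytes_sent_before_close(BufferedUpload, [b"abc", b"de"]))
print(bytes_sent_before_close(StreamingUpload, [b"abc", b"de"]))
```

In the buffered case nothing has been sent when close() is called, so close() carries the whole delay. In the streaming case everything has already been sent by then.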
This requires a number of items:
- an HTTP interface that will accept partial data.
  - twisted.web doesn't deliver the Request to the Resource until the body has been fully received, so to continue using twisted.web we must either hack it or add something application-visible (like "upload handles" which accept multiple PUTs or POSTs and then a final "close" action).
  - twisted.web2 offers streaming uploads, but 1) it isn't released yet, 2) all the Twisted folks I've spoken to say we shouldn't use it yet, and 3) it doesn't work with Nevow. To use it, we would probably need to include a copy of twisted.web2 with Tahoe, which either means renaming it to something that doesn't conflict with the twisted package, or including a copy of twisted as well.
- some way to use randomly-generated encryption keys instead of CHK-based ones. At the very least we must make sure that we can start sending data over the slow link before we've read the entire file. The FUSE interface (with open/write/close) doesn't give the FUSE plugin knowledge of the full file before the close() call. Our current helper remote interface requires knowledge of the storage index (and thus the key) before the helper is contacted. This introduces a tension between de-duplication and streaming upload.
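The tension in the last item can be shown with a small sketch. The hashing here is a stand-in chosen for illustration (plain SHA-256 truncated to 16 bytes), not Tahoe's actual key or storage-index derivation, and all three function names are hypothetical.

```python
import hashlib
import os


def convergent_key(chunks):
    """Content-derived key: needs every byte of the file before it exists."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()[:16]


def random_key():
    """Random key: available at open(), before any data has been written."""
    return os.urandom(16)


def storage_index(key):
    """Stand-in derivation of a storage index from the key."""
    return hashlib.sha256(b"storage-index:" + key).digest()[:16]


# The same content always maps to the same storage index, which is what
# allows de-duplication, but the key is only known after the last write().
si_a = storage_index(convergent_key([b"ab", b"cd"]))
si_b = storage_index(convergent_key([b"abcd"]))
print(si_a == si_b)

# Random keys let the upload start immediately, but two uploads of the
# same content land under different storage indexes.
si_c = storage_index(random_key())
si_d = storage_index(random_key())
print(si_c == si_d)
```

With a content-derived key the helper cannot be contacted until close(), which is exactly the point where we no longer have time to upload. With a random key it can be contacted at open(), at the cost of de-duplication.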
I've got more notes on this stuff, and will add them later.