#320 assigned enhancement

add streaming (on-line) upload to HTTP interface

Reported by: warner
Owned by: zooko
Priority: major
Milestone: eventually
Component: code-encoding
Version: 0.8.0
Keywords: streaming performance upload fuse webdav twisted reliability http
Cc: jeremy@…, nejucomo@…
Launchpad Bug:

Description (last modified by nejucomo)

In 0.8.0, the upload interfaces exposed to HTTP clients all require the file to be completely present on the tahoe node before any upload work can begin. For a FUSE plugin (talking to a local tahoe node) that provides an open/write/close POSIX-like API to some application, this means that the write() calls all finish quickly, while the close() call takes a long time.

Many applications cannot handle this. These apps enforce timeouts on the close() call on the order of 30-60 seconds. If these apps can handle network filesystems at all, my hunch is that they will be more tolerant of delays in the write() calls than in the close().

This effectively imposes a maximum file size on uploads, determined by the link speed times the close() timeout. Using the helper can improve this by a factor of 'N/k' relative to non-assisted uploads. The current FUSE plugin has a number of unpleasant workarounds that involve lying to the close() call (pretending that the file has been uploaded when in fact it has not), which have a bunch of knock-on effects (like how to handle the subsequent open+read of the file that we've supposedly just written).

To accommodate this better, we need to move the slow part of upload from close() into write(). That means that whatever slow DSL link we're traversing (either ciphertext to the helper or shares to the grid) needs to receive data during write().

This requires a number of items:

  • an HTTP interface that will accept partial data.
    • twisted.web doesn't deliver the Request to the Resource until the body has been fully received, so to continue using twisted.web we must either hack it or add something application-visible (like "upload handles" which accept multiple PUTs or POSTs and then a final "close" action).
    • twisted.web2 offers streaming uploads, but 1) it isn't released yet, 2) all the Twisted folks I've spoken to say we shouldn't use it yet, and 3) it doesn't work with Nevow. To use it, we would probably need to include a copy of twisted.web2 with Tahoe, which either means renaming it to something that doesn't conflict with the twisted package, or including a copy of twisted as well.
  • some way to use randomly-generated encryption keys instead of CHK-based ones. At the very least we must make sure that we can start sending data over the slow link before we've read the entire file. The FUSE interface (with open/write/close) doesn't give the FUSE plugin knowledge of the full file before the close() call. Our current helper remote interface requires knowledge of the storage index (and thus the key) before the helper is contacted. This introduces a tension between de-duplication and streaming upload.
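
To make that tension concrete, here is a minimal sketch (plain Python; note that real Tahoe derives CHK keys with a tagged SHA-256d construction, not the plain SHA-256 used here) of why a convergent key blocks streaming while a random key does not:

    import hashlib, os

    def chk_key(plaintext_path):
        # Convergent (CHK-style) key: a hash of the entire plaintext.
        # The whole file must be read before encryption can start,
        # which rules out streaming but enables de-duplication.
        h = hashlib.sha256()
        with open(plaintext_path, 'rb') as f:
            for chunk in iter(lambda: f.read(2**16), b''):
                h.update(chunk)
        return h.digest()[:16]  # 16 bytes = an AES-128 key

    def random_key():
        # Random key: available immediately, so encryption and upload can
        # begin with the first write(), at the cost of convergence (two
        # uploads of the same file yield unrelated storage indexes).
        return os.urandom(16)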

I've got more notes on this stuff... will add them later.

Change History (29)

comment:1 Changed at 2008-03-05T18:35:12Z by warner

Since it looks like twisted.web2 won't be ready for production use for a while (if ever), and the hacks we'd have to make to twisted.web1 would be effectively the same as rewriting twisted.web2, we decided to go with the application-visible approach. This means upload handles.

Rob was more comfortable with server-generated handles than with client-generated ones, so the web-API I'm planning to build will use a series of POSTs like so:

  • POST /upload/open?key=KEYSPEC
    • if KEYSPEC is "CHK", then the server will buffer plaintext until the close, then compute the CHK encryption key. This defeats streaming.
    • if KEYSPEC is "random", then the server will generate a random encryption key. This enables streaming.
    • if KEYSPEC is a 32-character hexadecimal string, the server will use the equivalent binary form as the encryption key. This enables streaming.
    • if KEYSPEC is a 26-character base32-encoded string, the server will use the equivalent binary form as the encryption key. This enables streaming. This is the same form as the output of 'tahoe dump-cap'.
  • the response body of the /upload/open call is an upload handle, composed entirely of URL-safe ASCII characters. All further calls will use it.
  • POST /upload/$HANDLE
    • the body of the POST will be one chunk of file data. All chunks will be written in order. No seek calls are supported at this time. The Content-Type of the POST can be anything except one of the usual HTML form encoding types (multipart/form-data or application/x-www-form-urlencoded), to prevent the twisted.web request handler from attempting to parse the chunk.
    • The POST will stall if necessary to prevent too much storage from being consumed in the client. If the upload is occurring in a streaming fashion, this will attempt to push the chunk over the slow link before returning, to accomplish the goal of moving the upload time from the close() call to the write() calls.
    • The response body will be empty
  • POST /upload/$HANDLE?close=true
    • the last chunk should be accompanied by ?close=true. This chunk may be empty.
    • the POST will stall until the upload has completed
    • the response body will contain the URI of the uploaded file
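
For concreteness, here is a sketch of how a FUSE plugin might drive the proposed API from the client side (hypothetical: this protocol was never built, and the node URL, the use of the requests library, and the function names here are all my own assumptions):

    import requests

    BASE = "http://127.0.0.1:3456"  # assumed local tahoe node webport

    # Open an upload with a random key so that streaming is possible.
    resp = requests.post(BASE + "/upload/open", params={"key": "random"})
    handle = resp.text.strip()  # upload handle: URL-safe ASCII

    def fuse_write(chunk):
        # One POST per write(). A non-form Content-Type keeps the
        # twisted.web request handler from parsing the body. The call
        # may stall: that backpressure is what moves the upload time
        # from close() into write().
        requests.post(BASE + "/upload/" + handle, data=chunk,
                      headers={"Content-Type": "application/octet-stream"})

    def fuse_close():
        # Final (possibly empty) chunk; blocks until the upload has
        # completed, and the response body is the file's URI.
        r = requests.post(BASE + "/upload/" + handle,
                          params={"close": "true"}, data=b"",
                          headers={"Content-Type": "application/octet-stream"})
        return r.text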

This API is all the application needs to know about, but to make streaming work, we need a bit more under the hood. The largest current challenge is that immutable lease requests must be accompanied by an accurate size value, so we can't start encoding until we know the size of the file. That means we can only get streaming with a helper. We need a new helper protocol that will start with a storage index and then push ciphertext to the helper (instead of having the helper pull ciphertext), then tell the helper that we're done. At that point, the helper knows the size of the file, so it can encode and push.

So I'm going to build these two protocols: the POST /upload one and the push-to-helper one, since that will enable streaming in our current most-important use case. Later, we can investigate a different storage-server protocol that will let us declare a maximum size, then push data until we're done, then reset the size to the correct value. With that one in place, we will be able to stream without a helper. Note, however, that CHK (computed by the tahoe node) always disables streaming.
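
The push-to-helper protocol isn't specified above beyond its ordering, but its shape would be roughly the following (a sketch only: the real helper speaks Foolscap, and these method names are invented):

    class PushHelperSession:
        def open(self, storage_index):
            """The client announces the storage index before any data
            moves, so the key (random or otherwise) must already be
            known at this point."""

        def write(self, ciphertext_chunk):
            """The client pushes ciphertext as it becomes available and
            the helper spools it; blocking here provides backpressure
            over the slow link."""

        def close(self):
            """No more data. Only now does the helper know the file
            size, so only now can it request leases, encode, and push
            shares."""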

comment:2 Changed at 2008-03-05T20:24:58Z by zooko

Ugh -- I was excited about making Tahoe do streaming using the simple old RESTful API. I'm not very excited about changing the wapi to facilitate streaming. If we're going in the direction of extending the wapi to enable more sophisticated file semantics, then we should probably head in the direction of making it a subset of WebDAV.

http://webdav.org/

Basically, there is value in better streaming performance with the current simple wapi, and there is value in a more complex API that allows things like seek() and versioning (i.e. WebDAV), but extending the wapi to do this chunked streaming is a "sour spot" in the trade-off which uglifies the wapi and enables only a little bit of added functionality.

Another reason that I'm unhappy about this decision is that code to handle the current wapi in streaming fashion already exists and works:

http://twistedmatrix.com/trac/browser/branches/web2-new-stream-1937-2

Brian wrote "twisted.web2 won't be ready for production use for a while", but I'm skeptical about what this "ready for production use" actually means concretely -- I think it has more to do with the Twisted project not having working release automation and volunteers to do release management than with there actually being bugs that would prevent that code from sufficing for this ticket.

:-(

comment:3 Changed at 2008-03-06T02:17:51Z by warner

After much discussion and prioritizing, we've decided to back down from this goal, and put this project on hold for a month or more.

The problem that we hoped to solve with this feature was that native apps that use Tahoe through a FUSE plugin could behave badly if the close() call takes a long time to finish. A secondary goal was to make the OS's built-in progress bar (for drag-and-drop copies) more accurate. There are three basic approaches we can take:

  1. We do streaming, write() takes a while, close() is fast. However, if we're using a helper, the work remaining at close() still proceeds at only about 3MBps, so it isn't instantaneous, and if some windows app has a 30-second timeout on close(), this still limits us to 90MB files. Also, this kind of streaming means that we must give up convergence. Progress bars are fairly accurate. Close means close.
  2. No streaming, write() is fast, close() is slow. Apps have problems. Progress bar is wrong. Close means close.
  3. No streaming, and the FUSE plugin quietly implements an asynchronous write cache. write() is fast, close() is fast, apps are happy, progress bar is wrong, close means "we'll work on it".

We decided that approach 3 was the way to go. We plan to implement sync() in the FUSE layer to block until the write cache is empty (at least on systems where such a call exists... we aren't yet sure if the SMB protocol that windows-FUSE uses provides one). Backup apps are likely to use something like sync() to be sure the data is really flushed out, and therefore they ought to be safe (although they might enforce some other sort of timeout on sync(), who knows).
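
A rough sketch of the approach-3 write cache (hypothetical Python; the real FUSE plugins are structured differently, and upload_fn stands in for whatever actually pushes a file to the grid):

    import queue, threading

    class WriteCache:
        """Hypothetical write-back cache: write() returns immediately
        (so the plugin's close() can too), a daemon thread drains
        pending data over the slow link, and sync() blocks until the
        cache is empty."""

        def __init__(self, upload_fn):
            self._q = queue.Queue()
            self._upload = upload_fn   # whatever actually pushes to tahoe
            threading.Thread(target=self._drain, daemon=True).start()

        def write(self, path, data):
            self._q.put((path, data))  # fast: just enqueue

        def sync(self):
            self._q.join()             # block until everything is pushed

        def _drain(self):
            while True:
                path, data = self._q.get()
                try:
                    self._upload(path, data)   # the actual slow upload
                finally:
                    self._q.task_done()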

We'll use a separate progress indication mechanism (a toolbar icon?) to let the user know that the write cache is non-empty, and that therefore they should not shut down their computer quite yet. The FUSE plugin should be able to display status information about its cache and an ETA of how long it will take to finish pushing.

This also ties in to the dirnode batching. If we're batching directory additions to make them go faster, we're doing write caching anyways, and have already committed to making the close() call lie about its completion status.

We may consider exposing tahoe's current-operation progress information in a machine-readable format to the FUSE plugin, so it can include that status in its own. To make this accurate, we need to add some sort of "task-id" (a unique number) to each webapi request. These task-ids can then be put in the JSON status output web page, so the FUSE plugin can correlate the tasks.

comment:4 Changed at 2008-03-06T02:35:16Z by warner

re: twisted.web2 not being ready for a while:

When we asked the twisted.web IRC folks last week, we identified the following problems:

  1. nevow is incompatible with twisted.web2, and nobody expressed interest in fixing nevow, despite the offer of money
  2. twisted.web2 has not been released yet, and nobody expressed interest in releasing it, despite the offer of money. web2 is in a strange place, where its existence is inhibiting work on web1, and the existence of web1 is inhibiting work on web2.
  3. despite the streaming code in twisted.web2 looking functional and (in my mind) well-designed, the consensus among the twisted folks was that it wasn't worth using, and that the code from that web2-new-stream branch might be better. The fact that there exist *two* functional streaming mechanisms and that the twisted community hasn't settled upon either of them makes me even less confident that web2 will be released any time soon. (it feels like they're arguing about things that don't need fixing). I may be completely wrong about this one, though.

Using an unreleased copy of twisted.web2 is difficult, because python's import mechanism makes it hard to have your twisted.internet come from one place and your twisted.web2 come from somewhere else. (setuptools "namespace packages" are one attempt to solve this, as is the divmod "combinator", and both appear to be pretty ugly hacks).

So I think the easiest approach would be to make a private copy of web2 in the allmydata tree, perhaps under allmydata.tw_web2 . To do this, we'd have to touch most of the 103 .py files and change their import statements to pull from allmydata.tw_web2.FOO instead of twisted.web2.FOO . This would make it difficult to apply later upstream patches, although we might get lucky and 'darcs replace' could do much of the work for us. However, I don't trust 'darcs replace' to do this correctly in the long term: I think each upstream update would need to be applied by hand and the results carefully inspected. We'd have to play darcs games (i.e. maintain a separate web2-tracking repo and merge its contents into the tahoe one with some directory-renaming patches) to enable ongoing updates. And we'd have to add 876kB of an external library to the Tahoe source tree, which is already much larger than I'd prefer.

The best outcome would be if the twisted folks made up their mind about web2, made a release, and then made a release of Twisted that included it. Then we could simply declare a dependency upon Twisted-2.6.0 or Twisted-8.0 or whatever they're going to call it this week and we'd be done. But that's certainly not going to happen before we ship 1.0 in a week, and I don't believe it is going to happen within the next three months either.

So, I'm glad that we were able to decide to punt on the streaming features, because I didn't see a happy way to implement them in a single PUT or POST, and I too did not like the multiple-POST app-visible approach described above.

comment:5 Changed at 2008-03-06T20:20:59Z by zooko

Brian, your summary is good. One thing you overlooked is the option of shipping our own entire twisted including twisted.web2, thus avoiding renaming issues.

Also, please be more specific about what you fear might go wrong with using darcs replace. On IRC you said that a potential problem is that the token might not match other uses, for example the token "twisted.web2" wouldn't match "from twisted import web2". This is a valid concern, but I want to be clear that there is nothing buggy or vague or complicated about darcs's replace-token functionality -- you just have to spell out all tokens that you want replaced. There are no funny merge edge cases or anything with token-replace patches.

comment:6 Changed at 2008-03-06T23:24:34Z by warner

Thanks! Yes, we could ship all of twisted with tahoe, at a cost of 853 files, 89 directories, and 7.8MB of python code (roughly 8 times larger than Tahoe itself: 97 files, 7 directories, and 1.2MB in src/allmydata/). In addition, we would be making it more difficult for users (and developers!) to use any other version of twisted along with Tahoe.

We are effectively doing this for/to our Mac and Windows users, by virtue of using py2app/py2exe, for the goal of making a single-file install. For that purpose, I think it's a win, and I wouldn't mind having a custom version of twisted in those application bundles. But for developers I think it would be a loss.

re: 'darcs replace'. My first concern is the set of filenames on which the operations are performed. I believe that darcs requires you to enumerate the filenames when you perform the replace command, and later patches could add files that contain tokens that you want to replace. The 'darcs replace' that renames twisted.web2 with allmydata.tw_web2 in foo.py, performed in January when we first started the process, will not catch the tokens in the new bar.py that got added in a later version of web2 released in June.

My second (weaker) concern is the variety of forms that the import statement might take:

  • from twisted.web2 import stream
  • import stream (ok, this one doesn't require rewriting)
  • from twisted.web2.dav import noneprops
  • import twisted.web2.dav.element.xmlext as ext
  • from twisted import web2 (although I can't find any instances of this)

This mainly depends upon the regexp that darcs uses to define a 'token', versus non-token boundaries. I think that if you just do 'darcs replace twisted.web2 allmydata.tw_web2' then it declares '.' to be a yes-token character, which means it can't be a token-boundary, which means that it won't be replaced in 'twisted.web2.dav'. But there may be a way to explicitly tell darcs what you want to use as an is-a-token regexp.
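
For comparison, a single-purpose rewriter covering the dotted forms above is short (an illustration only, not something that has been run against the web2 tree):

    import re

    DOTTED = re.compile(r'\btwisted\.web2\b')   # also catches twisted.web2.dav etc.
    FROM_IMPORT = re.compile(r'\bfrom\s+twisted\s+import\s+web2\b')

    def rewrite(source):
        source = DOTTED.sub('allmydata.tw_web2', source)
        # Alias so that code referring to the name 'web2' keeps working:
        source = FROM_IMPORT.sub('import allmydata.tw_web2 as web2', source)
        return source

Dynamically computed imports (e.g. via twisted.python.reflect.namedAny) would still slip through this, which matches the concern below.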

Once Python 2.5 and relative imports are more common, there could be other forms, although again it is unlikely that we'd see 'from ..web2.dav import noneprops', since that would be a dumb equivalent of 'from dav import noneprops'.

I don't believe web2 does dynamically-computed import statements, but I think nevow does (using twisted.python.reflect.namedAny, for example). These would also be likely missed by 'darcs replace'.

I haven't used 'darcs replace' enough to be comfortable with it, but I agree that there is nothing buggy or magical about it.

comment:7 Changed at 2008-03-08T04:15:30Z by zooko

  • Milestone changed from 0.9.0 (Allmydata 3.0 final) to 0.10.0

comment:8 Changed at 2008-05-09T00:10:40Z by warner

  • Milestone changed from 1.1.0 to undecided

this isn't going to happen for 1.1.0

comment:9 Changed at 2008-06-01T22:08:58Z by warner

We've discussed some of the storage-server protocol changes that would support this, in http://allmydata.org/pipermail/tahoe-dev/2008-May/000630.html

Also #392 (pipeline upload segments) is related.

comment:10 Changed at 2008-09-24T13:51:29Z by zooko

I mentioned this ticket as one of the most important-to-me improvements that we could make in the Tahoe code: http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html

comment:11 Changed at 2009-02-17T22:50:47Z by zooko

Argh! The lack of this feature just caused me to lose data!

My drive is nearly full on my Macbook Pro. I tried to backup a file to Tahoe so that I could delete that file to make room. While it was uploading, I started editing a very difficult, delicate, emotional letter to the OSI license-discuss mailing list about the Transitive Grace Period Public Licence.

Tahoe tried to make a temporary copy of the large file in order to hash it before uploading it, thus running my system out of disk space and causing the editor that I was using to crash and lose some of the letter I was composing. How frustrating!

The biggest reason why Tahoe doesn't already do streaming uploads was that we liked the "hash it before uploading" step as a way to achieve convergence, so that successive uploads of the same file by the same person would not waste upload bandwidth and storage space. Now that we have backupdb, that same goal can be handled much more efficiently (most of the time) by backupdb. Hopefully now we can move to proper streaming upload.

comment:12 Changed at 2009-04-22T18:03:20Z by zooko

#684 is about the part of this in which the client can specify what encryption key to use. There is a patch submitted by Shawn Willden.

comment:13 Changed at 2009-09-10T16:11:39Z by zooko

  • Summary changed from add streaming upload to HTTP interface to add streaming (on-line) upload to HTTP interface

comment:14 Changed at 2009-12-04T18:50:32Z by zooko

If you love this ticket, you might also like #809 (Measure how segment size affects upload/download speed.) and #398 (allow users to disable use of helper: direct uploads might be faster).

comment:15 Changed at 2009-12-12T20:56:10Z by davidsarah

  • Keywords performance upload added

#684 (specifying the encryption key) is wontfixed, but I don't think it would be necessary for this ticket if random keys were used.

comment:16 Changed at 2010-02-11T04:37:53Z by davidsarah

  • Keywords fuse webdav twisted reliability added

comment:17 Changed at 2010-02-23T03:09:22Z by zooko

  • Milestone changed from eventually to 2.0.0

comment:18 follow-up: Changed at 2010-03-11T18:21:59Z by jsgf

  • Cc jeremy@… added

Are uploads using a helper streaming?

comment:19 in reply to: ↑ 18 Changed at 2010-05-15T04:42:19Z by zooko

Replying to jsgf:

Are uploads using a helper streaming?

Currently the Tahoe-LAFS gateway (storage client) receives the entire file plaintext, writes it out to a temp file on disk (while computing the secure hash of it), then generates an encryption key (using that secure hash), then reads it back from the temp file on disk, encrypting as it goes. This is all the same whether you're using an immutable upload helper or not. The difference is that without the immutable upload helper you also do erasure coding during this second pass while you are doing encryption. With the immutable upload helper you just do the encryption, streaming the ciphertext to the immutable upload helper, which does the erasure coding.
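
The two-pass flow described above, as a simplified sketch (assumptions: plain SHA-256 standing in for Tahoe's tagged SHA-256d, AES-128-CTR with a zero counter as an approximation of the real encryption, and the cryptography library rather than what the gateway actually uses):

    import hashlib, os, tempfile
    from cryptography.hazmat.primitives.ciphers import (
        Cipher, algorithms, modes)

    def two_pass_upload(instream, send_ciphertext, chunk=2**16):
        # Pass 1: spool plaintext to disk while hashing it. This is
        # the temp copy that filled the disk in comment:11 above.
        h = hashlib.sha256()
        with tempfile.NamedTemporaryFile(delete=False) as spool:
            for data in iter(lambda: instream.read(chunk), b''):
                h.update(data)
                spool.write(data)
            path = spool.name
        key = h.digest()[:16]   # convergent (CHK) encryption key

        # Pass 2: re-read the spool, encrypting as we go.
        enc = Cipher(algorithms.AES(key), modes.CTR(b'\0' * 16)).encryptor()
        with open(path, 'rb') as f:
            for data in iter(lambda: f.read(chunk), b''):
                send_ciphertext(enc.update(data))
        send_ciphertext(enc.finalize())
        os.unlink(path)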

comment:20 Changed at 2010-05-16T05:27:25Z by zooko

#294 (make the option of random-key encryption available through the wui and cli) was about a related issue. In order to do streaming upload the Tahoe-LAFS gateway will of course have to do random-key encryption. However, I don't think users actually need to have a switch to control random-key encryption as such, so I've closed #294 and marked it as a duplicate of this ticket.

comment:21 Changed at 2010-05-16T06:05:06Z by zooko

I intend to have a go at this for Tahoe-LAFS v1.8. The part that I'm likely to have the most trouble with is getting access to the first part of the file which has been uploaded from e.g. the web browser to the twisted.web web server before the entire file has been uploaded. There is a longstanding, stale twisted ticket which is in the context of the now abandoned twisted.web2 project:

http://twistedmatrix.com/trac/ticket/1937 # in twisted.web2, change "stream" to use newfangled not yet defined stream api

There may be some other way to get access to the data incrementally before the entire file has been completely uploaded. Help?
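
(One possible avenue, offered as an untested assumption about twisted.web rather than a known answer: http.Request buffers the request body through its handleContentChunk method, so a custom Request subclass installed as the site's requestFactory could observe chunks as they arrive.)

    from twisted.web import server, resource
    from twisted.internet import reactor

    class StreamingRequest(server.Request):
        def handleContentChunk(self, data):
            # twisted.web calls this once per chunk of request body as
            # it arrives; the stock implementation only buffers it. A
            # streaming gateway could forward `data` to the uploader
            # here instead of (or as well as) buffering.
            server.Request.handleContentChunk(self, data)

    class Root(resource.Resource):
        isLeaf = True
        def render_POST(self, request):
            return b"done"

    class StreamingSite(server.Site):
        requestFactory = StreamingRequest

    reactor.listenTCP(8080, StreamingSite(Root()))
    reactor.run()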

comment:22 Changed at 2010-05-16T06:06:32Z by zooko

Other tickets that we would hopefully also be able to close as part of this work:

  • #1032 Display active HTTP upload operations on the status page
  • #951 uploads aren't cancelled by closing the web page
  • #952 multiple simultaneous uploads of the same file

comment:23 Changed at 2010-05-16T18:28:51Z by zooko

  • Milestone changed from 2.0.0 to 1.8.0
  • Owner set to zooko
  • Status changed from new to assigned

comment:24 follow-up: Changed at 2010-05-16T23:35:42Z by davidsarah

  • Keywords http added

Unlike HTTP, the SFTP frontend has no problem getting at the upload stream. So we could implement streaming upload immediately for SFTP, at least in some cases (see #1041 for details), if the uploader itself supported it.

Perhaps we should leave this ticket for the issue of getting at the upload stream of an HTTP request in twisted.web (which is what most of the above comments are about), and open a ticket for streaming support in the new uploader. It looks like the current IUploadable interface isn't really suited to streaming (for example it has a get_size method, and it pulls the data when a "push" approach would be more appropriate), so there is some design work to do on that new ticket that is independent of HTTP.
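
A push-style replacement might look roughly like this (names invented; the real interfaces are zope.interface declarations, and the actual redesign belongs on that new ticket):

    class IStreamingUploadable:
        """Hypothetical push-style counterpart to IUploadable: no
        get_size(), and the caller pushes data instead of being
        pulled for it."""

        def open(self, key):
            """Begin an upload; the total size is not required up front."""

        def write(self, data):
            """Push the next chunk. May return a deferred that fires
            when the uploader has room, providing backpressure."""

        def close(self):
            """No more data; the final size is now known. Completes
            with the cap (URI) of the uploaded file."""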

comment:25 Changed at 2010-07-24T05:36:58Z by zooko

  • Milestone changed from 1.8.0 to soon

Although I would dearly love to get this ticket fixed, I think we have enough other important issues in front of us for v1.8.0, so I'm moving this into the "soon" Milestone. If you think you can fix this in the next couple of weeks, move it back into the "1.8" Milestone, but then you either have to move an equivalent mass of tickets out of "1.8" or you have to commit to spending an extra strong dose of volunteer energy to get this fixed. ;-)

comment:26 in reply to: ↑ 24 Changed at 2011-01-03T05:08:55Z by davidsarah

Replying to davidsarah:

Perhaps we should leave this ticket for the issue of getting at the upload stream of an HTTP request in twisted.web (which is what most of the above comments are about), and open a ticket for streaming support in the new uploader.

That ticket is #1288.

comment:27 Changed at 2011-01-27T07:25:36Z by zooko

The correct ticket in the Twisted issue tracker is: http://twistedmatrix.com/trac/ticket/288 (no way to access the data of an upload which is in-progress), not http://twistedmatrix.com/trac/ticket/1937 (in twisted.web2, change "stream" to use newfangled not yet defined stream api).

There is a preliminary patch by exarkun attached to Twisted ticket 288.

comment:28 Changed at 2012-12-06T21:38:08Z by davidsarah

  • Milestone changed from soon to eventually

comment:29 Changed at 2013-07-20T20:25:40Z by nejucomo

  • Cc nejucomo@… added
  • Description modified (diff)