#129 closed defect (fixed)

high memory usage during GET for large files and slow links

Reported by: warner
Owned by: warner
Priority: critical
Milestone: 0.6.0
Component: code-frontend-web
Version: 0.5.1
Keywords:
Cc:
Launchpad Bug:

Description

Load testing revealed that doing a GET of a large file through a slow link causes the memory footprint of the decoding node to balloon to the size of the file being downloaded. The cause is simple: decoding is outpacing the download, and we do a naive twisted.web transport.write() for each segment. This forces the transport to buffer all of the data we have written but which the client (in this case a browser on the other end of a DSL line) has not yet received.
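For illustration, the naive pattern looks roughly like this (a sketch, not the actual download code; decode_segments() and request are stand-ins):

{{{
# Sketch of the problem: request.write() returns immediately, so every decoded
# segment the slow client hasn't read yet piles up in the transport's buffer.
# decode_segments() is a hypothetical stand-in for the real decode loop.
for segment in decode_segments():
    request.write(segment)   # buffered in RAM until the client drains it
request.finish()
}}}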

I can think of two possible solutions:

  • make decoding a producer/consumer process. This means we hold off on downloading the shares for a given segment until the consumer (in this case the HTTP connection) says it wants more (because its buffer has dropped below some threshold). This changes the control flow in download, not coincidentally mirroring a similar change in upload (to support offloaded-uploading #116). A sketch of this approach follows the list below.
  • have the decode process write the data to a temporary file on disk, and then pass that off to the web transport to read at its leisure (and delete it when finished, using an anonymous filehandle)
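Here is a minimal sketch of what the producer/consumer approach could look like using twisted's IPushProducer interface. The downloader object and its fetch_next_segment() method are hypothetical stand-ins for the real download machinery, not existing code:

{{{
from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

@implementer(IPushProducer)
class SegmentProducer:
    """Feed decoded segments to an HTTP request, fetching the next segment
    only while the transport's buffer has room."""

    def __init__(self, request, downloader):
        self.request = request        # the twisted.web request (the consumer)
        self.downloader = downloader  # hypothetical per-download object
        self.paused = False
        self.stopped = False
        # streaming=True: twisted calls pauseProducing()/resumeProducing()
        # as the transport's buffer fills and drains.
        request.registerProducer(self, streaming=True)
        self.resumeProducing()  # kick off the first segment

    def pauseProducing(self):
        # The client is falling behind: stop fetching shares for new segments.
        self.paused = True

    def resumeProducing(self):
        # The buffer has drained: fetch, decode, and write the next segment.
        self.paused = False
        d = self.downloader.fetch_next_segment()  # hypothetical, returns a Deferred
        d.addCallback(self._write_segment)

    def stopProducing(self):
        # The client went away entirely: abandon the download.
        self.stopped = True

    def _write_segment(self, segment):
        if self.stopped:
            return
        if segment is None:  # no more segments: we're done
            self.request.unregisterProducer()
            self.request.finish()
            return
        self.request.write(segment)
        if not self.paused:
            self.resumeProducing()
}}}

Something like SegmentProducer(request, downloader) would be created when the GET arrives; twisted then throttles the fetch/decode loop to the client's download rate, so at most a segment or two is ever held in memory per download.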

Doing producer/consumer probably raises the memory footprint by 1MB for each active download (holding one segment of plaintext in memory while we wait for the client to download it, maybe 2MB if we pipeline the next segment's shares).

The tempfile approach means downloads run full-throttle and then finish, avoiding the memory overhead, but of course then we have a disk overhead of the full file size for the duration of the download. In practice, the kernel will cache these disk files until they get too large, then push them to an actual disk, with a cache size varying according to whatever else is using memory.
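For comparison, the tempfile alternative might look something like the sketch below. decode_all_segments() is again a hypothetical stand-in for the decode loop, and FileSender is twisted's existing helper for streaming an open file to a consumer at the consumer's pace:

{{{
import tempfile
from twisted.protocols.basic import FileSender

def serve_via_tempfile(request, downloader):
    # An anonymous filehandle: the file has no directory entry, so the space
    # is reclaimed automatically as soon as the handle is closed.
    tmp = tempfile.TemporaryFile()
    for segment in downloader.decode_all_segments():  # hypothetical decode loop
        tmp.write(segment)
    tmp.seek(0)
    # FileSender registers itself as a producer on the request, so reads from
    # the tempfile are throttled to the client's download rate.
    d = FileSender().beginFileTransfer(tmp, request)
    def _done(ignored):
        tmp.close()
        request.finish()
    d.addCallback(_done)
    return d
}}}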

I'm inclined to implement the producer/consumer thing, but when I think about it, the kernel is in the best position to make the tradeoff between disk and memory, so it might be better to simply let it do its job. Client behavior has an effect too: if people download half of a large file and then quit and never come back, the tempfile approach means a lot of wasted fetch/decode effort. On the other hand, the tempfile approach makes it a *lot* easier to keep the tempfile around for a couple of hours in case the client comes back to finish the job. (We'd have to implement Content-Range: support for GET requests, but that might not be all that difficult.)
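If we did keep the tempfile around, resuming would mostly be a matter of honoring the Range header on a later GET and answering with 206 Partial Content. A rough sketch, handling only the simple "bytes=N-" form a resuming client would send (again not real code):

{{{
from twisted.protocols.basic import FileSender

def serve_resumable(request, tmp, total_size):
    range_header = request.getHeader("range")  # e.g. "bytes=1000-"
    if range_header and range_header.startswith("bytes="):
        start = int(range_header[len("bytes="):].split("-")[0])
        tmp.seek(start)
        request.setResponseCode(206)  # Partial Content
        request.setHeader("content-range",
                          "bytes %d-%d/%d" % (start, total_size - 1, total_size))
    else:
        tmp.seek(0)
    return FileSender().beginFileTransfer(tmp, request)
}}}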

Change History (3)

comment:1 Changed at 2007-09-19T04:21:33Z by warner

I've added an automated memory test for this: check out the buildbot "memcheck" builder for the current numbers. As of right now, downloading a 50MB file and pushing it over a slow HTTP 'GET' link causes the node to peak at 89MB.

comment:2 Changed at 2007-09-19T04:21:44Z by warner

  • Owner set to warner
  • Status changed from new to assigned

comment:3 Changed at 2007-09-19T08:12:20Z by warner

  • Milestone changed from undecided to 0.6.0
  • Resolution set to fixed
  • Status changed from assigned to closed

Fixed in 1340c484c6c60c52. The producer/consumer stuff works great, and the memory footprint is now down to 29MB for a stalled download of a 50MB file (this is within 7% of the footprint of our other 50MB tests).

The new code also handles interrupted downloads extremely gracefully: the segment that is currently downloading is allowed to complete, the rest are skipped, and the download finishes with a DownloadStopped exception.
