Opened at 2007-05-03T00:13:13Z
Last modified at 2007-08-11T01:43:07Z
#29 closed defect
web upload uses up lots of RAM — at Version 18
Reported by: | zooko | Owned by: | zooko |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | code | Version: | 0.2.0 |
Keywords: | Cc: | ||
Launchpad Bug: |
Description (last modified by warner)
Uploading files through the 'webish' frontend (with the upload form) results in a memory footprint of at least 2 * filesize. Downloading files might do the same.
Zooko's first observations suggest this might be more like 4x.
The main culprit seems to be the stdlib 'cgi' module, which twisted.web uses to parse the multipart-encoded upload form. The file to be uploaded appears as an input field in this form.
A secondary thing to look at (if/when we fix the upload side) is to make the download side streaming (producer/consumer), to avoid buffering the whole file in the twisted Transport queue.
Change History (18)
comment:1 Changed at 2007-05-03T14:45:33Z by zooko
comment:2 Changed at 2007-05-04T22:21:40Z by zooko
So to be more specific, if I understand correctly that the file is being uploaded by the web browser to the node, then the fix should probably be for the node to encode the file and upload the shares to the blockservers *as* the file is being received by the node, so that the node doesn't actually store more than a small segment of the file in RAM. This also implies that the node (acting as web server) has to be able to read in only as much of the file as it is ready to encode and upload, and leave the rest waiting, rather than greedily read in the entire file at once.
comment:3 Changed at 2007-05-23T22:29:42Z by zooko
- Owner changed from somebody to zooko
- Priority changed from major to critical
- Status changed from new to assigned
I'm upgrading the priority and assigning this to myself, because now that #22 is fixed, this issue is preventing me from sharing large files with my friends and family.
comment:4 Changed at 2007-05-23T22:34:06Z by zooko
http://pyramid.twistedmatrix.com/pipermail/twisted-web/2005-March/001315.html
makes me think that I'll need to rewrite webish to use twisted.web2 in order to do streaming upload. Reading further...
comment:5 Changed at 2007-05-23T22:48:40Z by zooko
I don't really know that it is 4 X. Looking at the code, I guess that it is probably 1 X plus a bit.
comment:6 Changed at 2007-05-23T22:49:26Z by zooko
How about this: there exist constants c1, c2, and n1 such that for all file sizes n > n1, the RAM usage is greater than c1*n and less than c2*n.
comment:7 Changed at 2007-05-23T23:09:58Z by warner
Twisted's HTTP server needs a non-trivial amount of work before it will be convenient for us to access the incoming data before the POST has completed. On the plus side, I believe that twisted.web writes large HTTP bodies to a temporary file (to avoid consuming a lot of memory).
http://twistedmatrix.com/trac/ticket/288 is relevant: once it is resolved, we should have access to the incoming file in small pieces. We can't use that, however, because our fileid/key-generating passes require access to the whole file. So at best we can reduce our memory footprint, but not our disk footprint. To really reduce the memory footprint, we'd need to use randomly-generated keys, give up on convergent encoding, use a randomly-generated 'storage index', and split the read-and-verify-crypttext capability (the verifierid) into read-crypttext (storage-index) and verify-crypttext (verifierid) capabilities. Not entirely unreasonable, mind you, but it would have significant impact on the mesh as a whole.
http://twistedmatrix.com/trac/ticket/1903 is also relevant: a POST that takes a long long time to complete will run afoul of the timeout. Fortunately it looks like the default value for this timeout is 12 hours.
I think that request.content is a filehandle that either references a StringIO (for small bodies) or a disk-based tempfile (for large bodies), so changing webish.py to use uploader.upload_filehandle(request.content) instead of upload_data(request.content.read()) would fix the memory problem on the upload side.
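The difference between the two call styles can be sketched as follows. This is only an illustration: hash_filehandle and hash_data are hypothetical helpers, not Tahoe's upload_filehandle/upload_data, and SHA-256 with a 64 KiB block is an assumed stand-in for whatever the uploader computes. The point is that the filehandle version never holds more than one block in memory, while the data version needs the whole body as a single string.

```python
import hashlib
import io

BLOCKSIZE = 64 * 1024  # assumed chunk size for this sketch


def hash_filehandle(f, blocksize=BLOCKSIZE):
    """Hash a file-like object while holding at most one block in memory."""
    hasher = hashlib.sha256()
    while True:
        block = f.read(blocksize)
        if not block:
            break
        hasher.update(block)
    return hasher.hexdigest()


def hash_data(data):
    """Hash a body that has already been read fully into memory."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # io.BytesIO stands in for request.content (a StringIO or a tempfile).
    body = io.BytesIO(b"x" * (1024 * 1024))
    print(hash_filehandle(body))
```

Both functions produce the same digest for the same bytes; only the peak memory differs.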
On the download side, I think the webish.WebDownloadTarget already does the desired streaming.
Of course, I should really finish implementing those memory-footprint tests so we could watch this memory usage drop once we make this upload_filehandle fix..
comment:8 Changed at 2007-05-24T00:15:52Z by warner
- Version set to 0.2.0
comment:9 Changed at 2007-05-25T03:50:30Z by zooko
The memory-footprint tests are now ticket #54.
comment:10 Changed at 2007-05-27T14:47:07Z by zooko
- Owner changed from zooko to warner
- Status changed from assigned to new
Hey Brian: 04b649f97127b9ce didn't fix the problem (although I think it might have helped a little -- I'm not sure). I'm going to pass this ticket over to you, but feel free to pass it back to me if you think you won't actually be motivated to work on it soon...
comment:11 Changed at 2007-05-30T01:08:57Z by warner
With changeset ea78b4b605568479, running 'make check-memory' gives some preliminary numbers. They only cover upload, and they don't yet make a lot of sense.
The two most obvious places where we could consume memory roughly equal to the uploaded file are when we compute key/fileid/verifierid (allmydata.upload.Uploader.compute_id_strings, which uses a 64kB blocksize), and when we read in a segment for encoding: allmydata.encode.Encoder.MAX_SEGMENT_SIZE is 2MiB, which should result in 8MiB of footprint (we read crypttext in really tiny pieces, and encode it into segsize-sized shares with a 4x expansion).
On one run, uploading a 10MB file caused the peak memory footprint to grow from 24MB to 36MB, although the VmSize returned to normal (23MB) after the upload finished. On another run, a 10MB file made the peak size grow from 24MB to 45MB, but uploading a 50MB file did not increase the peak size further.
More results as I get them..
Uploading a 50MB file causes the
comment:12 Changed at 2007-05-30T04:38:17Z by warner
I instrumented the upload process directly, grabbing VmSize and VmPeak out of /proc/NNN/status at various stages.
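A minimal sketch of this kind of instrumentation, assuming the usual Linux /proc/<pid>/status layout (lines such as "VmPeak:   37000 kB"). The function names are hypothetical and are not taken from the actual check-memory code.

```python
import re


def parse_vm_stats(status_text, fields=("VmSize", "VmPeak")):
    """Return {field: kilobytes} for the requested fields in a status dump."""
    stats = {}
    for line in status_text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if m and m.group(1) in fields:
            stats[m.group(1)] = int(m.group(2))
    return stats


def read_vm_stats(pid="self"):
    """Read the stats for a live process (Linux only)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_vm_stats(f.read())
```

Calling read_vm_stats() before and after each upload stage and logging the difference shows which step moves the peak.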
It looks like the simultaneous callRemote("put_block") calls are a significant hit, doubling the memory footprint of the encoded shares for a brief while as they flow out the network. For background, Foolscap puts off serialization until as late as possible, but unless/until we create a custom Slicer for shares, strings are still serialized as strings. So we have the 4*SEGMENT_SIZE encoded shares sitting in RAM, then Encoder._encoded_segment() does a batch of callRemotes in parallel, giving one encoded share to each landlord. When the callRemote is processed (which is generally right away, unless we've kicked Foolscap into streaming mode, and there is no API yet to enable that), the arguments are deep-serialized right away, creating a second copy of those 4*SEGMENT_SIZE shares. Since our SEGMENT_SIZE is 2MiB, that means 8MiB.
When Twisted's write() gets the data (specifically twisted.internet.abstract.FileDescriptor.write), it appends the strings to a list, since it is expecting to get lots of tiny strings. Later, when the socket becomes writable, FileDescriptor.doWrite merges all the strings in that list into a single one, and gives a derived buffer object to the socket. For a brief moment, the list of strings and the merged string are alive at the same time, but since this is happening one connection at a time, that should only bump up our footprint by 80KiB. There might be some other places where buffers get copied, but I'm inclined to doubt it.
So given a 2MiB segment size and a 25-of-100 (i.e. 4x) encoding, we've got:
- 2MiB crypttext (in a list of chunks, in Encoder.do_segment)
- 8MiB encoded shares (these overlap since codec.encode's Deferred is returned pre-fired), in a list of 100 80KiB blocks
- 8MiB serialized callRemote arguments (in the transport's _tempDataBuffer)
The serialized callRemote arguments stick around until we've finished writing them all out to the socket. Encoder._encoded_segment uses a DeferredList for pacing, so we don't do any work on segment 2 until we've finished processing segment 1, so this 2+8+8=18MiB footprint won't overlap from one segment to another.
As an experiment, I modified Encoder._encoded_segment to do the callRemotes in serial, rather than in parallel. The VmPeak for uploading a 50MiB file dropped from 37MB to 29MB, exactly as expected. Of course, this uses the network very differently and might be faster or slower.
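The arithmetic behind these numbers can be checked with a small model. It is a sketch under stated assumptions: block sizes are rounded up, serialization overhead is ignored, and the serial case is assumed to keep only one serialized block outstanding at a time. The function name is hypothetical.

```python
MiB = 1024 * 1024
KiB = 1024


def peak_upload_footprint(segment_size=2 * MiB, k=25, n=100, parallel=True):
    """Estimate per-segment peak bytes: crypttext + encoded + serialized."""
    # Each block carries 1/k of the segment (ceiling division is an assumption).
    block_size = -(-segment_size // k)
    encoded = block_size * n
    # Parallel callRemotes serialize all n blocks at once; serial ones
    # are assumed to serialize a single block at a time.
    serialized = encoded if parallel else block_size
    return segment_size + encoded + serialized


if __name__ == "__main__":
    par = peak_upload_footprint(parallel=True)
    ser = peak_upload_footprint(parallel=False)
    print("parallel: %.1f MiB" % (par / MiB))
    print("serial:   %.1f MiB" % (ser / MiB))
    print("saved:    %.1f MiB" % ((par - ser) / MiB))
```

Under these assumptions the parallel case comes to about 18 MiB and the serial case saves about 8 MiB, which is consistent with the observed drop from 37MB to 29MB.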
A good thing to keep in mind is that nodes which are uploading files may also be receiving shares for that same file, so you have to add the received-share memory footprint to the sending-share footprint. I hacked the memory test to disable receiving shares to remove the effects of this.
comment:13 Changed at 2007-05-30T06:24:47Z by warner
Thoughts on the received-share memory footprint:
As the share arrives over the wire inside a Foolscap STRING token, memory usage will vary between 80KiB and 160KiB per share (we get a little bit more of the data, notice that we haven't gotten the whole token yet, append the chunk we got to the buffer, repeat until we have the whole token: each append operation creates a copy, after which the old buffer is released). Once the STRING token is finished, it should remain immutable and not copied elsewhere until it is delivered to the remote_put_block method, which will write it to the bucket and then release it. So even if we're sending multiple shares to a single peer (which will always be the case until we get a mesh with more than 100 peers), I wouldn't expect the received-share memory usage to be very large.
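The 80KiB-to-160KiB range follows from the append-and-copy pattern, which the sketch below models. It is a simplified stand-in for the receive loop, not Foolscap's actual code: receive_token is a hypothetical name, and the 4 KiB chunk size is an assumption.

```python
def receive_token(chunks):
    """Accumulate chunks by copying, tracking the transient peak in bytes."""
    buf = b""
    peak = 0
    for chunk in chunks:
        new_buf = buf + chunk  # the old and new buffers coexist briefly
        peak = max(peak, len(buf) + len(new_buf))
        buf = new_buf          # the old buffer is released here
    return buf, peak


if __name__ == "__main__":
    # An 80 KiB share arriving in 4 KiB pieces (chunk size assumed).
    token, peak = receive_token([b"x" * 4096] * 20)
    print(len(token), peak)
```

The peak occurs on the last append, when an almost-complete old buffer and the complete new one are both alive, so it approaches but never exceeds twice the token size.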
comment:14 Changed at 2007-05-30T17:03:27Z by zooko
- Owner changed from warner to zooko
- Status changed from new to assigned
It seems like you didn't reproduce the problem that I was reporting. When I upload a file of 600 million bytes, it uses up more than 700 million bytes of RAM, exceeding the max memory on my system and eventually triggering the arrival of the dreaded Linux Angel of Death -- the OOM killer. If I upload a file of 150 million bytes, it uses up something on the order of 200-300 million bytes of RAM, but it succeeds.
I'm sorry I didn't report this more specifically at the beginning -- I assumed that you had seen the same thing. Good work on the check_memory test! I'll take this ticket back for now... Talk to you soon.
comment:15 Changed at 2007-05-30T17:05:06Z by zooko
From what you write above, it sounds like, apart from this mysterious O(N) RAM usage problem, the other parts of upload, download, and foolscap are already pretty good about using limited memory.
comment:16 Changed at 2007-06-06T20:30:12Z by warner
I nailed it down to a call inside twisted.web.http.Request.requestReceived:
args.update(cgi.parse_multipart(self.content, pdict))
When uploading a 100MB file, the process's memory footprint grows 200MB during that call. self.content is a filehandle (specifically a tempfile.TemporaryFile()), but apparently the stdlib cgi module is reading it all into memory at some point.
Is there a way to avoid using forms for the upload? Maybe this is just endemic to the web.
Unfortunately, I suspect this will bite an XMLRPC interface as well, unless we build it to use a non-form POST for the data. It'd hit a foolscap interface too, unless/until we make a custom Unslicer that feeds data to a tempfile as it arrives.
comment:17 Changed at 2007-06-06T20:33:38Z by warner
It's also possible that Nevow's form handling passes around a string instead of a filehandle, so if/when we figure out a twisted.web or cgi.py fix, we may also need to investigate the Nevow code. My hunch is that Tahoe proper is behaving correctly on this front.
comment:18 Changed at 2007-06-07T18:09:37Z by warner
- Description modified (diff)
- Summary changed from uses up lots of RAM to web upload uses up lots of RAM
The core encoding mechanism in Tahoe has been designed to use up a small amount of RAM which does not grow at all as the size of the input file grows. My first guess as to what is using up all this RAM is that the file is being transferred over HTTP from the web browser to the node and then stored in RAM in the node before being encoded.