#857 new defect

Make operation-handle-querying use only a little memory

Reported by: davidsarah
Owned by: nobody
Priority: major
Milestone: undecided
Component: code-frontend-web
Version: 1.5.0
Keywords: memory performance ophandles large

Description

The documentation on operation handles starting at source:docs/frontends/webapi.txt@4112#L203 says:

Many "slow" operations can begin to use unacceptable amounts of memory when operation on large directory structures. The memory usage increases when the ophandle is polled, as the results must be copied into a JSON string, sent over the wire, then parsed by a client.

Change History (3)

comment:1 Changed at 2009-12-13T08:09:37Z by davidsarah

  • Component changed from unknown to code-frontend-web

comment:2 Changed at 2009-12-18T00:02:00Z by warner

hm, interesting. I'm not sure how best to improve this. There are two sources of memory usage. The first is the underlying results list, to which a new record is appended for each file/directory that is traversed. This one grows over time, regardless of whether the operation is ever queried.
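As a toy illustration of that first source (made-up names, not Tahoe's actual classes):

{{{#!python
# Toy illustration: the node-side operation keeps a list that grows by
# one record per file/directory traversed, whether or not anyone ever
# polls the ophandle; it lives as long as the ophandle does.
class DeepTraversalOperation(object):
    def __init__(self):
        self.results = []

    def add_result(self, path, info):
        # invoked once per object visited by the deep traversal
        self.results.append({"path": path, "info": info})
}}}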

The second occurs when the operation is queried: the API specifies a JSON string that basically copies the underlying results list (converting some fields into a more JSON-representable format). The problem here is that the simplejson.dumps() call produces a single large string, probably via StringIO, which will briefly use about twice as much memory as the original results list (one copy as lots of little string fragments, a second copy for the merged result; then the first copy is released).
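A small sketch of why the peak is roughly double (assuming simplejson's usual build-fragments-then-join behavior):

{{{#!python
# Sketch of the second source: dumps() must materialize the whole
# serialization as a single string, so the fragments and the merged
# result briefly coexist -- roughly twice the serialized size at peak.
import simplejson

results = [{"path": "file-%d" % i, "size": i} for i in range(10 ** 6)]
body = simplejson.dumps(results)  # one big string; peak usage ~2x its size
}}}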

Maybe there's a way to use simplejson.dump instead (which takes a file-like object as a target for .write() calls), and glue it onto the HTTP response channel. simplejson is going to run synchronously anyway, so it won't save us from one copy of that string (living in the HTTP transport write buffer), but it could save us from the temporary condition of having two copies of it.
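A minimal sketch of that idea under Twisted (RequestWriter and OphandleStatus are hypothetical names, not existing Tahoe classes; the adapter just gives the request the file-like .write() that simplejson.dump() wants):

{{{#!python
# Hypothetical sketch: stream simplejson.dump() fragments straight into
# the HTTP response instead of joining them into one big string first.
import simplejson
from twisted.web import resource

class RequestWriter(object):
    """File-like adapter so dump() can write into a Twisted request."""
    def __init__(self, request):
        self.request = request

    def write(self, data):
        if isinstance(data, str):  # Twisted transports want bytes
            data = data.encode("utf-8")
        self.request.write(data)

class OphandleStatus(resource.Resource):
    def __init__(self, results):
        resource.Resource.__init__(self)
        self.results = results

    def render_GET(self, request):
        request.setHeader(b"content-type", b"application/json")
        # dump() still runs synchronously, so one copy of the data ends
        # up in the transport's write buffer, but the second temporary
        # joined copy never exists.
        simplejson.dump(self.results, RequestWriter(request))
        return b""  # body was already written via request.write()
}}}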

OTOH, maybe we should give up the convenience of doing slow deep-traversal operations within the node and require the webapi client to do them, moving the buffering requirements out to the client's side. Or make an API for slow deep traversals that streams the results, but pauses the operation if/when the HTTP channel is lost, to avoid the need to store unclaimed results. Or an API that lets the client request results in chunks and explicitly release earlier ones, so that the node can discard old results the client has safely retrieved.
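The chunked variant might look something like this (purely a hypothetical shape; none of these names exist in Tahoe):

{{{#!python
# Hypothetical sketch of the chunks-with-explicit-release idea: the
# node hands out fixed-size chunks by index, and discards any chunk
# the client has acknowledged retrieving.
class ChunkedResults(object):
    CHUNK_SIZE = 1000  # records per chunk; arbitrary

    def __init__(self):
        self.chunks = {}   # chunk index -> list of records
        self.current = 0

    def add_result(self, record):
        chunk = self.chunks.setdefault(self.current, [])
        chunk.append(record)
        if len(chunk) >= self.CHUNK_SIZE:
            self.current += 1

    def get_chunk(self, index):
        return self.chunks.get(index, [])

    def release_through(self, index):
        # client has safely stored chunks 0..index; drop them node-side
        for i in [i for i in self.chunks if i <= index]:
            del self.chunks[i]
}}}

With this shape, the node's worst-case memory is bounded by how far the client lets itself fall behind on releases, rather than by the total size of the traversal.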

comment:3 Changed at 2010-01-15T20:31:45Z by davidsarah

  • Keywords large added