#384 closed enhancement (fixed)

t=deep-size needs rate-limiting

Reported by: warner Owned by:
Priority: major Milestone: 1.1.0
Component: code-performance Version: 1.0.0
Keywords: web Cc:
Launchpad Bug:

Description

The webapi "?t=deep-size" feature (as well as the t=manifest feature from which it is derived) needs to be rate-limited. I saw the prodnet webapi machine fire off about 300 directory retrievals in a single tick, which is enough of a load spike to stall the node for a few dozen seconds.

It might be useful to rebuild something like the old slowjob, but in a form that's easier to use this time around. Maybe an object which accepts a (callable, args, kwargs) tuple, and returns a Deferred that fires with the results. The call is not invoked until later, however, and the object has a limit on the number of simultaneous requests that will be outstanding, or perhaps a maximum rate at which requests will be released.
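A minimal sketch of what such a limiter might look like, assuming Twisted Deferreds; the class and method names here are illustrative, not an existing Tahoe API:

{{{
from twisted.internet import defer

class ConcurrencyLimiter:
    """Queue (callable, args, kwargs) jobs and run at most `limit` of them
    at once; each job gets a Deferred that fires with its result."""

    def __init__(self, limit=10):
        self.limit = limit
        self.active = 0
        self.pending = []   # queued (deferred, callable, args, kwargs)

    def add(self, f, *args, **kwargs):
        d = defer.Deferred()
        self.pending.append((d, f, args, kwargs))
        self._maybe_start()
        return d

    def _maybe_start(self):
        # release queued calls until we hit the concurrency limit
        while self.pending and self.active < self.limit:
            d, f, args, kwargs = self.pending.pop(0)
            self.active += 1
            d2 = defer.maybeDeferred(f, *args, **kwargs)
            d2.addBoth(self._one_done)
            d2.chainDeferred(d)

    def _one_done(self, result):
        # a job finished (or failed); make room for the next one
        self.active -= 1
        self._maybe_start()
        return result
}}}

The same object could be extended to enforce a maximum release rate instead of (or in addition to) a concurrency cap.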

Change History (2)

comment:1 Changed at 2008-04-16T00:23:09Z by warner

Mike says that he saw similar problems on the Windows client, before changing it to offload the t=deep-size queries to the prodnet webapi server. The trouble is that this machine gets overloaded by it too, so managing the parallelism would help both issues.

He saw a request use 50% of the local CPU for about 60 seconds. The same deep-size request took about four minutes when using a remote server, if I'm understanding his message correctly.

One important point to take away is that deep-size should not be called on every modification: we should really be caching the size of the filesystem and applying deltas as we add and remove files, then only doing a full deep-size every once in a while (maybe once a day) to correct the value.
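A rough sketch of that delta-tracking idea (purely hypothetical; the names and the once-a-day interval are illustrative, not anything in the tree):

{{{
import time

class CachedDeepSize:
    """Keep a running deep-size, correcting it with a real traversal
    only occasionally (e.g. once a day)."""

    def __init__(self, full_deep_size, recount_interval=24*60*60):
        self._full_deep_size = full_deep_size  # callable doing the real traversal
        self._interval = recount_interval
        self._size = full_deep_size()
        self._last_recount = time.time()

    def file_added(self, nbytes):
        self._size += nbytes

    def file_removed(self, nbytes):
        self._size -= nbytes

    def get_size(self):
        # periodically correct any accumulated drift with a full pass
        if time.time() - self._last_recount > self._interval:
            self._size = self._full_deep_size()
            self._last_recount = time.time()
        return self._size
}}}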

comment:2 Changed at 2008-05-08T18:07:50Z by warner

  • Milestone changed from undecided to 1.1.0
  • Resolution set to fixed
  • Status changed from new to closed

I implemented this in 3cb361e233054121. I did some experiments to decide on a reasonable value for the default limit, and settled on allowing 10 simultaneous requests per call to deep-size.
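As a rough illustration only (this is not the code from that changeset; dirnode.list(), is_directory(), fetch(), and get_size() are stand-in names), a deep-size traversal could feed its directory retrievals through a limiter like the one sketched in the description:

{{{
from twisted.internet import defer

limiter = ConcurrencyLimiter(limit=10)   # the default settled on above

def deep_size(dirnode):
    ds = []
    for name, child in dirnode.list().items():
        if child.is_directory():
            # each subdirectory retrieval waits its turn in the limiter
            d = limiter.add(child.fetch)
            d.addCallback(deep_size)
        else:
            d = defer.succeed(child.get_size())
        ds.append(d)
    # total size is the sum over all children, computed as they arrive
    return defer.gatherResults(ds).addCallback(sum)
}}}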

From my desktop machine (fluxx, Athlon 64 3500+ in 32-bit mode), which has a pretty fast pipe to the storage servers in our colo, t=deep-size on a rather large directory tree (~1700 directories, including one that has at least 300 children) takes:

  • limit=2: 2m25s (13 directories per second)
  • limit=5: 2m15s (14.7 dps)
  • limit=10: 2m10s/2m13s/2m14s (15 dps)
  • limit=30: 2m13s/2m14s (15 dps)
  • limit=60: 2m13s (15 dps)
  • limit=120: 2m12s (15.7 dps)
  • limit=9999: 2m06s (16.6 dps)

The same test, run from a machine in the colo (tahoecs2, P4 3.4GHz), which probably gets lower latency to the storage servers but might have a slower CPU, gives:

  • limit=2: 2m35s/2m32s, peak memory 67MB vmsize / 42MB rss
  • limit=10: 2m37s/2m29s, 68MB vmsize / 43MB rss
  • limit=9999: 2m28s/2m52s, 122MB vmsize / 100MB rss

So increasing the concurrency limit causes:

  • marginal speedups in retrieval time (<25%), probably because it fills the pipe better
  • significant increases in memory (roughly 2x), because many dirnode retrievals are happening at the same time

Therefore I think limit=10 is a reasonable choice.

It is worth noting that the CPU was pegged at 100% for all trials: the current bottleneck is the CPU, not the network. I suspect that the mostly-Python unpacking of dirnodes accounts for most of it.
