Opened at 2008-06-23T19:31:56Z
Last modified at 2020-12-09T14:14:30Z
#472 closed defect
cpu monitor sometimes shows negative numbers (at Version 5)
Reported by: warner
Owned by:
Priority: major
Milestone: undecided
Component: code-nodeadmin
Version: 1.1.0
Keywords: statistics reliability
Cc:
Launchpad Bug:
Description (last modified by exarkun)
On some of our production machines, the CPU watcher is displaying strange numbers; perhaps a small counter field is wrapping around.
'stats': {'cpu_monitor.15min_avg': 0.019444445629178814, 'cpu_monitor.1min_avg': 0.0096666806870237352, 'cpu_monitor.5min_avg': 0.0070999994301001535, 'cpu_monitor.total': -273.11918400000002,
Change History (5)
comment:1 Changed at 2008-06-23T19:32:43Z by warner
comment:2 (follow-up: comment:4) Changed at 2008-09-09T02:50:46Z by warner
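Zooko did some digging and learned that clock(3), a C library function rather than a syscall, returns a signed 32-bit number (a clock_t that counts microseconds). That means the value goes negative after 2^31 µs, about 36 minutes of accumulated CPU time, and wraps all the way around every 2^32 µs, about 72 minutes of 100% CPU. It is not clear how Python will handle this, since Python has no fixed-width integer types.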
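The wrap arithmetic can be checked with a short Python 3 sketch. It assumes, as the comment above does, that clock_t is a signed 32-bit count of microseconds; `as_signed32_clock` and `unwrap_once` are hypothetical helpers for illustration, not part of Tahoe. Under that assumption, the -273.119184 total in the description would correspond to roughly 67 minutes of CPU time (the true value is only known modulo the 72-minute wrap).

```python
# Assumption (from this ticket): clock_t is a signed 32-bit counter of microseconds.
TICKS_PER_SECOND = 1_000_000  # CLOCKS_PER_SEC on POSIX/XSI systems
WRAP_TICKS = 2**32            # a 32-bit counter repeats every 2**32 ticks


def as_signed32_clock(cpu_seconds):
    """Hypothetical helper: what clock() / CLOCKS_PER_SEC would report after
    cpu_seconds of CPU time if clock_t were a signed 32-bit microsecond count."""
    ticks = round(cpu_seconds * TICKS_PER_SECOND) % WRAP_TICKS
    if ticks >= 2**31:
        ticks -= WRAP_TICKS  # high bit set: reinterpret it as a sign bit
    return ticks / TICKS_PER_SECOND


def unwrap_once(reading):
    """Hypothetical helper: undo one wraparound of a negative reading.
    The true value is only known modulo WRAP_TICKS / TICKS_PER_SECOND seconds."""
    if reading < 0:
        return reading + WRAP_TICKS / TICKS_PER_SECOND
    return reading


print(f"goes negative after {2**31 / TICKS_PER_SECOND / 60:.1f} CPU-minutes")  # 35.8
print(f"wraps fully every {WRAP_TICKS / TICKS_PER_SECOND / 60:.1f} CPU-minutes")  # 71.6
print(f"-273.119184 s reading ~ {unwrap_once(-273.119184) / 60:.1f} CPU-minutes")  # 67.0
```

This model is also consistent with the later observation that the value climbs from -273 toward zero: a signed counter keeps increasing through its negative range until it crosses zero again.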
So, task number one: write a program that burns 100% CPU and prints the value of time.clock() every minute, then see what happens after 72 minutes and figure out how to deal with it.
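A minimal Python 3 sketch of the task-one burner. It samples time.process_time() because time.clock() was removed in Python 3.8; process_time() on current platforms typically does not go through a 32-bit clock_t, so reproducing the original wrap probably requires the Python 2 interpreter on a 32-bit platform. The function name `burn_and_sample` and its parameters are hypothetical.

```python
import time


def burn_and_sample(total_seconds=80 * 60, interval=60.0, clock=time.process_time):
    """Busy-loop for total_seconds of wall time, printing clock() every interval.

    The default duration runs past the 72-minute mark discussed in this ticket.
    Returns a list of (wall_seconds_elapsed, clock_reading) tuples.
    """
    samples = []
    start = time.monotonic()
    next_sample = start
    while True:
        now = time.monotonic()
        if now >= next_sample:
            value = clock()
            samples.append((now - start, value))
            print(f"{now - start:8.1f}s wall  cpu clock = {value!r}")
            next_sample += interval
        if now - start >= total_seconds:
            return samples
        # No sleep here: spinning in this loop keeps one core at ~100% CPU.
```

Calling burn_and_sample() with the defaults spins one core for 80 minutes; a short call such as burn_and_sample(total_seconds=5, interval=1) is a quick way to check the output format first.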
Task number two: change src/allmydata/stats.py. It currently records time.clock() every minute and later computes the 1-, 5-, and 15-minute deltas from those samples. It needs to do something more robust: perhaps record the delta between consecutive one-minute samples (clipping the negative jump that occurs at wraparound, or using modular arithmetic to compensate for it), and then sum the last 1, 5, or 15 deltas.
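One way to structure task two, sketched in Python 3. `CPUDeltaWindow` is a hypothetical class, not the existing code in src/allmydata/stats.py. It keeps per-minute deltas in a bounded deque and removes the wraparound with modular arithmetic, assuming the counter width described in comment:2 (2**32 microseconds) and fewer than 72 CPU-minutes between samples.

```python
from collections import deque

# Assumption (from this ticket): the CPU clock is a 32-bit count of
# microseconds, so raw readings repeat every 2**32 microseconds.
WRAP_SECONDS = 2**32 / 1_000_000


class CPUDeltaWindow:
    """Hypothetical replacement for the 1/5/15-minute logic in stats.py.

    Call record() once per interval with the raw clock reading; it stores the
    per-interval CPU delta, compensating for at most one wraparound.
    """

    def __init__(self, interval=60.0, history=15):
        self.interval = interval
        self.last_reading = None
        self.deltas = deque(maxlen=history)  # oldest deltas fall off automatically

    def record(self, reading):
        if self.last_reading is not None:
            # Modular difference: a negative jump caused by wraparound becomes
            # the small positive delta it really was. Python's % with a
            # positive divisor always returns a non-negative result.
            delta = (reading - self.last_reading) % WRAP_SECONDS
            self.deltas.append(delta)
        self.last_reading = reading

    def average(self, minutes):
        """Fraction of one CPU used over the last `minutes` deltas, or None."""
        if len(self.deltas) < minutes:
            return None
        recent = list(self.deltas)[-minutes:]
        return sum(recent) / (minutes * self.interval)
```

For example, feeding the readings 2100.0, 2130.0, and -2134.967296 (the last being what a signed 32-bit microsecond counter would show at 2160 CPU-seconds) yields a 1-minute average of about 0.5 rather than a large negative number. The clipping alternative, delta = max(0.0, reading - self.last_reading), does not need to know the counter width, but it discards the minute in which the wrap occurs.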
comment:3 Changed at 2010-02-11T03:52:34Z by davidsarah
- Keywords statistics reliability added
comment:4 (in reply to comment:2) Changed at 2011-07-23T02:28:50Z by davidsarah
comment:5 Changed at 2020-12-09T14:14:25Z by exarkun
- Description modified
This ticket is about operational visibility for operations that are no longer operational.
Incidentally, the -273 value is getting less negative over time (a little while later it was -271).