#472 closed defect

cpu monitor sometimes shows negative numbers — at Version 5

Reported by: warner Owned by:
Priority: major Milestone: undecided
Component: code-nodeadmin Version: 1.1.0
Keywords: statistics reliability Cc:
Launchpad Bug:

Description (last modified by exarkun)

On some of our production machines, the cpu watcher is displaying weird numbers: maybe there's a small field that wraps.

'stats': {'cpu_monitor.15min_avg': 0.019444445629178814,
           'cpu_monitor.1min_avg': 0.0096666806870237352,
           'cpu_monitor.5min_avg': 0.0070999994301001535,
           'cpu_monitor.total': -273.11918400000002,

Change History (5)

comment:1 Changed at 2008-06-23T19:32:43Z by warner

incidentally, the -273 number is getting less negative over time (a little while later it was at -271)

comment:2 follow-up: ↓ 4 Changed at 2008-09-09T02:50:46Z by warner

Zooko did some digging, and learned that the clock(3) library function returns a signed 32-bit number (which counts microseconds), which means it will wrap after 72 minutes of 100% CPU. It is not clear how python will handle this (since python has no native fixed-width signed datatypes).

So, task number one: write a program that burns 100% CPU and prints out the value of time.clock() every minute: see what happens after 72 minutes, and figure out how to deal with that.
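A minimal sketch of such a probe follows. It assumes a modern Python, where time.clock() no longer exists (it was removed in Python 3.8), so time.process_time() stands in as the default CPU clock; on an interpreter of this ticket's era one would pass time.clock instead. The function name burn_and_sample and its parameters are hypothetical, not part of the Tahoe code.

```python
import time


def burn_and_sample(interval_s=60.0, samples=80, cpu_clock=time.process_time):
    """Spin at 100% CPU and record the CPU clock once per wall-clock interval.

    Returns a list of (wall_elapsed_seconds, cpu_clock_value) pairs.
    cpu_clock is an assumption: substitute time.clock on old interpreters.
    """
    readings = []
    start = time.time()
    next_sample = start + interval_s
    while len(readings) < samples:
        # Busy loop: deliberately no sleep(), so the process keeps
        # accruing CPU time between samples.
        if time.time() >= next_sample:
            value = cpu_clock()
            elapsed = time.time() - start
            readings.append((elapsed, value))
            print("%8.1fs wall  cpu clock = %r" % (elapsed, value))
            next_sample += interval_s
    return readings
```

Calling burn_and_sample() with the defaults runs for about 80 minutes, which covers both the ~36 and ~72 minute marks discussed in this ticket. A reading that drops below its predecessor would indicate a wrap.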

Task number two: change src/allmydata/stats.py . It currently records time.clock() every minute and looks for 1/5/15 minute deltas later. It needs to do something more complicated: perhaps record the delta between one-minute samples (clipping negative jumps that occur at wraparound, or doing some clever math to compensate for the wraparound), and then sum the last 1/5/15 values.
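The delta-based approach could look roughly like the sketch below. This is not the code in src/allmydata/stats.py; the names wrapped_delta and CPUDeltaWindow are hypothetical, and the wrap period assumes a 32-bit counter of microseconds that wraps at most once between samples (reasonable for one-minute sampling).

```python
from collections import deque

# Assumed span of a 32-bit microsecond counter, in seconds (about 4295 s).
WRAP_SECONDS = 2 ** 32 / 1e6


def wrapped_delta(previous, current, wrap=WRAP_SECONDS):
    """CPU seconds used between two readings of a counter that may wrap."""
    delta = current - previous
    if delta < 0:
        # The counter wrapped once between samples: add back one full period.
        delta += wrap
    return delta


class CPUDeltaWindow:
    """Keep the most recent per-sample CPU deltas and a wrap-safe total."""

    def __init__(self, max_samples=15, wrap=WRAP_SECONDS):
        self.wrap = wrap
        self.last = None
        self.deltas = deque(maxlen=max_samples)
        self.total = 0.0

    def add_sample(self, reading):
        """Record one raw clock reading, taken once per sampling interval."""
        if self.last is not None:
            delta = wrapped_delta(self.last, reading, self.wrap)
            self.deltas.append(delta)
            self.total += delta
        self.last = reading

    def average(self, minutes, interval_s=60.0):
        """Fraction of one CPU used over the last `minutes` samples, or None
        if fewer than `minutes` deltas have been collected so far."""
        if len(self.deltas) < minutes:
            return None
        recent = list(self.deltas)[-minutes:]
        return sum(recent) / (minutes * interval_s)
```

Summing deltas rather than subtracting raw readings means a wrap affects at most one sample, and the running total stays non-negative. Clipping negative deltas to zero, the other option mentioned above, would be simpler but would lose the CPU time used during the wrapping interval.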

comment:3 Changed at 2010-02-11T03:52:34Z by davidsarah

  • Keywords statistics reliability added

comment:4 in reply to: ↑ 2 Changed at 2011-07-23T02:28:50Z by davidsarah

Replying to warner:

Zooko did some digging, and learned that the clock(3) library function returns a signed 32-bit number (which counts microseconds), which means it will wrap after 72 minutes of 100% CPU.

~36 minutes, no? (2^31 / 1000000 / 60)
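Both figures can be checked with a few lines of Python. This assumes a counter that ticks once per microsecond of CPU time; the variable names are illustrative only. A signed 32-bit value overflows into negative territory after 2^31 ticks, while 72 minutes corresponds to the full 2^32 range.

```python
MICROSECONDS_PER_MINUTE = 1000000 * 60.0

# Minutes of 100% CPU until a signed 32-bit counter overflows to negative.
signed_overflow_minutes = 2 ** 31 / MICROSECONDS_PER_MINUTE

# Minutes of 100% CPU to cycle through the full 32-bit range.
full_range_minutes = 2 ** 32 / MICROSECONDS_PER_MINUTE

print("signed overflow after %.1f minutes" % signed_overflow_minutes)
print("full 32-bit cycle after %.1f minutes" % full_range_minutes)
```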

comment:5 Changed at 2020-12-09T14:14:25Z by exarkun

  • Description modified (diff)

This ticket is about operational visibility for operations that are no longer operational.
