We found that the expression sum by (cpu) (rate(node_cpu_seconds_total{instance="foo"}[1m])) returns values around 0.1 instead of 1 (I am aware that this does not necessarily sum to exactly 1, but it is usually pretty close).
So it looks like we are off by a factor of around 10 🤔
Looking into cpu_openbsd.go, I found that the metrics are calculated by using sysctl kern.cp_time / sysctl kern.cp_time2 to get the number of ticks spent in each mode (at least that is what I understood from OpenBSD's sysctl manpage HERE), and then dividing that number by the clock rate (the number of ticks per second), which to me seems correct (although I am not sure about the difference between the "hard clock" and the "statistics clock" mentioned HERE, they are not different enough to explain the observed factor of 10).
So, given a clock rate of 100 Hz (100 ticks per second), I would assume that each metric is just 1/100th of the corresponding value returned by sysctl kern.cp_time.
BUT when directly comparing the values returned from sysctl kern.cp_time with those returned by the exporter, we see they are more like 1/1000th (sysctl kern.cp_time returns the values in the order: interrupt, nice, user, system, spin, idle, see HERE):
Dividing the values returned by sysctl kern.cp_time by the metric values gives us 1024 🤔
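The arithmetic above can be sketched in Go. This is only an illustration, not the exporter's code: ticksToSeconds is a hypothetical helper, the tick count is made up, and the two divisors (100 and 1024) are simply the figures from this report.

```go
package main

import "fmt"

// ticksToSeconds converts a tick counter into seconds for a given clock rate
// (ticks per second). Hypothetical helper, not taken from node_exporter.
func ticksToSeconds(ticks uint64, clockRate float64) float64 {
	return float64(ticks) / clockRate
}

func main() {
	const ticks = 512000 // made-up counter value, for illustration only

	expected := ticksToSeconds(ticks, 100)  // divisor assumed in this report
	observed := ticksToSeconds(ticks, 1024) // divisor implied by the observed ratio

	fmt.Printf("expected: %.1f s\n", expected)
	fmt.Printf("observed: %.1f s\n", observed)
	fmt.Printf("observed/expected: %.4f\n", observed/expected)
}
```

If the exporter really divided by 1024 where 100 was expected, every counter (and therefore every rate) would shrink by 100/1024 ≈ 0.098, which would match the values of around 0.1 seen in Prometheus.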
So to me it appears that somehow we get the wrong value as the clockrate, but I have not been able to figure out where / how exactly that happens - maybe the return values from unix.SysctlRaw("kern.clockrate") get mapped to the wrong fields of the clockinfo struct?
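One way to check that hypothesis might be to decode the raw sysctl bytes by hand and print every field, so that a misplaced value becomes visible. A rough sketch, assuming the buffer is a sequence of 32-bit little-endian integers: decodeInt32s is a hypothetical helper, and the sample bytes are synthetic, not output captured from a real system.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeInt32s splits a raw buffer into consecutive little-endian int32 values.
// Trailing bytes that do not fill a whole 4-byte word are ignored.
// Hypothetical helper; the word size and byte order are assumptions.
func decodeInt32s(raw []byte) []int32 {
	out := make([]int32, 0, len(raw)/4)
	for i := 0; i+4 <= len(raw); i += 4 {
		out = append(out, int32(binary.LittleEndian.Uint32(raw[i:i+4])))
	}
	return out
}

func main() {
	// Synthetic buffer holding four arbitrary example values.
	raw := make([]byte, 16)
	for i, v := range []uint32{100, 10000, 128, 1024} {
		binary.LittleEndian.PutUint32(raw[i*4:], v)
	}

	// Print each decoded word with its position in the buffer.
	for i, v := range decodeInt32s(raw) {
		fmt.Printf("field %d: %d\n", i, v)
	}
}
```

On an affected host, the input would instead be the bytes returned by unix.SysctlRaw("kern.clockrate"), and the printed positions could then be compared with the field order of struct clockinfo in the system headers of the image in question.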
I hope I included enough information for troubleshooting by someone more knowledgeable in golang. Please let me know if I can provide any further useful info or assist in any way.
paketb0te changed the title from "Bug in OpenBSD CPU stats" to "Bug in OpenBSD CPU stats - Metrics appear to be only ~1/10th of the actual values" on Feb 19, 2024
I just tested this against a "normal" install of OpenBSD (the i386 .iso from the official sources) instead of the self-compiled image where this behaviour was observed - and the bug was not present there!
So we'll investigate the build steps of our custom image.
Host operating system: output of
uname -a
OpenBSD foo 7.3 GENERIC.MP#4 i386
node_exporter version: output of
node_exporter --version
node_exporter command line flags
--web.listen-address=10.0.2.15:9100 --collector.textfile.directory=/tmp/textfile_metrics/
node_exporter log output
n/a
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
Run node_exporter as a daemon on OpenBSD
What did you expect to see?
Correct CPU stats in Prometheus
What did you see instead?
Incorrect (from my understanding) CPU stats :)