
Bug in OpenBSD CPU stats - Metrics appear to be only ~1/10th of the actual values #2931

Open
paketb0te opened this issue Feb 19, 2024 · 1 comment


@paketb0te
Contributor

Host operating system: output of uname -a

OpenBSD foo 7.3 GENERIC.MP#4 i386

node_exporter version: output of node_exporter --version

```
foo# node_exporter --version
node_exporter, version 1.5.0 (branch: non-git, revision: non-git)
  build user:       openbsd_ports
  build date:       2023-03-24
  go version:       go1.20.1
  platform:         openbsd/386
```

node_exporter command line flags

--web.listen-address=10.0.2.15:9100 --collector.textfile.directory=/tmp/textfile_metrics/

node_exporter log output

n/a

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Run node_exporter as a daemon on OpenBSD

What did you expect to see?

Correct CPU stats in Prometheus

What did you see instead?

Incorrect (from my understanding) CPU stats :)

We found that the expression sum by (cpu) (rate(node_cpu_seconds_total{instance="foo"}[1m])) returns values around 0.1 instead of 1 (I am aware that this does not necessarily sum to exactly 1, but it is usually pretty close).

So it looks like we are off by a factor of around 10 🤔

Looking into cpu_openbsd.go, I found that the metrics are calculated by using sysctl kern.cp_time / kern.cp_time2 to get the number of ticks spent in each mode (at least that is what I understood from OpenBSD's sysctl manpage HERE), and then dividing that number by the clock rate (the number of ticks per second). That seems correct to me: I am not sure about the difference between the "hard clock" and the "statistics clock" mentioned HERE, but their rates are not different enough to explain the observed factor of 10.

So, given a clock rate of 100 Hz (100 ticks per second), I would expect each metric to be just 1/100th of the corresponding value returned by sysctl kern.cp_time.
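The expected conversion can be sketched in a few lines of Go. This is a minimal illustration, not the exporter's code: the ticksToSeconds helper is hypothetical, the counters are assumed to arrive in the order of OpenBSD's CP_USER, CP_NICE, CP_SYS, CP_SPIN, CP_INTR, CP_IDLE constants from <sys/sched.h> (which matches the exporter output further down), and the divisor of 100 ticks per second is the assumption stated above.

```go
package main

import "fmt"

// cpTimeModes lists the CPU states in the order assumed for kern.cp_time
// (CP_USER, CP_NICE, CP_SYS, CP_SPIN, CP_INTR, CP_IDLE in <sys/sched.h>),
// using the exporter's mode label names.
var cpTimeModes = []string{"user", "nice", "system", "spin", "interrupt", "idle"}

// ticksToSeconds (hypothetical helper) divides each raw tick counter by the
// clock rate in ticks per second and keys the result by mode name.
func ticksToSeconds(ticks []int64, ticksPerSecond float64) map[string]float64 {
	seconds := make(map[string]float64, len(cpTimeModes))
	for i, mode := range cpTimeModes {
		if i >= len(ticks) {
			break // fewer counters than expected; skip the remaining modes
		}
		seconds[mode] = float64(ticks[i]) / ticksPerSecond
	}
	return seconds
}

func main() {
	// Counters copied from the `sysctl kern.cp_time` output in this issue.
	cpTime := []int64{2391, 0, 1987, 60, 117, 976656}

	// Assumption taken from the report: hz = 100 ticks per second.
	seconds := ticksToSeconds(cpTime, 100)
	for _, mode := range cpTimeModes {
		fmt.Printf("%-9s %10.2f s\n", mode, seconds[mode])
	}
}
```

Under that assumption the idle counter of 976656 ticks would correspond to about 9766.56 seconds, roughly ten times the value the exporter actually reports.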

BUT when directly comparing the values returned by sysctl kern.cp_time with those returned by the exporter, we see they are more like 1/1000th (sysctl kern.cp_time returns the values in the order user, nice, system, spin, interrupt, idle, see HERE):

```
foo# sysctl kern.clockrate
kern.clockrate=tick = 10000, hz = 100, profhz = 1024, stathz = 128
foo#
foo# sysctl kern.cp_time && curl -s http://10.0.2.15:9100/metrics | grep -i cpu_seconds
kern.cp_time=2391,0,1987,60,117,976656
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 953.765625
node_cpu_seconds_total{cpu="0",mode="interrupt"} 0.1142578125
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="spin"} 0.05859375
node_cpu_seconds_total{cpu="0",mode="system"} 1.94140625
node_cpu_seconds_total{cpu="0",mode="user"} 2.3349609375
```
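As a quick sanity check of the output above, this small Go sketch divides each raw tick count by the value the exporter reported for the same mode, which gives the clock rate the exporter effectively used. The sample type and impliedDivisor function are hypothetical names for illustration; the pairing of counters to modes assumes the user, nice, system, spin, interrupt, idle order of kern.cp_time.

```go
package main

import "fmt"

// sample pairs a raw kern.cp_time tick count with the value the exporter
// reported for the same mode; both numbers are copied from this issue.
type sample struct {
	mode    string
	ticks   float64
	seconds float64
}

// impliedDivisor returns ticks/seconds: the clock rate the exporter would
// have had to divide by to turn the tick count into the exported value.
func impliedDivisor(s sample) float64 {
	return s.ticks / s.seconds
}

func main() {
	// "nice" is left out because both values are 0 (0/0 would be NaN).
	samples := []sample{
		{"user", 2391, 2.3349609375},
		{"system", 1987, 1.94140625},
		{"spin", 60, 0.05859375},
		{"interrupt", 117, 0.1142578125},
		{"idle", 976656, 953.765625},
	}
	for _, s := range samples {
		fmt.Printf("%-9s %.3f\n", s.mode, impliedDivisor(s))
	}

	// Under the report's assumption of 100 ticks per second, dividing by
	// 1024 instead of 100 scales every rate by 100/1024.
	fmt.Printf("rate scale under the 100 Hz assumption: %.3f\n", 100.0/1024.0)
}
```

Every mode except system comes out at exactly 1024; system lands slightly below because its counter moved between the sysctl call and the scrape. A scale of 100/1024 (about 0.098) is in line with the per-CPU rate sums of around 0.1 described earlier.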

Dividing the values returned from sysctl kern.cp_time by the metric values gives us 1024 🤔, which is exactly the profhz value reported by kern.clockrate above.
So to me it appears that somehow we get the wrong value as the clock rate, but I have not been able to figure out where or how exactly that happens. Maybe the return values from unix.SysctlRaw("kern.clockrate") get mapped to the wrong fields of the clockinfo struct?
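To illustrate the struct-mapping guess, here is a hypothetical Go sketch of how a one-field layout mismatch would shift which value gets read as the clock rate. Both layouts are assumptions made for illustration and are not taken from node_exporter's cpu_openbsd.go or from OpenBSD's headers: the "kernel" buffer is assumed to hold four little-endian 32-bit ints (hz, tick, stathz, profhz) filled with the kern.clockrate values above, and the reader is assumed to divide by the field it believes is stathz. The encodeInt32s and fieldAt helpers are likewise hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt32s packs values as consecutive little-endian 32-bit integers,
// mimicking a raw sysctl buffer for a struct made only of int fields.
func encodeInt32s(values ...int32) []byte {
	buf := make([]byte, 4*len(values))
	for i, v := range values {
		binary.LittleEndian.PutUint32(buf[4*i:], uint32(v))
	}
	return buf
}

// fieldAt reads the idx-th 32-bit field from raw; ok is false when the
// buffer is too short to contain that field.
func fieldAt(raw []byte, idx int) (value int32, ok bool) {
	off := 4 * idx
	if off < 0 || off+4 > len(raw) {
		return 0, false
	}
	return int32(binary.LittleEndian.Uint32(raw[off:])), true
}

func main() {
	// HYPOTHETICAL kernel layout: hz, tick, stathz, profhz.
	raw := encodeInt32s(100, 10000, 128, 1024)

	// A reader whose layout matches finds stathz at index 2.
	matching, _ := fieldAt(raw, 2)
	// A HYPOTHETICAL reader expecting one extra field before stathz
	// (hz, tick, <extra>, stathz, profhz) reads index 3, i.e. profhz.
	shifted, _ := fieldAt(raw, 3)

	const idleTicks = 976656 // idle counter from `sysctl kern.cp_time`
	fmt.Printf("matching layout: divisor %d -> idle %.6f s\n", matching, idleTicks/float64(matching))
	fmt.Printf("shifted layout:  divisor %d -> idle %.6f s\n", shifted, idleTicks/float64(shifted))
}
```

With the shifted read the divisor becomes 1024 and the idle value comes out as 976656 / 1024 = 953.765625, the same number the exporter reported. This only shows that such a shift would be consistent with the observed output; it does not establish the cause, and the follow-up comment in this thread points at the custom build instead.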

I hope I included enough information for someone more knowledgeable in Go to troubleshoot this. Please let me know if I can provide any further useful info or assist in any way.

@paketb0te paketb0te changed the title Bug in OpenBSD CPU stats Bug in OpenBSD CPU stats - Metrics appear to be only ~1/10th of the actual values Feb 19, 2024
@paketb0te
Contributor Author

I just tested this against a "normal" install of OpenBSD (the i386 .iso from the official sources) instead of the self-compiled image where this behaviour was observed - and the bug was not present there!

So we'll investigate the build steps of our custom image.
