Fix broken retrieval of OpenBSD CPU metrics #1241
Test usage of the patched library on my machine:
package main

import (
	"fmt"
	"log"

	"github.com/shirou/gopsutil/v3/cpu"
)

func main() {
	perCPUTimes, err := cpu.Times(true)
	if err != nil {
		log.Fatalln("error getting cpu times:", err)
	}
	for k, v := range perCPUTimes {
		fmt.Printf("cpu %v: %+v\n", k, v)
	}
}
|
Thank you for your investigation and PR! I have not checked the OpenBSD 7 source code, but has it changed since #647? What is the value of the |
I don't think anything in the OpenBSD source has changed. In my case, when I set
I have no idea why @omar-polo found this code change necessary. It appears that he may have been testing under QEMU--it's not clear. |
Some info about my machine. It appears that my old Core i5-3570 does not support hyperthreading and thus, the setting of Is it possible that this conditional is just incorrect? Perhaps it should look like this:
if hasSMT {
	j *= 2
}
In other words, skip every other CPU when SMT is enabled.
|
No, I think that check is necessary. For example, on my machine:
then, looking at top(1)
since we retrieve the number of cpus using AFAICS it still works correctly here:
However, it's been a while since I worked on that code, and at a glance there are a couple of things I don't like or that shouldn't be needed anymore. I'll check out your PR and do some tests soon, thanks :) |
OK, I think I see what's going on here. @omar-polo both you and I have four physical cores and SMT = 0. The difference is that your https://github.com/shirou/gopsutil/blob/master/cpu/cpu_openbsd.go#L164-L168 The problem is that the loop is based off Sound good? I can make a fix. |
@omar-polo if you set |
@chrissnell please try #1244, I think I've got it right this time :) The problem is in the assumption that if SMT is disabled (the default) then only even CPUs are online. I've changed my assumption to mirror what top does: https://github.com/openbsd/src/blob/master/usr.bin/top/machine.c#L258-L269 it uses Anyway, to reply to your question, yes, changing hw.smt alters the output from hw.ncpuonline (which is raised from 4 to 8 when SMT is enabled) |
Sorry, I'm struggling with this: does anyone know how to replace the shirou/gopsutil dependency with the @omar-polo fork? I tried a few things with go.mod but it doesn't seem to work. I'm attempting to build telegraf with the forked repo. |
I don't use telegraf so I can't confirm, but appending
to telegraf' EDIT: I'm not sure if |
Sadly, the Can we just merge this as-is? |
I think #1244 includes this PR. So we can close this PR. Please let me know if I am wrong. |
Yes, my PR has the proper fix for this issue. Sorry for the delay, I've been a bit busy, I'll look into closing the final points for #1244 right now :) |
Closing as per discussion and #1244 |
CPU metrics are broken on OpenBSD 7.0-CURRENT as described in #1239. I believe that the problem is the multiplication of the CPU incrementer by 2 when the code walks through the available CPUs to gather CPU time metrics. The code makes use of the KERN_CPTIME2 sysctl to fetch per-CPU time metrics.
Looking at the kernel code behind this particular sysctl, we have this. As you can see, it's calling a CPU_INFO_FOREACH macro and iterating through the results until it has the info for the requested CPU. This macro is defined per-architecture and in every one that I found, it's just iterating through cpu_info structs defined here.
I can't find anything that leads me to believe that this Go code should be incrementing by anything other than 1.
In the entire OpenBSD codebase, there's only one use of the KERN_CPTIME2 sysctl and that's in snmpd's source code, which iterates through the CPUs and it does this by incrementing the counter by 1.
This particular code ^ hasn't changed since 2016 and I'll bet it still works. :-)
Here's another clue. As I mentioned, the guts of this sysctl is just iterating through cpu_info structs. I looked around for other uses of this struct and found code for the new OpenBSD debugger. There are different implementations for different architectures, but for amd64, it also increments through CPUs by 1.
Here is a PR to fix this. I only have one OpenBSD machine to test on, a simple Dell Optiplex workstation.