illumos/solaris CPU usage is reported in ticks, not seconds #1837

Closed
davepacheco opened this issue Sep 5, 2020 · 1 comment · Fixed by #2963

@davepacheco

Host operating system: output of uname -a

$ uname -a
SunOS lennier 5.11 omnios-r151034-0d278a0cc5 i86pc i386 i86pc

node_exporter version: output of node_exporter --version

$ ./node_exporter --version
node_exporter, version 1.0.1 (branch: master, revision: d8a1585f59ef1169837d08979ecc92dcea8aa58a)
  build user:       dap@lennier
  build date:       20200904-20:16:54
  go version:       go1.14.7

node_exporter command line flags

No command-line flags passed (node_exporter)

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Viewed stat node_cpu_seconds_total.

What did you expect to see?

I expected to see the total number of seconds of idle time for this CPU since boot.

What did you see instead?

I saw the total number of idle ticks for this CPU since boot.


It's easier to look at all the data in one place:

# curl -s localhost:9100/metrics | grep cpu.*idle; kstat -p -m cpu -i 0 -n sys | grep cpu.*idle; kstat | grep nsec_per_tick
node_cpu_seconds_total{cpu="0",mode="idle"} 8.238178e+06
node_cpu_seconds_total{cpu="1",mode="idle"} 8.344892e+06
cpu:0:sys:cpu_nsec_idle 8238179276443
cpu:0:sys:cpu_ticks_idle        8238179
cpu:0:sys:idlethread    3961542
        nsec_per_tick                   1000000

What we see in this snippet is that:

  • node_exporter is reporting 8238178 for "node_cpu_seconds_total" for cpu="0", mode="idle". This stat is documented to be measured in seconds.
  • According to the underlying kstats, the CPU has been idle for 8238179276443 nanoseconds, or 8238.179276443 seconds. The reported value matches cpu_ticks_idle (8238179) rather than that figure, so the stat is too large by a factor of 1,000.

Looking at the source, it's pretty clear why:

"idle": "cpu_ticks_idle",
"kernel": "cpu_ticks_kernel",
"user": "cpu_ticks_user",
"wait": "cpu_ticks_wait",

It's pulling the "cpu_ticks_idle" kstat, which is measured in ticks. Ticks are related to seconds by "nsec_per_tick". The above output shows that nsec_per_tick is 1,000,000 on this system, so one tick is one millisecond and there are 1,000 ticks per second. That explains why our output is off by a factor of 1,000.
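
To make the unit relationship concrete, here is a small sketch that converts the tick counter to seconds via nsec_per_tick and compares it with the nanosecond counter. The input values are copied from the kstat output above; the helper names ticksToSeconds and nsecToSeconds are illustrative only and are not part of node_exporter.

```go
package main

import "fmt"

// ticksToSeconds converts a kstat tick counter to seconds, given the
// system's nsec_per_tick value. Illustrative helper, not node_exporter code.
func ticksToSeconds(ticks, nsecPerTick uint64) float64 {
	return float64(ticks) * float64(nsecPerTick) / 1e9
}

// nsecToSeconds converts a kstat nanosecond counter to seconds.
func nsecToSeconds(nsec uint64) float64 {
	return float64(nsec) / 1e9
}

func main() {
	const (
		cpuTicksIdle = 8238179       // cpu:0:sys:cpu_ticks_idle
		cpuNsecIdle  = 8238179276443 // cpu:0:sys:cpu_nsec_idle
		nsecPerTick  = 1000000       // nsec_per_tick
	)

	fmt.Printf("from ticks: %.3f s\n", ticksToSeconds(cpuTicksIdle, nsecPerTick))
	fmt.Printf("from nsec:  %.3f s\n", nsecToSeconds(cpuNsecIdle))

	// The exporter published the raw tick count as if it were seconds,
	// so this ratio is the size of the error.
	fmt.Printf("reported/actual: %.1f\n", float64(cpuTicksIdle)/nsecToSeconds(cpuNsecIdle))
}
```

Both conversions should agree to within one tick, which is a quick way to sanity-check which kstat a collector is actually reading.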

As far as I can tell, this has always been wrong in this way. My guess is that users don't see this if they're always graphing a ratio of the CPU time metrics (e.g., idle / sum_of_all_of_them), because the unit error cancels out. You do see it if you try to calculate idle percent as 100 * the per-second rate of node_cpu_seconds_total{mode="idle"}, which should work.
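
A short sketch of why ratio-based graphs hide the problem: multiplying every mode's counter by the same constant leaves idle / total unchanged, while any expression that uses a single counter on its own is scaled by that constant. The per-mode values below are made up for illustration, and the 1,000 ticks-per-second factor is an assumption that follows from nsec_per_tick being 1,000,000.

```go
package main

import "fmt"

// idleFraction returns idle / (sum of all modes) for one CPU.
func idleFraction(modes map[string]float64) float64 {
	var total float64
	for _, v := range modes {
		total += v
	}
	return modes["idle"] / total
}

// scale returns a copy of modes with every value multiplied by factor.
func scale(modes map[string]float64, factor float64) map[string]float64 {
	out := make(map[string]float64, len(modes))
	for k, v := range modes {
		out[k] = v * factor
	}
	return out
}

func main() {
	// Hypothetical per-mode CPU time in seconds.
	seconds := map[string]float64{"idle": 900, "kernel": 40, "user": 50, "wait": 10}
	// The same counters expressed in ticks, assuming 1000 ticks per second.
	ticks := scale(seconds, 1000)

	// The ratio is identical in both units.
	fmt.Println(idleFraction(seconds), idleFraction(ticks))
	// A single counter on its own differs by the scale factor.
	fmt.Println(seconds["idle"], ticks["idle"])
}
```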

The straightforward solution would be to use the cpu_nsec_{idle,kernel,user,wait} kstats instead of the cpu_ticks_{idle,kernel,user,wait} kstats. I don't know if we'd be worried about this being a breaking change.

CC @dsnt02518 (because you seem to be doing related work in #1803), @jpds (maybe I've misunderstood something here?)

@davepacheco davepacheco changed the title illumos+solaris CPU usage is reported in ticks, not seconds illumos/solaris CPU usage is reported in ticks, not seconds Sep 5, 2020
@SuperQ SuperQ added the bug label Sep 6, 2020
rexagod added a commit to rexagod/node_exporter that referenced this issue Mar 19, 2024
Replace all cpu_ticks_* with cpu_nsec_*, since the former was off by a
magnitude of 10e6, and showed incorrect values for
node_cpu_seconds_total.

Fixes: prometheus#1837

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
@rexagod
Contributor

rexagod commented Mar 19, 2024

I've opened up a PR purely based on David's research above (and a bit of mine), which should address this bug.

rexagod added a commit to rexagod/node_exporter that referenced this issue May 12, 2024
SuperQ pushed a commit that referenced this issue May 15, 2024
SuperQ pushed a commit that referenced this issue May 21, 2024