node_cpu_seconds_total reports incorrect cpu utilisation #581

Closed
vosdev opened this issue Jan 17, 2023 · 4 comments
vosdev commented Jan 17, 2023

Aleksandr asked me on the LXD discussion forum to open an issue (https://discuss.linuxcontainers.org/t/lxd-5-10-has-been-released/16143/9)

This morning my LXD snap updated from 5.9 to 5.10. A few minutes after the upgrade, my Prometheus instance started firing my alert on all LXC instances, stating that they were using 85-99% CPU:

100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])) * 100) > 75
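
For readers unfamiliar with the expression, the following is a rough Python sketch of the arithmetic it performs for a single instance. It assumes irate is approximated by the per-second increase between the last two samples of each CPU's idle counter; the function names and sample values are hypothetical and only illustrate the calculation.

```python
# Sketch of: 100 - avg(irate(idle_counter)) * 100, for one instance.
# All names and numbers here are illustrative assumptions, not real data.

def idle_rate(prev_idle, curr_idle, interval_seconds):
    """Per-second increase of one CPU's idle counter between two samples."""
    return (curr_idle - prev_idle) / interval_seconds


def cpu_utilisation_percent(samples, interval_seconds):
    """samples holds one (prev_idle, curr_idle) pair per CPU series."""
    rates = [idle_rate(prev, curr, interval_seconds) for prev, curr in samples]
    average_idle = sum(rates) / len(rates)
    return 100 - average_idle * 100


# Two CPUs that were each idle for 14.25 of the last 15 seconds.
mostly_idle = cpu_utilisation_percent([(100.0, 114.25), (200.0, 214.25)], 15)

# Hypothetical case: four series, but only one idle counter advances.
skewed = cpu_utilisation_percent(
    [(100.0, 114.25), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)], 15
)

print(round(mostly_idle, 2), round(skewed, 2))
```

Because the expression averages over every CPU series the exporter reports, any series whose idle counter does not advance pulls the result towards high utilisation. The second case is purely hypothetical and is not a diagnosis of what happened here.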

I have node_exporter version 1.2.2 running inside the LXC containers.

My LXD snap is tracking latest/stable. This morning at 03:20 AM, Prometheus reported high CPU usage on all my LXC instances. Their actual CPU usage is below 5% in every case.

minecraft1 ❯ w
 21:33:43 up  1:14,  1 user,  load average: 3.78, 3.08, 2.98
root @ nginx1 # w
21:33:41 up  1:14,  1 user,  load average: 3.76, 3.06, 2.97

mihalicyn commented Jan 18, 2023

Yes, I think you will need to upgrade node_exporter once a version with the fix is released:

prometheus/procfs#438 [merged]
prometheus/node_exporter#2318 [not merged yet]

This PR makes node_exporter work correctly with offlined CPUs.

I can confirm that the issue is (most likely) connected with commit 4eed66b.

As a workaround, you can turn on lxcfs.cfs mode with snap set lxd lxcfs.cfs=true. It should help, but you will need to restart all containers/VMs with snap restart lxd.

@mihalicyn
Member

@vosdev if you try the workaround, please let us know whether it helps.


vosdev commented Jan 18, 2023

I have applied the workaround and it works :)

Thanks!

I'm sure more people will run into this issue. I'm probably not the only one running a similar query on Prometheus.

Shall I close the issue?

@mihalicyn
Member

> I'm sure more people will run into this issue. I'm probably not the only one running a similar query on Prometheus

I thought about that too. Thanks a lot for reporting this! I hope that folks from the Prometheus project will merge the PR with the fix soon.

> Shall I close the issue?

Yep
