node_cpu_seconds_total reports incorrect cpu utilisation #581

vosdev · 2023-01-17T20:35:26Z

Aleksandr asked me on the LXD discussion forum to open an issue here (https://discuss.linuxcontainers.org/t/lxd-5-10-has-been-released/16143/9).

This morning my LXD snap updated from 5.9 to 5.10. A few minutes after the upgrade, my Prometheus instance started firing my alert on all LXC instances, stating that they were using 85-99% CPU:

100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])) * 100) > 75
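To make the arithmetic of this alert explicit, here is a minimal Python sketch of what the expression computes once irate() has produced a per-CPU idle rate (seconds of idle time per second). The function name and the sample rates are hypothetical and only for illustration. The second case assumes, without confirmation from this thread, that some CPU series report an idle counter that never advances.

```python
def utilisation_percent(idle_rates):
    """Mirror 100 - avg(idle_rate) * 100 for a single instance.

    idle_rates: per-CPU idle rates (idle seconds per second, 0.0 to 1.0),
    i.e. what irate(node_cpu_seconds_total{mode="idle"}[5m]) yields per CPU.
    """
    if not idle_rates:
        raise ValueError("need at least one CPU series")
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Hypothetical container with 4 CPUs that are each about 97% idle.
healthy = utilisation_percent([0.97] * 4)

# Assumption for illustration only: the same 4 CPUs plus 28 extra series
# whose idle counters never advance (rate 0.0). The average idle rate
# collapses even though nothing is actually busy.
skewed = utilisation_percent([0.97] * 4 + [0.0] * 28)

print(f"healthy: {healthy:.1f}%  alert fires: {healthy > 75}")
print(f"skewed:  {skewed:.1f}%  alert fires: {skewed > 75}")
```

Because the expression averages over every CPU series of an instance, series with a flat idle counter count as fully busy, which would be enough to push an idle container past the 75% threshold.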

I have node_exporter version 1.2.2 running inside the LXC instances.

My LXD snap is tracking latest/stable. At 03:20 this morning, Prometheus reported high CPU usage on all my LXC instances. Their actual CPU usage is below 5%.

minecraft1 ❯ w
 21:33:43 up  1:14,  1 user,  load average: 3.78, 3.08, 2.98

root @ nginx1 # w
21:33:41 up  1:14,  1 user,  load average: 3.76, 3.06, 2.97


mihalicyn · 2023-01-18T15:35:00Z

Yes, I think you will need to upgrade node_exporter once a fixed version is released.

prometheus/procfs#438 [merged]
prometheus/node_exporter#2318 [not merged yet]

This PR makes node_exporter work correctly with offlined CPUs.
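As background on what an offlined CPU looks like at the file level: Linux lists the online CPUs in /sys/devices/system/cpu/online as a comma-separated list of ids and ranges, such as 0-3,8. The Python sketch below parses that format and compares it with the per-CPU lines found in /proc/stat. It is an illustration only, not the code from either PR; the helper names and the sample strings are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8' into a sorted list of ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty content means the list has no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_in_proc_stat(text):
    """Return the CPU ids that have a 'cpuN' line in /proc/stat content."""
    ids = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            ids.append(int(fields[0][3:]))
    return sorted(ids)


# Hypothetical sample data; on a real system, read the two files instead.
online = parse_cpu_list("0-1,4\n")
stat_ids = cpus_in_proc_stat(
    "cpu  10 0 10 400 0 0 0 0 0 0\n"
    "cpu0 5 0 5 200 0 0 0 0 0 0\n"
    "cpu1 5 0 5 200 0 0 0 0 0 0\n"
)

# CPUs listed as online that have no matching line in /proc/stat.
missing = sorted(set(online) - set(stat_ids))
print("online:", online, "in /proc/stat:", stat_ids, "mismatch:", missing)
```

A mismatch between the two views is the kind of inconsistency a collector has to tolerate when CPU ids are not contiguous.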

I can confirm that the issue is (most likely) connected with 4eed66b

As a workaround, you can turn on lxcfs.cfs mode with snap set lxd lxcfs.cfs=true. It should help, but you will need to restart all containers/VMs with snap restart lxd.
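One way to cross-check what node_exporter should report after applying the workaround is to compute the idle share directly from two readings of the aggregate cpu line in /proc/stat inside the container. The Python sketch below does that on hypothetical sample lines; the function names are illustrative, and on a real system the two lines would be read a few seconds apart.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from an aggregate 'cpu ...' line.

    The fields after the label are user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already included in
    user and nice, so only the first eight fields are summed.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    idle = values[3]
    total = sum(values[:8])
    return idle, total


def utilisation_between(before, after):
    """CPU utilisation in percent between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100 * (1 - (idle1 - idle0) / elapsed)


# Hypothetical snapshots: 1000 ticks elapsed, 970 of them idle.
before = "cpu  100 0 100 9800 0 0 0 0 0 0"
after = "cpu  120 0 110 10770 0 0 0 0 0 0"
usage = utilisation_between(before, after)
print(f"utilisation: {usage:.1f}%")
```

If the value computed this way stays low while the alert expression reports 85-99%, the discrepancy lies in how the per-CPU series are exposed rather than in real load.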

mihalicyn · 2023-01-18T15:54:50Z

@vosdev if you try the workaround, please let us know whether it helps.

vosdev · 2023-01-18T19:49:27Z

I have applied the workaround and it works :)

Thanks!

I'm sure more people will run into this issue. I'm probably not the only one running a similar query on Prometheus.

Shall I close the issue?

mihalicyn · 2023-01-18T19:51:17Z

> I'm sure more people will run into this issue. I'm probably not the only one running a similar query on Prometheus

I thought about that too. Thanks a lot for reporting this! I hope the folks from the Prometheus project will merge the PR with the fix soon.

> Shall I close the issue?

Yep

vosdev closed this as completed Jan 18, 2023

tomponline mentioned this issue Jan 27, 2023

lxcfs on /sys/devices/system/cpu #585

Closed

deemon87 mentioned this issue Jan 30, 2023

LXD 5.0.2 and Clickhouse client ClickHouse/ClickHouse#45770

Closed
