
node_filesystem_{free,avail}_bytes reporting values larger than node_filesystem_size_bytes #1672

Open
treydock opened this issue Apr 10, 2020 · 4 comments

Comments

@treydock
Contributor

Host operating system: output of uname -a

$ uname -r
3.10.0-957.41.1.el7.x86_64

node_exporter version: output of node_exporter --version

$ node_exporter --version
node_exporter, version 1.0.0-rc.0 (branch: master, revision: a57f2465794ec60c40674706acc6c2ace12c1358)
  build user:       tdockendorf@pitzer-rw02.ten.osc.edu
  build date:       20200327-18:45:58
  go version:       go1.13.8

node_exporter command line flags

These hosts use an NFS root, which produces lots of bind mounts; that is why we have so many filesystem ignores.

ExecStart=/usr/bin/node_exporter \
--collector.filesystem.ignored-fs-types=^(gpfs|nfs|nfs4|rootfs|tmpfs|cvmfs2|iso9660|autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$ \
--collector.textfile.directory=/var/lib/node_exporter/textfile_collector \
--collector.systemd.unit-whitelist=.+\.service \
--collector.systemd.unit-blacklist=(mmsdrserv)\.service \
--collector.netclass.ignored-devices=^(eth1|eth2|eth3|ib0|lo)$ \
--collector.filesystem.ignored-mount-points=^/(var/spool|var/log|var/lib/oprofile|var/account|var/cache/opensm|var/cache/ibutils|var/mmfs|var/adm/ras|var/lib/fail2ban|opt/puppetlabs/puppet/cache/clientbucket|opt/puppetlabs/puppet/cache/state|opt/dell/srvadmin/var|var/lib/identityfinder|var/lib/.identityfinder|var/cache/man|var/gdm|var/lib/xkb|var/lib/dbus|var/lib/nfs|var/lib/postfix|var/lib/gssproxy|var/singularity|var/lib/pcp/tmp|etc/lvm/cache|etc/lvm/archive|etc/lvm/backup|var/cache/foomatic|var/cache/logwatch|var/cache/httpd/ssl|var/cache/httpd/proxy|var/cache/php-pear|var/cache/systemtap|var/db/nscd|var/lib/dav|var/lib/dhcpd|var/lib/dhclient|var/lib/php|var/lib/pulse|var/lib/rsyslog|var/lib/ups|var/tmp|var/db/sudo|var/spool/cron|etc/sysconfig/iptables.d|etc/puppetlabs/mcollective|var/lib/node_exporter/textfile_collector|etc/adjtime|var/lib/arpwatch|var/lib/NetworkManager|var/cache/alchemist|var/lib/gdm|var/lib/iscsi|var/lib/ntp|var/lib/xen|var/empty/sshd/etc/localtime|var/lib/random-seed|var/lib/samba|etc/ofed-mic.map|opt/ipcm|usr/bin/turbostat|var/lib/pcp/pmdas/perfevent|var/lib/pcp/pmdas/infiniband|etc/sysconfig/network-scripts|etc/fstab|etc/pam.d|etc/security/access|etc/security/limits.d|etc/X11/xorg.conf.d|var/lib/sss|var/lib/logrotate||dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+|cvmfs/.+|run/.+)($|/) \
--collector.systemd \
--collector.cpu.info \
--collector.mountstats \
--collector.ntp \
--no-collector.hwmon \
--no-collector.mdadm \
--no-collector.nfsd \
--no-collector.softnet \
--no-collector.thermal_zone \
--no-collector.zfs

Are you running node_exporter in Docker?

Not via Docker.

What did you do that produced an error?

Looked at a Grafana graph that uses these metrics. The filesystem avail bytes is an extremely large number, much larger than size bytes.

What did you expect to see?

I would never expect the avail or free bytes for a filesystem to exceed its size.

What did you see instead?

The orange line is the avail bytes; the green line that appears to be near 0 is the size in bytes.

The size in bytes is 879510155264, which is accurate, but the avail bytes is so much larger that the scale makes the size look near zero.

[Screenshot: Grafana graph, 2020-04-10 9:06 AM]

@discordianfish
Member

That is odd... What does `df` say about /tmp on these systems?

@treydock
Contributor Author

[root@o0297 ~]# df /tmp
Filesystem             1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg0-lv_tmp 858896636 1176452 857720184   1% /tmp

The metric for total size remained accurate, while avail/free was the one that exceeded the total size. These are HPC compute nodes, so it's possible this happened when /tmp filled up because a user was doing something they shouldn't, but it's hard to say for sure since the monitoring numbers we rely on were the ones that were incorrect.

@discordianfish
Member

It would be useful to get the raw output of statfs from here: https://github.com/prometheus/node_exporter/blob/master/collector/filesystem_linux.go#L78

Do you see any errors in the node_exporter log? Maybe the mountpoint got stuck, leading to this miscalculation. But the code is pretty straightforward, so I'm not sure what is going on here. Maybe some float overflow (https://github.com/prometheus/node_exporter/blob/master/collector/filesystem_linux.go#L109), but I doubt that.

@treydock
Contributor Author

I've looked at the code and also can't imagine how this would become a problem, as the code essentially takes values returned by the kernel and does simple math to convert blocks to bytes.

There are no relevant errors in the logs. The only logs from node_exporter are about issues generating mountinfo, but that's a procfs issue (prometheus/procfs#282).

Apr  9 03:23:58 o0297 node_exporter: level=error ts=2020-04-09T07:23:58.801Z caller=collector.go:161 msg="collector failed" name=mountstats duration_seconds=0.007737361 err="failed to parse mountinfo: couldn't find enough fields in mount string: 108 53 0:34 / /var/lib/nfs/rpc_pipefs rw,relatime - rpc_pipefs sunrpc rw"

rexagod added a commit to rexagod/node_exporter that referenced this issue Mar 19, 2024
Handle cases where, owing to multiplying two `uint64` integers and
typecasting it to `float64`, the overall precision is lost when the
values concerned exceed the `floatMantissa64` (1 << 53) before or after
the operation (which is well within the acceptable `uint64` range).

Fixes: prometheus#1672

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
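The precision loss the commit message describes can be reproduced directly: float64 has a 53-bit mantissa, so any integer above 2^53 (the commit's `floatMantissa64`) is silently rounded when a `uint64` is converted, even though it is well within `uint64` range. A small standalone demonstration:

```go
package main

import "fmt"

func main() {
	const floatMantissa64 = uint64(1) << 53 // 9007199254740992

	n := floatMantissa64 + 1 // exact as a uint64
	f := float64(n)          // rounds to the nearest representable float64

	fmt.Println(n)         // 9007199254740993
	fmt.Println(uint64(f)) // 9007199254740992: the +1 is lost
	fmt.Println(uint64(f) == n)
}
```

The last line prints `false`. The same effect applies when two large `uint64` values are multiplied and the product is cast to `float64`: the result can differ from the true value by more than the block size, which is why the fix guards values around the mantissa limit.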
rexagod added a commit to rexagod/node_exporter that referenced this issue Mar 30, 2024
Handle cases where, owing to multiplying two `uint64` integers and
typecasting it to `float64`, the overall precision is lost when the
values concerned exceed the `floatMantissa64` (1 << 53) before or after
the operation (which is well within the acceptable `uint64` range).

Fixes: prometheus#1672

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>