node_filesystem_{size,avail}_bytes report wrong values for ZFS filesystems #1498

Baughn · 2019-09-24T15:55:48Z

Host operating system: output of `uname -a`

Linux backup-target.atelieraphelion.com 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of `node_exporter --version`

/nix/store/svm3ypaq5dyznfxr7lhqvk6ymyy0cs9n-node_exporter-0.17.0-bin/bin/node_exporter --version
node_exporter, version (branch: , revision: )
build user:
build date:
go version: go1.12.7

node_exporter command line flags

/nix/store/svm3ypaq5dyznfxr7lhqvk6ymyy0cs9n-node_exporter-0.17.0-bin/bin/node_exporter --web.listen-address 0.0.0.0:9100

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Created a disk-space alert using `node_filesystem_avail_bytes{fstype=~"ext4|zfs|xfs"} / node_filesystem_size_bytes < 0.1`

What did you expect to see?

The alert should fire when available space drops below 10%, since `node_filesystem_avail_bytes` would then be less than 10% of `node_filesystem_size_bytes`.

What did you see instead?

The alert never fires for ZFS filesystems, because _avail_bytes and _size_bytes are always equal.


knweiss · 2019-12-07T16:12:05Z

I think this is basically a duplicate of the closed (FreeBSD) issue #1287.

peterjeremy · 2023-08-13T05:51:58Z

I have just been bitten by this bug, and the problem is a bit more subtle than the initial description implies: node_exporter calculates node_filesystem_avail_bytes and node_filesystem_size_bytes correctly, but from stale (cached) values supplied by the kernel.

In more detail, the unix.Getfsstat call specifies MNT_NOWAIT and getfsstat(2) states:

> Normally mode should be specified as MNT_WAIT. If mode is set to
> MNT_NOWAIT, getfsstat() will return the information it has available
> without requesting an update from each file system.

After studying changes in node_filesystem_avail_bytes over time and rummaging around in the FreeBSD kernel sources, it seems to me that the cached data is basically never updated under normal operation. This means that, unless something else (like df) calls statfs(2) or invokes getfsstat with MNT_WAIT, the reported data reflects the state at the time the filesystem was created or mounted, rendering it useless for Prometheus alerting.

As for fixing the issue:

- The most obvious "fix" is to use MNT_WAIT instead of MNT_NOWAIT, but this runs the risk of blocking indefinitely if (e.g.) an NFS server becomes non-responsive.
- A reasonable workaround is probably to stick with MNT_NOWAIT but explicitly call statfs(2) on each non-NFS (or otherwise "safe") filesystem.
- I have raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273094 suggesting that the current behaviour is a POLA violation.
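
The second option above could be sketched roughly as follows. This is only an illustration, not node_exporter's actual code: the `mount` and `usage` types, the `unsafeFstypes` list, and the `refresh` helper are all hypothetical names, and the list of "unsafe" filesystem types is an assumption. The only real API used is `syscall.Statfs` from the Go standard library.

```go
package main

import (
	"fmt"
	"syscall"
)

// unsafeFstypes is a hypothetical list of filesystem types whose statfs(2)
// call may block when the backing server is unreachable.
var unsafeFstypes = map[string]bool{
	"nfs":  true,
	"nfs4": true,
}

// mount is a hypothetical description of one mounted filesystem.
type mount struct {
	Point  string
	Fstype string
}

// usage holds the two values the metrics in this issue are derived from.
type usage struct {
	Size  uint64
	Avail uint64
}

// isUnsafe reports whether statfs(2) should be skipped for this type.
func isUnsafe(fstype string) bool {
	return unsafeFstypes[fstype]
}

// refresh calls statfs(2) directly on every "safe" mount, which asks the
// filesystem for current numbers instead of relying on cached ones.
// Unsafe mounts and mounts that return an error are left out of the result.
func refresh(mounts []mount) map[string]usage {
	out := make(map[string]usage)
	for _, m := range mounts {
		if isUnsafe(m.Fstype) {
			continue
		}
		var st syscall.Statfs_t
		if err := syscall.Statfs(m.Point, &st); err != nil {
			continue
		}
		// Field types differ between platforms, so convert explicitly.
		bsize := uint64(st.Bsize)
		out[m.Point] = usage{
			Size:  uint64(st.Blocks) * bsize,
			Avail: uint64(st.Bavail) * bsize,
		}
	}
	return out
}

func main() {
	mounts := []mount{{"/", "zfs"}, {"/mnt/remote", "nfs"}}
	for point, u := range refresh(mounts) {
		fmt.Printf("%s size=%d avail=%d\n", point, u.Size, u.Avail)
	}
}
```

A caller would presumably still fall back to the (possibly stale) MNT_NOWAIT values for the mounts that were skipped.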

discordianfish · 2023-08-13T15:55:34Z

> The most obvious "fix" is to use MNT_WAIT instead of MNT_NOWAIT, but this runs the risk of blocking indefinitely if (e.g.) an NFS server becomes non-responsive.

Feature parity with Linux :). I'd say we go with this and consider using the stale mount handling implemented in #997 for Linux.
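
For illustration, one generic Go pattern for keeping a blocking stat call from stalling a whole scrape is to run it in a goroutine and give up after a deadline. The sketch below is not the implementation from #997; `statWithTimeout`, `errStatTimeout`, `exampleTimeout`, `fastStat` and `hungStat` are hypothetical names, and the stat function is injected so that a hung mount can be simulated.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStatTimeout is returned when the stat call does not finish in time.
var errStatTimeout = errors.New("stat timed out")

// exampleTimeout is an arbitrary deadline chosen for this sketch.
const exampleTimeout = 100 * time.Millisecond

// statFunc stands in for a real statfs(2) wrapper returning available bytes.
type statFunc func(path string) (uint64, error)

// statWithTimeout runs stat in its own goroutine and waits at most timeout
// for the result. A stuck call leaves its goroutine blocked, but the caller
// returns and can mark the mount as stale.
func statWithTimeout(path string, timeout time.Duration, stat statFunc) (uint64, error) {
	type result struct {
		avail uint64
		err   error
	}
	// Buffered so a late-finishing goroutine can send and exit.
	ch := make(chan result, 1)
	go func() {
		avail, err := stat(path)
		ch <- result{avail, err}
	}()
	select {
	case r := <-ch:
		return r.avail, r.err
	case <-time.After(timeout):
		return 0, errStatTimeout
	}
}

// fastStat simulates a healthy filesystem.
func fastStat(string) (uint64, error) { return 42, nil }

// hungStat simulates a mount whose server never answers.
func hungStat(string) (uint64, error) { select {} }

func main() {
	fmt.Println(statWithTimeout("/data", exampleTimeout, fastStat))
	fmt.Println(statWithTimeout("/mnt/nfs", exampleTimeout, hungStat))
}
```

The trade-off in this pattern is that every hung call leaks one goroutine until the underlying syscall returns, so a real collector would also want to avoid re-issuing the call for a mount already known to be stuck.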

discordianfish · 2023-08-13T15:56:47Z

Now I'm confused though. @Baughn seems to run into this on Linux, right? Or is there a similar bug in both?

`getfsstat(2)` spec mentions that using `MNT_NOWAIT` will return the information it has available without requesting an update from each file system. Hence, use `MNT_WAIT` in place of the earlier used mode, and make changes to the affected collectors to avoid being stuck for long intervals.

Fixes: prometheus#1498
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>

discordianfish added the bug label Oct 14, 2019

erikschwalbe mentioned this issue Sep 11, 2023

node_filesystem_{size,avail}_bytes is not updated for UFS filesystem #2800

Closed

rexagod linked a pull request Mar 18, 2024 that will close this issue

collector/filesystem: s/MNT_NOWAIT/MNT_WAIT #2960

Open
