Infiniband metrics: still not collected when irdma is loaded (PE 1.7.0) #2846

Open
mtds opened this issue Nov 14, 2023 · 12 comments

Comments

@mtds

mtds commented Nov 14, 2023

Host operating system: output of uname -a

Linux (...) 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Host operating system: Rocky Linux 8.8

node_exporter version: output of node_exporter --version

~$ node_exporter --version
node_exporter, version 1.7.0 (branch: HEAD, revision: 7333465abf9efba81876303bb57e6fadb946041b)
  build user:       root@35918982f6d8
  build date:       20231112-23:53:35
  go version:       go1.21.4
  platform:         linux/amd64
  tags:             netgo osusergo static_build

node_exporter command line flags

--no-collector.arp --collector.netdev.device-include=ib0 \
--collector.textfile.directory /var/lib/prometheus/node-exporter/textfile_collector \
--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run|cvmfs|d|u|lustre|WWW|etc|misc)($|/)

node_exporter log output

  • At launch, on a test run listening on a non-default port:
~# node_exporter --web.disable-exporter-metrics --web.listen-address=":9111" --log.level=debug --collector.disable-defaults --collector.infiniband
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:192 level=info msg="Starting node_exporter" version="(version=1.7.0, branch=HEAD, revision=7333465abf9efba81876303bb57e6fadb946041b)"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:193 level=info msg="Build context" build_context="(go=go1.21.4, platform=linux/amd64, user=root@35918982f6d8, date=20231112-23:53:35, tags=netgo osusergo static_build)"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:195 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:198 level=debug msg="Go MAXPROCS" procs=1
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-11-13T08:50:17.923Z caller=tls_config.go:274 level=info msg="Listening on" address=0.0.0.0:9111
ts=2023-11-13T08:50:17.923Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=0.0.0.0:9111

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

There's no error whatsoever: the exporter is just not able to collect IB metrics (see next section).

What did you expect to see?

When the irdma module is not loaded, Node Exporter correctly collects and reports IB metrics:

ts=2023-11-14T10:56:03.868Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-14T10:56:03.874Z caller=collector.go:173 level=debug msg="collector succeeded" name=infiniband duration_seconds=0.006788827

What did you see instead?

Infiniband metrics are not collected when the irdma module is loaded:

(...)
ts=2023-11-13T08:50:33.312Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-13T08:50:33.312Z caller=infiniband_linux.go:119 level=debug collector=infiniband msg="infiniband statistics not found, skipping"
ts=2023-11-13T08:50:33.313Z caller=collector.go:167 level=debug msg="collector returned no data" name=infiniband duration_seconds=0.000573153 err="collector returned no data"

Workaround

  • Explicitly unload the irdma module:
modprobe -r irdma

References

@dswarbrick
Contributor

dswarbrick commented Nov 15, 2023

For the collector to return no data, the FS.InfiniBandClass function in procfs must be returning os.ErrNotExist.

func (c *infinibandCollector) Update(ch chan<- prometheus.Metric) error {
	devices, err := c.fs.InfiniBandClass()
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			level.Debug(c.logger).Log("msg", "infiniband statistics not found, skipping")
			return ErrNoData
		}
...

There are multiple places in the InfiniBandClass procfs collector which could potentially return os.ErrNotExist.

Can you please paste a recursive directory listing of your /sys/class/infiniband? It seems that the collector may still be assuming the presence of certain files that are not present with the irdma module.

@mtds
Author

mtds commented Nov 16, 2023

~$ ls -lR /sys/class/infiniband
/sys/class/infiniband:
total 0
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma0 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.0/infiniband/irdma0
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma1 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.3/infiniband/irdma1
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma2 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.1/infiniband/irdma2
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 irdma3 -> ../../devices/pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:03.0/0000:1a:00.2/infiniband/irdma3
lrwxrwxrwx. 1 root root 0 Nov  1 15:37 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0

In comparison, when the irdma module is unloaded, there's only one symbolic link:

~$ ls -lR /sys/class/infiniband
/sys/class/infiniband:
total 0
lrwxrwxrwx. 1 root root 0 Nov  8 12:58 mlx5_0 -> ../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/infiniband/mlx5_0

Content of the directory related to the IB driver:

~$ ls -l /sys/class/infiniband/mlx5_0/
total 0
-r--r--r--. 1 root root 4096 Nov 14 11:55 board_id
lrwxrwxrwx. 1 root root    0 Nov  8 17:02 device -> ../../../0000:3b:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_pages
-r--r--r--. 1 root root 4096 Nov 14 11:55 fw_ver
-r--r--r--. 1 root root 4096 Nov 14 11:55 hca_type
-r--r--r--. 1 root root 4096 Nov 16 12:46 hw_rev
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov 13 11:02 node_guid
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_type
drwxr-xr-x. 3 root root    0 Nov  8 12:58 ports
drwxr-xr-x. 2 root root    0 Nov 16 12:46 power
-r--r--r--. 1 root root 4096 Nov 16 12:46 reg_pages
lrwxrwxrwx. 1 root root    0 Nov 10 08:25 subsystem -> ../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov 13 11:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 10 08:25 uevent

The irdmaX subdirectories show fewer files:

~$ ls -la /sys/class/infiniband/irdma0/
total 0
drwxr-xr-x. 4 root root    0 Nov  8 15:06 .
drwxr-xr-x. 3 root root    0 Nov  8 15:06 ..
lrwxrwxrwx. 1 root root    0 Nov  8 17:02 device -> ../../../0000:1a:00.0
-r--r--r--. 1 root root 4096 Nov 16 12:46 fw_ver
-rw-r--r--. 1 root root 4096 Nov 16 12:46 node_desc
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_guid
-r--r--r--. 1 root root 4096 Nov  8 17:02 node_type
drwxr-xr-x. 3 root root    0 Nov  8 15:18 ports
drwxr-xr-x. 2 root root    0 Nov 16 12:46 power
lrwxrwxrwx. 1 root root    0 Nov 13 19:51 subsystem -> ../../../../../../../../class/infiniband
-r--r--r--. 1 root root 4096 Nov  8 17:02 sys_image_guid
-rw-r--r--. 1 root root 4096 Nov 13 19:51 uevent

@dswarbrick
Contributor

dswarbrick commented Nov 16, 2023

board_id and hca_type are absent for irdmaX devices, but that's fine because the procfs package tolerates that and continues (cf. prometheus/procfs#556).

Can you also dig a bit deeper into the ports directory? The collector looks for state, phys_state and rate files in the enumerated port subdirectories. Can you also list the contents of the counters directory of one of those port subdirectories?

There is one other bit of code in the procfs collector that might be bailing out:

	// Parse legacy counters
	path = filepath.Join(portPath, "counters_ext")
	files, err = os.ReadDir(path)
	if err != nil && !os.IsNotExist(err) {
		return nil, err
	}

There is a good chance that the irdma module does not implement these legacy counters, since it was a ground-up rewrite relatively recently. From a quick peek at the IB module source in kernel 6.6, it seems that only the qib, mlx4, mlx5 and hfi1 drivers expose counters_ext.

@mtds
Author

mtds commented Nov 16, 2023

Here are the listings of the ports directories:

  • case for mlx5_0:
cd /sys/class/infiniband
# ls -la mlx5_0/ports/1/
total 0
drwxr-xr-x. 11 root root    0 Nov  8 15:18 .
drwxr-xr-x.  3 root root    0 Nov  8 15:18 ..
-r--r--r--.  1 root root 4096 Nov 16 17:43 cap_mask
drwxr-xr-x.  2 root root    0 Nov 16 17:43 cm_rx_duplicates
drwxr-xr-x.  2 root root    0 Nov 16 17:43 cm_rx_msgs
drwxr-xr-x.  2 root root    0 Nov 16 17:43 cm_tx_msgs
drwxr-xr-x.  2 root root    0 Nov 16 17:43 cm_tx_retries
drwxr-xr-x.  2 root root    0 Nov 16 17:43 counters
drwxr-xr-x.  4 root root    0 Nov  8 17:02 gid_attrs
drwxr-xr-x.  2 root root    0 Nov  8 17:02 gids
drwxr-xr-x.  2 root root    0 Nov 16 17:43 hw_counters
-r--r--r--.  1 root root 4096 Nov  8 17:02 lid
-r--r--r--.  1 root root 4096 Nov  8 17:02 lid_mask_count
-r--r--r--.  1 root root 4096 Nov 16 17:43 link_layer
-r--r--r--.  1 root root 4096 Nov  8 15:18 phys_state
drwxr-xr-x.  2 root root    0 Nov  8 17:02 pkeys
-r--r--r--.  1 root root 4096 Nov  8 15:18 rate
-r--r--r--.  1 root root 4096 Nov 16 17:43 sm_lid
-r--r--r--.  1 root root 4096 Nov 16 17:43 sm_sl
-r--r--r--.  1 root root 4096 Nov  8 15:18 state
#] cd  mlx5_0/ports/1/
#] cat state phys_state rate 
4: ACTIVE
5: LinkUp
100 Gb/sec (2X HDR)
  • case for irdma0 (the other irdmaX devices expose the same structure):
#]  ls -la irdma0/ports/1/
total 0
drwxr-xr-x. 5 root root    0 Nov  8 15:18 .
drwxr-xr-x. 3 root root    0 Nov  8 15:18 ..
-r--r--r--. 1 root root 4096 Nov 16 17:44 cap_mask
drwxr-xr-x. 4 root root    0 Nov 16 17:44 gid_attrs
drwxr-xr-x. 2 root root    0 Nov  8 17:02 gids
drwxr-xr-x. 2 root root    0 Nov 16 17:44 hw_counters
-r--r--r--. 1 root root 4096 Nov  8 17:02 lid
-r--r--r--. 1 root root 4096 Nov  8 17:02 lid_mask_count
-r--r--r--. 1 root root 4096 Nov 16 17:44 link_layer
-r--r--r--. 1 root root 4096 Nov  8 15:18 phys_state
-r--r--r--. 1 root root 4096 Nov  8 15:18 rate
-r--r--r--. 1 root root 4096 Nov 16 17:44 sm_lid
-r--r--r--. 1 root root 4096 Nov 16 17:44 sm_sl
-r--r--r--. 1 root root 4096 Nov  8 15:18 state
#] cd irdma0/ports/1/
#] cat state phys_state rate 
1: DOWN
3: Disabled
100 Gb/sec (4X EDR)

@dswarbrick
Contributor

Aha, I also misread the code I quoted in my previous comment, since it would tolerate os.ErrNotExist for the counters_ext directory.

However, this code will bail out on the irdma devices, since they do not expose a counters directory - only hw_counters (which is currently only parsed for mlx5 devices):

func parseInfiniBandCounters(portPath string) (*InfiniBandCounters, error) {
	var counters InfiniBandCounters

	path := filepath.Join(portPath, "counters")
	files, err := os.ReadDir(path)
	if err != nil {
		return nil, err
	}
...

@mtds
Author

mtds commented Nov 16, 2023

I would have assumed that Node Exporter would go through all the paths under /sys/class/infiniband/<Name>, even though counters is not present for the irdmaX cards (not configured in our case).

Why is the exporter (seemingly) giving up after its first try?

mtds closed this as completed Nov 16, 2023
mtds reopened this Nov 16, 2023
@dswarbrick
Contributor

@mtds The behaviour is due to fairly generic error handling in the procfs code, whereby it bails out upon pretty much any error.

I suspect that the code was originally written by somebody who only had access to Mellanox HCAs, since they are (in my experience) by far the most common IB hardware in use for about the last 10 years. The Intel irdma driver has opted to only implement hw_counters, rather than the older counters described in https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-class-infiniband.

This should be a fairly easy fix, but unfortunately will require another release cycle of both procfs and node_exporter.

@mtds
Author

mtds commented Nov 16, 2023

@dswarbrick Thanks, it's clear now. For the time being, I guess we can easily implement the workaround on our side (unload the irdma module and put it into a blacklist).

Should I open a bug report on the procfs repository as well? The problem is indeed in that component and not in the node exporter itself. Or would it generate too much 'noise'?

@dswarbrick
Contributor

@mtds I would recommend opening an issue on the procfs repository that references this one, and keeping this one open as a placeholder until a new node_exporter is released with a fix.

@mtds
Author

mtds commented Nov 16, 2023

For reference: procfs#589 issue.

@blixuga

blixuga commented Apr 5, 2024

Just pulled and built master. Even with the procfs issue resolved, node_exporter still does not work if irdma is loaded.

@dswarbrick
Contributor

@blixuga Can you please provide debug logs so that we can try to resolve this? The more info, the better.
