Add support for several /sys/class/sas_* classes #453

Merged
merged 8 commits into from Dec 17, 2022

scottlaird
Contributor

This adds support for several SAS classes from /sys/class:

  • sas_host
  • sas_device
  • sas_end_device
  • sas_expander
  • sas_phy
  • sas_port

These are for issue #452, which is needed for prometheus/node_exporter#2386. Once that is complete, it'll be possible to track SAS communication errors on each individual SAS link in a system, and then map those links to specific block devices.

Use case: I recently discovered that I had a bad SAS cable in my system and was seeing weird, intermittent disk timeouts and errors which caused ZFS problems. Being able to track SAS issues for each node should make detecting and debugging this sort of problem much easier.

This parses several SAS classes from /sys/class/:
  - sas_host
  - sas_device
  - sas_end_device
  - sas_expander
  - sas_port
  - sas_phy

The fixtures include examples of all of these.

Signed-off-by: Scott Laird <scott@sigkill.org>
… SAS device address to /dev/sd*.

Note that the fixtures for this rev are still *way* too big; fixtures.ttar went from ~300k to 11M.  Cleaning that up is next.

Signed-off-by: Scott Laird <scott@sigkill.org>
Still about 2x larger than before, but *much* better than the previous commit.  There's a lot of additional SAS data included in /sys/devices, and there's a limit to how much smaller it can get without a lot of work and lost data.

Signed-off-by: Scott Laird <scott@sigkill.org>
I refactored the 3 different /sys/class entries in class_sas_device, because they're almost exactly the same except for the directory name and which devices are included in the directory.

Signed-off-by: Scott Laird <scott@sigkill.org>
@discordianfish
Member

Some minor comments but LGTM in general. Thanks!

Signed-off-by: Scott Laird <scott@sigkill.org>
We weren't populating a couple fields due to dumb errors.  Also, I'd failed to add end_device to SASPort, which is needed for block device discovery.

Signed-off-by: Scott Laird <scott@sigkill.org>
@scottlaird
Contributor Author

I started writing my node_exporter code, and found a couple bugs, which I just fixed. There are also a couple convenience functions that should really be in here. I may or may not have time for them today.

…n code that uses this library.

Signed-off-by: Scott Laird <scott@sigkill.org>
@scottlaird
Contributor Author

Okay, that ended up being bigger than expected. OTOH, the node_exporter code shrank quite a bit, especially the bits where it has to chain through multiple classes to find some random bit of data. Let me know if there's a different pattern that you'd like to see or if there are any other changes that you'd suggest. Thanks.

@scottlaird
Contributor Author

@discordianfish want to take another look?

Member

@discordianfish discordianfish left a comment

LGTM, thanks!

@scottlaird
Contributor Author

@SuperQ any chance this could get a review?

Member

@SuperQ SuperQ left a comment

LGTM

@SuperQ SuperQ merged commit 85417ca into prometheus:master Dec 17, 2022
3 participants