node_exporter collector failure for zfs err="couldn't get sysctl: no such file or directory" #2847

Open
void-fm opened this issue Nov 15, 2023 · 3 comments

Comments

void-fm commented Nov 15, 2023

Host operating system: output of uname -a

FreeBSD stable/14-n265566 aarch64 1400500 1400500

node_exporter version: output of node_exporter --version

% node_exporter --version
node_exporter, version (branch: , revision: unknown)
build user:
build date:
go version: go1.20.8
platform: freebsd/arm64
tags: unknown

node_exporter command line flags

(defaults)

node_exporter log output

Nov 15 17:14:22 REDACTED node_exporter[58452]: ts=2023-11-15T17:14:22.737Z caller=collector.go:169 level=error msg="collector failed" name=zfs duration_seconds=0.000465773 err="couldn't get sysctl: no such file or directory"

(every 15 seconds in /var/log/daemon.log)
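Besides the log line, a failing collector also shows up in the exporter's own output as `node_scrape_collector_success{collector="..."} 0`. The following is a minimal sketch, not part of node_exporter, of how that output could be scanned for failed collectors. The function name `failedCollectors` and the sample text are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// failedCollectors scans Prometheus text-format output and returns the names
// of collectors whose node_scrape_collector_success sample is 0.
// Hypothetical helper for illustration only.
func failedCollectors(metrics string) []string {
	const prefix = `node_scrape_collector_success{collector="`
	var failed []string
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // comments, other metrics, blank lines
		}
		// Split `zfs"} 0` into the collector name and the sample value.
		name, value, ok := strings.Cut(line[len(prefix):], `"}`)
		if !ok {
			continue
		}
		if strings.TrimSpace(value) == "0" {
			failed = append(failed, name)
		}
	}
	return failed
}

func main() {
	// Made-up sample in the shape node_exporter emits on /metrics.
	sample := `# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="zfs"} 0
`
	fmt.Println(failedCollectors(sample))
}
```

Once `--no-collector.zfs` is set, the zfs sample should disappear from the output entirely rather than report 0.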

Workaround for now is editing /usr/local/etc/rc.d/node_exporter and finding this line:

: ${node_exporter_args:=""}

and editing it like so:

: ${node_exporter_args:="--no-collector.zfs"}
then restarting node_exporter.

eekay35 commented Dec 8, 2023

I also hit this error after upgrading to FreeBSD 14.0-RELEASE (at least, I hadn't noticed it before that). You shouldn't edit the rc.d script, though: changes there can cause problems later and will be overwritten by the next node_exporter update. Instead, set the args in /etc/rc.conf and restart the node_exporter service:

sysrc node_exporter_args="--no-collector.zfs"
service node_exporter restart

dekimsey added a commit to dekimsey/node_exporter that referenced this issue Mar 23, 2024
When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs is not uncommon with major os updates. By adding this change, it'll be easier for users to report the specific mib that is triggering the failure.

Related to prometheus#2847

Signed-off-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>

dekimsey (Contributor) commented Mar 23, 2024

I pulled the list of mibs being scanned and passed them to sysctl on my FreeBSD 14.0 box; kstat.zfs.misc.arcstats.p appears to be the missing OID. The code suggests this is known, but it still tries to read the value. At least the error is only noise: the rest of the zfs stats are being collected normally.

Additionally, I spiked a quick change that uses getUname to grab the OS release and then adds the appropriate sysctl stats. It's not clear to me whether that is a road the project would want to go down. I'd be more comfortable proposing it if I had more samples from the other *BSDs, so that a major/minor version function could return something sensible, but I don't have any.
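To make the idea concrete, a version-gated mib list might look something like the sketch below. This is not the spiked change itself: `parseRelease` and `arcMibs` are hypothetical names, the parser assumes release strings shaped like `14.0-RELEASE`, and the `major < 14` cutoff is an assumption based only on the report in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease extracts major and minor versions from a release string such
// as "14.0-RELEASE" or "13.2-RELEASE-p4". Other shapes return an error.
func parseRelease(release string) (major, minor int, err error) {
	version, _, _ := strings.Cut(release, "-") // drop "-RELEASE..." suffix
	majorStr, minorStr, ok := strings.Cut(version, ".")
	if !ok {
		return 0, 0, fmt.Errorf("unexpected release format %q", release)
	}
	if major, err = strconv.Atoi(majorStr); err != nil {
		return 0, 0, fmt.Errorf("parsing major version of %q: %w", release, err)
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, fmt.Errorf("parsing minor version of %q: %w", release, err)
	}
	return major, minor, nil
}

// arcMibs returns an illustrative (incomplete) mib list for a major version.
func arcMibs(major int) []string {
	mibs := []string{
		"kstat.zfs.misc.arcstats.hits",
		"kstat.zfs.misc.arcstats.misses",
	}
	// Assumption: arcstats.p is only requested before 14, because this
	// thread reports it missing on FreeBSD 14.0.
	if major < 14 {
		mibs = append(mibs, "kstat.zfs.misc.arcstats.p")
	}
	return mibs
}

func main() {
	for _, rel := range []string{"13.2-RELEASE-p4", "14.0-RELEASE"} {
		major, minor, err := parseRelease(rel)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d.%d, %d mibs\n", rel, major, minor, len(arcMibs(major)))
	}
}
```

A string like `stable/14-n265566` from the original report does not fit the assumed shape and yields an error here, which is one reason more samples would be needed before settling on a parser.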

eekay35 commented Mar 27, 2024

Excellent, thank you! Once this new code goes into place via FBSD's ports/packages, I'll give it a try and verify. Looks good, though. I (and likely many others) appreciate it!

discordianfish pushed a commit that referenced this issue Apr 10, 2024
When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs is not uncommon with major os updates. By adding this change, it'll be easier for users to report the specific mib that is triggering the failure.

Related to #2847

Signed-off-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>
gitperr pushed a commit to gitperr/node_exporter that referenced this issue Apr 30, 2024
gitperr pushed a commit to gitperr/node_exporter that referenced this issue Apr 30, 2024
Signed-off-by: David O'Rourke <david.orourke@gmail.com>

chore:remove constant from function (prometheus#2884)

Signed-off-by: tyltr <tylitianrui@126.com>

build(deps): bump github.com/jsimonetti/rtnetlink from 1.4.0 to 1.4.1 (prometheus#2909)

Bumps [github.com/jsimonetti/rtnetlink](https://github.com/jsimonetti/rtnetlink) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/jsimonetti/rtnetlink/releases)
- [Commits](jsimonetti/rtnetlink@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: github.com/jsimonetti/rtnetlink
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

fix hwmon nil ptr (prometheus#2873)

* fix hwmon nil ptr

The symlink may be lost in some cases.

---------

Signed-off-by: TaoGe <6657718+yowenter@users.noreply.github.com>

Fix hwmon error capture (prometheus#2915)

Fix golangci-lint "ineffectual assignment" by correctly capturing any
errors within the hwmon gathering loop.

Signed-off-by: Ben Kochie <superq@gmail.com>

Update common Prometheus files (prometheus#2917)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Revert "Add ZFS freebsd per dataset stats (prometheus#2753)" (prometheus#2925)

This reverts commit f34aaa6.

Signed-off-by: Caleb Webber <caleb@codingthemsoftly.com>

filesystem: fix mountTimeout not working issue (prometheus#2903)

Signed-off-by: DongWei <jiangxuege@hotmail.com>

Fix description for NodeDiskIOSaturation alert (prometheus#2929)

NodeDiskIOSaturation description should say 30m per the "for" clause

Signed-off-by: Taylor Sly <slyt@users.noreply.github.com>

Enforce no subprocess policy (prometheus#2926)

Add depguard to golangci-lint to enforce the no-os/exec policy.

Signed-off-by: Ben Kochie <superq@gmail.com>

filesystem: surface device errors (prometheus#2923)

filesystem: surface filesystem device error

Fixes: prometheus#2918
---------

Signed-off-by: Pamela Mei i540369 <pamela.mei@sap.com>

Revert "filesystem: fix mountTimeout not working issue (prometheus#2903)" (prometheus#2932)

This reverts commit 9f1f791.

Signed-off-by: Ben Kochie <superq@gmail.com>

Update common Prometheus files (prometheus#2939)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Update common Prometheus files (prometheus#2946)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Update common Prometheus files (prometheus#2949)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Add multi-cluster support for Nodes dashboard (prometheus#2945)

Signed-off-by: Adrian Berger <adria.berger94@gmail.com>

disable selinux,fix end-to-end-test.sh error(prometheus#2934) (prometheus#2937)

Signed-off-by: heyitao <heyitao@uniontech.com>
Co-authored-by: heyitao <heyitao@uniontech.com>

Add new collector and metrics for watchdog (prometheus#2309) (prometheus#2880)

Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>

Enable watchdog module by default; Add no data error (prometheus#2953)

Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>

Update common Prometheus files (prometheus#2954)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

build(deps): bump google.golang.org/protobuf from 1.32.0 to 1.33.0 (prometheus#2955)

Bumps google.golang.org/protobuf from 1.32.0 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update common Prometheus files (prometheus#2959)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Sanitize ethtool metric name keys

Apply the same metric name sanitization to the keys as to the metric
names. This avoids conflicting help strings in the metric registry.

Fixes: prometheus#2893

Signed-off-by: Ben Kochie <superq@gmail.com>

Update common Prometheus files

Signed-off-by: prombot <prometheus-team@googlegroups.com>

chore: fix some typos (prometheus#2974)

Signed-off-by: occupyhabit <wangmengjiao@outlook.com>

collector/textfile: Avoid inconsistent help-texts (prometheus#2962)

Avoid metrics with inconsistent help-texts. The earlier behaviour has
been preserved in the sense that the first encountered instance is still
used to generate metrics, whereas the subsequent inconsistent ones are
ignored along with a few peripheral changes.

```
 # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
 # TYPE node_scrape_collector_duration_seconds gauge
 node_scrape_collector_duration_seconds{collector="textfile"} 0.0004005
 # HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
 # TYPE node_scrape_collector_success gauge
 node_scrape_collector_success{collector="textfile"} 1
 # HELP node_textfile_mtime_seconds Unixtime mtime of textfiles successfully read.
 # TYPE node_textfile_mtime_seconds gauge
 node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-bar.prom"} 1.710812009e+09
 node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-foo.prom"} 1.710811982e+09
 # HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
 # TYPE node_textfile_scrape_error gauge
 node_textfile_scrape_error 1
 # HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
 # TYPE promhttp_metric_handler_errors_total counter
 promhttp_metric_handler_errors_total{cause="encoding"} 0
 promhttp_metric_handler_errors_total{cause="gathering"} 0
 # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
 # TYPE promhttp_metric_handler_requests_in_flight gauge
 promhttp_metric_handler_requests_in_flight 1
 # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
 # TYPE promhttp_metric_handler_requests_total counter
 promhttp_metric_handler_requests_total{code="200"} 0
 promhttp_metric_handler_requests_total{code="500"} 0
 promhttp_metric_handler_requests_total{code="503"} 0
 # HELP tau_infrastructure_performing_maintenance_task At what timestamp a given task started or stopped, the last time it was run.
 # TYPE tau_infrastructure_performing_maintenance_task gauge
 tau_infrastructure_performing_maintenance_task{main_task="nightly",start_or_stop="start",sub_task="main"} 1.64728080198446e+09
```

Fixes: prometheus#2317

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>

Update common Prometheus files (prometheus#2973)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

zfs: Log mib when sysctl read fails on FreeBSD

When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs is not uncommon with major os updates. By adding this change, it'll be easier for users to report the specific mib that is triggering the failure.

Related to prometheus#2847

Signed-off-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>

chore: fix typo in comment

Signed-off-by: looklose <shishuaiqun@yeah.net>

fibre_channel: update procfs to take into account optional attributes (prometheus#2933)

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

refactor: Optimize code by using built-in constants in the standard library (prometheus#2989)

Signed-off-by: coderwander <770732124@qq.com>

os_release.go: Removed caching of modtime/filename of os-release file. (prometheus#2987)

Signed-off-by: Jonathan Davies <jpds@protonmail.com>

fix: data race of NetClassCollector metrics initialization when multiple requests happen (prometheus#2995)

Signed-off-by: John Guo <john@johng.cn>

Update common Prometheus files (prometheus#2992)

Signed-off-by: prombot <prometheus-team@googlegroups.com>

Update build (prometheus#3000)

* Update Go to 1.22.
* Update Go modules.
* Use new version collector.
* Use standard library slices package.

Signed-off-by: Ben Kochie <superq@gmail.com>

Fix watchdog_test lint and test failures on macos. (prometheus#3003)

Ensure identical build flags embedded in both files.

Signed-off-by: Chris Cleeland <chris.cleeland@gmail.com>

Release v1.8.0 (prometheus#3002)

* [CHANGE] exec_bsd: Fix labels for `vm.stats.sys.v_syscall` sysctl prometheus#2895
* [CHANGE] diskstats: Ignore zram devices on linux systems prometheus#2898
* [CHANGE] textfile: Avoid inconsistent help-texts  prometheus#2962
* [CHANGE] os: Removed caching of modtime/filename of os-release file prometheus#2987
* [FEATURE] xfrm: Add new collector prometheus#2866
* [FEATURE] watchdog: Add new collector prometheus#2880
* [ENHANCEMENT] cpu_vulnerabilities: Add mitigation information label prometheus#2806
* [ENHANCEMENT] nfsd: Handle new `wdeleg_getattr` attribute prometheus#2810
* [ENHANCEMENT] netstat: Add TCPOFOQueue to default netstat metrics prometheus#2867
* [ENHANCEMENT] filesystem: surface device errors prometheus#2923
* [ENHANCEMENT] os: Add support end parsing prometheus#2982
* [ENHANCEMENT] zfs: Log mib when sysctl read fails on FreeBSD prometheus#2975
* [ENHANCEMENT] fibre_channel: update procfs to take into account optional attributes prometheus#2933
* [BUGFIX] cpu: Fix debug log in cpu collector prometheus#2857
* [BUGFIX] hwmon: Fix hwmon nil ptr prometheus#2873
* [BUGFIX] hwmon: Fix hwmon error capture prometheus#2915
* [BUGFIX] zfs: Revert "Add ZFS freebsd per dataset stats" prometheus#2925
* [BUGFIX] ethtool: Sanitize ethtool metric name keys prometheus#2940
* [BUGFIX] fix: data race of NetClassCollector metrics initialization prometheus#2995

Signed-off-by: Ben Kochie <superq@gmail.com>

Add logging for ethtool device include/exclude and metrics include flags (prometheus#2979)

Signed-off-by: Sam Leiken <sam.k.leiken@gmail.com>