Add sysctl collector #2425

Merged
merged 4 commits into from Jul 25, 2022

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@
* [ENHANCEMENT]
* [BUGFIX]

* [FEATURE] Add sysctl collector #2425
* [ENHANCEMENT] Add node_softirqs_total metric #2221
* [ENHANCEMENT] Add device filter flags to arp collector #2254
* [ENHANCEMENT] Add rapl zone name label option #2401
102 changes: 77 additions & 25 deletions README.md
@@ -155,6 +155,35 @@ and does not time out. In addition, monitor the
`scrape_samples_post_metric_relabeling` metric to see the changes in
cardinality.

Name | Description | OS
---------|-------------|----
buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux
cgroups | A summary of the number of active and enabled cgroups | Linux
devstat | Exposes device statistics | Dragonfly, FreeBSD
drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4) | Linux
ethtool | Exposes network interface information and network driver statistics equivalent to `ethtool`, `ethtool -S`, and `ethtool -i`. | Linux
interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD
ksmd | Exposes kernel and system statistics from `/sys/kernel/mm/ksm`. | Linux
lnstat | Exposes stats from `/proc/net/stat/`. | Linux
logind | Exposes session counts from [logind](http://www.freedesktop.org/wiki/Software/systemd/logind/). | Linux
meminfo\_numa | Exposes memory statistics from `/proc/meminfo_numa`. | Linux
mountstats | Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics. | Linux
network_route | Exposes the routing table as metrics | Linux
ntp | Exposes local NTP daemon health to check [time](./docs/TIME.md) | _any_
perf | Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings). | Linux
processes | Exposes aggregate process statistics from `/proc`. | Linux
qdisc | Exposes [queuing discipline](https://en.wikipedia.org/wiki/Network_scheduler#Linux_kernel) statistics | Linux
runit | Exposes service status from [runit](http://smarden.org/runit/). | _any_
slabinfo | Exposes slab statistics from `/proc/slabinfo`. Note that permission of `/proc/slabinfo` is usually 0400, so set it appropriately. | Linux
supervisord | Exposes service status from [supervisord](http://supervisord.org/). | _any_
sysctl | Exposes sysctl values from `/proc/sys`. Use `--collector.sysctl.include` and `--collector.sysctl.include-info` to configure. | Linux
systemd | Exposes service and system status from [systemd](http://www.freedesktop.org/wiki/Software/systemd/). | Linux
tcpstat | Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.) | Linux
wifi | Exposes WiFi device and station statistics. | Linux
zoneinfo | Exposes NUMA memory zone metrics. | Linux

### Perf Collector

The `perf` collector may not work out of the box on some Linux systems due to kernel
configuration and security settings. To allow access, set the following `sysctl`
parameter:
@@ -190,33 +219,56 @@ found using [`perf list`](http://man7.org/linux/man-pages/man1/perf.1.html) or
from debugfs. An example usage of this would be
`--collector.perf.tracepoint="sched:sched_process_exec"`.

### Sysctl Collector

The `sysctl` collector can be enabled with `--collector.sysctl`. It exposes numeric sysctl values
as metrics via the `--collector.sysctl.include` flag and string values as info metrics via the
`--collector.sysctl.include-info` flag. Both flags can be repeated. For a sysctl with multiple numeric values,
an optional mapping can be given to expose each value as its own metric; otherwise, an `index` label
identifies the individual fields.

#### Examples
##### Numeric values
###### Single values
Using `--collector.sysctl.include=vm.user_reserve_kbytes`:
`vm.user_reserve_kbytes = 131072` -> `node_sysctl_vm_user_reserve_kbytes 131072`
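
The naming pattern in the example above (and in the e2e fixture, where `fs.file-nr` becomes `node_sysctl_fs_file_nr`) can be sketched as follows. This is a minimal illustration, not the collector's actual code: the helper names `metricName` and `procPath` are hypothetical, and the rule "replace `.` and `-` with `_`" is inferred from the examples shown here.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// metricName derives a metric name from a sysctl name by prefixing
// "node_sysctl_" and replacing "." and "-" with "_".
// Hypothetical helper; the rule is inferred from the documented examples.
func metricName(sysctl string) string {
	r := strings.NewReplacer(".", "_", "-", "_")
	return "node_sysctl_" + r.Replace(sysctl)
}

// procPath maps a sysctl name to the file that holds its value below the
// procfs root, e.g. "/proc" + "vm.user_reserve_kbytes".
// Hypothetical helper; assumes one path component per dot-separated part.
func procPath(root, sysctl string) string {
	return path.Join(root, "sys", strings.ReplaceAll(sysctl, ".", "/"))
}

func main() {
	fmt.Println(metricName("vm.user_reserve_kbytes"))
	fmt.Println(procPath("/proc", "vm.user_reserve_kbytes"))
}
```

Under these assumptions, `vm.user_reserve_kbytes` is read from `/proc/sys/vm/user_reserve_kbytes` and exposed as `node_sysctl_vm_user_reserve_kbytes`.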

###### Multiple values
A sysctl can contain multiple values, for example:
```
net.ipv4.tcp_rmem = 4096 131072 6291456
```
Using `--collector.sysctl.include=net.ipv4.tcp_rmem` the collector will expose:
```
node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096
node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072
node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456
```
If the indexes have a defined meaning, as they do in this case, the values can be mapped to separate metrics by appending a mapping to the `--collector.sysctl.include` flag.
Using `--collector.sysctl.include=net.ipv4.tcp_rmem:min,default,max` the collector will expose:
```
node_sysctl_net_ipv4_tcp_rmem_min 4096
node_sysctl_net_ipv4_tcp_rmem_default 131072
node_sysctl_net_ipv4_tcp_rmem_max 6291456
```
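
The two output shapes above (an `index` label versus one metric per mapped field) can be sketched as a small expansion step. This is an illustrative sketch only: the `sample` type and `expand` function are hypothetical names, and rejecting a mapping whose length differs from the number of values is an assumption, not documented collector behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is one exposed series. Index is empty when no index label applies.
type sample struct {
	Name  string
	Index string
	Value float64
}

// expand takes an include spec ("name" or "name:key1,key2,...") and the raw
// sysctl content, and returns the series to expose.
// Hypothetical helper, not the collector's implementation.
func expand(spec, raw string) ([]sample, error) {
	name, mapping, hasMapping := strings.Cut(spec, ":")
	base := "node_sysctl_" + strings.NewReplacer(".", "_", "-", "_").Replace(name)
	fields := strings.Fields(raw)

	var keys []string
	if hasMapping {
		keys = strings.Split(mapping, ",")
		// Assumption: a mapping must name every field exactly once.
		if len(keys) != len(fields) {
			return nil, fmt.Errorf("%s: %d mapping keys for %d values", name, len(keys), len(fields))
		}
	}

	out := make([]sample, 0, len(fields))
	for i, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, fmt.Errorf("%s field %d: %w", name, i, err)
		}
		s := sample{Name: base, Value: v}
		switch {
		case hasMapping:
			s.Name = base + "_" + keys[i] // one metric per mapped field
		case len(fields) > 1:
			s.Index = strconv.Itoa(i) // fall back to an index label
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	for _, spec := range []string{"net.ipv4.tcp_rmem", "net.ipv4.tcp_rmem:min,default,max"} {
		samples, err := expand(spec, "4096 131072 6291456")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, s := range samples {
			if s.Index != "" {
				fmt.Printf("%s{index=%q} %.0f\n", s.Name, s.Index, s.Value)
			} else {
				fmt.Printf("%s %.0f\n", s.Name, s.Value)
			}
		}
	}
}
```

Keeping the mapping in the flag value means the metric names are chosen by the operator, who knows what each field of a given sysctl means.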

##### String values
String values need to be exposed as an info metric. The user selects them with the `--collector.sysctl.include-info` flag.

###### Single values
`kernel.core_pattern = core` -> `node_sysctl_info{name="kernel.core_pattern", value="core"} 1`

###### Multiple values
Given the following sysctl:
```
kernel.seccomp.actions_avail = kill_process kill_thread trap errno trace log allow
```
Setting `--collector.sysctl.include-info=kernel.seccomp.actions_avail` will yield:
```
node_sysctl_info{name="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1
node_sysctl_info{name="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1
...
```
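
How a multi-value string sysctl turns into info series can be sketched as below. This is a rough illustration rather than the collector's code: `infoSeries` is a hypothetical helper, and the label names (`index`, `name`, `value`) and their order follow the e2e fixture output in this change.

```go
package main

import (
	"fmt"
	"strings"
)

// infoSeries renders one node_sysctl_info line per whitespace-separated
// field of a string sysctl. Each series has the constant value 1; the
// actual data lives in the labels.
// Hypothetical helper; assumes values contain no characters that need
// escaping in label values.
func infoSeries(name, raw string) []string {
	fields := strings.Fields(raw)
	out := make([]string, 0, len(fields))
	for i, f := range fields {
		out = append(out, fmt.Sprintf(
			`node_sysctl_info{index="%d",name="%s",value="%s"} 1`, i, name, f))
	}
	return out
}

func main() {
	raw := "kill_process kill_thread trap errno trace log allow"
	for _, line := range infoSeries("kernel.seccomp.actions_avail", raw) {
		fmt.Println(line)
	}
}
```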

### Textfile Collector

6 changes: 3 additions & 3 deletions collector/filesystem_openbsd_amd64.go
@@ -41,14 +41,14 @@ func (c *filesystemCollector) GetStats() (stats []filesystemStats, err error) {

 	stats = []filesystemStats{}
 	for _, v := range mnt {
-		mountpoint := int8ToString(v.F_mntonname[:])
+		mountpoint := string(v.F_mntonname[:])
 		if c.excludedMountPointsPattern.MatchString(mountpoint) {
 			level.Debug(c.logger).Log("msg", "Ignoring mount point", "mountpoint", mountpoint)
 			continue
 		}
 
-		device := int8ToString(v.F_mntfromname[:])
-		fstype := int8ToString(v.F_fstypename[:])
+		device := string(v.F_mntfromname[:])
+		fstype := string(v.F_fstypename[:])
 		if c.excludedFSTypesPattern.MatchString(fstype) {
 			level.Debug(c.logger).Log("msg", "Ignoring fs type", "type", fstype)
 			continue
28 changes: 28 additions & 0 deletions collector/fixtures/e2e-64k-page-output.txt
@@ -3052,6 +3052,7 @@ node_scrape_collector_success{collector="slabinfo"} 1
node_scrape_collector_success{collector="sockstat"} 1
node_scrape_collector_success{collector="softnet"} 1
node_scrape_collector_success{collector="stat"} 1
node_scrape_collector_success{collector="sysctl"} 1
node_scrape_collector_success{collector="tapestats"} 1
node_scrape_collector_success{collector="textfile"} 1
node_scrape_collector_success{collector="thermal_zone"} 1
@@ -3185,6 +3186,33 @@ node_softnet_times_squeezed_total{cpu="0"} 1
node_softnet_times_squeezed_total{cpu="1"} 10
node_softnet_times_squeezed_total{cpu="2"} 85
node_softnet_times_squeezed_total{cpu="3"} 50
# HELP node_sysctl_fs_file_nr sysctl fs.file-nr
# TYPE node_sysctl_fs_file_nr untyped
node_sysctl_fs_file_nr{index="0"} 1024
node_sysctl_fs_file_nr{index="1"} 0
node_sysctl_fs_file_nr{index="2"} 1.631329e+06
# HELP node_sysctl_fs_file_nr_current sysctl fs.file-nr, field 1
# TYPE node_sysctl_fs_file_nr_current untyped
node_sysctl_fs_file_nr_current 0
# HELP node_sysctl_fs_file_nr_max sysctl fs.file-nr, field 2
# TYPE node_sysctl_fs_file_nr_max untyped
node_sysctl_fs_file_nr_max 1.631329e+06
# HELP node_sysctl_fs_file_nr_total sysctl fs.file-nr, field 0
# TYPE node_sysctl_fs_file_nr_total untyped
node_sysctl_fs_file_nr_total 1024
# HELP node_sysctl_info sysctl info
# TYPE node_sysctl_info gauge
node_sysctl_info{index="0",name="kernel.seccomp.actions_avail",value="kill_process"} 1
node_sysctl_info{index="1",name="kernel.seccomp.actions_avail",value="kill_thread"} 1
node_sysctl_info{index="2",name="kernel.seccomp.actions_avail",value="trap"} 1
node_sysctl_info{index="3",name="kernel.seccomp.actions_avail",value="errno"} 1
node_sysctl_info{index="4",name="kernel.seccomp.actions_avail",value="user_notif"} 1
node_sysctl_info{index="5",name="kernel.seccomp.actions_avail",value="trace"} 1
node_sysctl_info{index="6",name="kernel.seccomp.actions_avail",value="log"} 1
node_sysctl_info{index="7",name="kernel.seccomp.actions_avail",value="allow"} 1
# HELP node_sysctl_kernel_threads_max sysctl kernel.threads-max
# TYPE node_sysctl_kernel_threads_max untyped
node_sysctl_kernel_threads_max 7801
# HELP node_tape_io_now The number of I/Os currently outstanding to this device.
# TYPE node_tape_io_now gauge
node_tape_io_now{device="st0"} 1
28 changes: 28 additions & 0 deletions collector/fixtures/e2e-output.txt
@@ -3074,6 +3074,7 @@ node_scrape_collector_success{collector="slabinfo"} 1
node_scrape_collector_success{collector="sockstat"} 1
node_scrape_collector_success{collector="softnet"} 1
node_scrape_collector_success{collector="stat"} 1
node_scrape_collector_success{collector="sysctl"} 1
node_scrape_collector_success{collector="tapestats"} 1
node_scrape_collector_success{collector="textfile"} 1
node_scrape_collector_success{collector="thermal_zone"} 1
@@ -3207,6 +3208,33 @@ node_softnet_times_squeezed_total{cpu="0"} 1
node_softnet_times_squeezed_total{cpu="1"} 10
node_softnet_times_squeezed_total{cpu="2"} 85
node_softnet_times_squeezed_total{cpu="3"} 50
# HELP node_sysctl_fs_file_nr sysctl fs.file-nr
# TYPE node_sysctl_fs_file_nr untyped
node_sysctl_fs_file_nr{index="0"} 1024
node_sysctl_fs_file_nr{index="1"} 0
node_sysctl_fs_file_nr{index="2"} 1.631329e+06
# HELP node_sysctl_fs_file_nr_current sysctl fs.file-nr, field 1
# TYPE node_sysctl_fs_file_nr_current untyped
node_sysctl_fs_file_nr_current 0
# HELP node_sysctl_fs_file_nr_max sysctl fs.file-nr, field 2
# TYPE node_sysctl_fs_file_nr_max untyped
node_sysctl_fs_file_nr_max 1.631329e+06
# HELP node_sysctl_fs_file_nr_total sysctl fs.file-nr, field 0
# TYPE node_sysctl_fs_file_nr_total untyped
node_sysctl_fs_file_nr_total 1024
# HELP node_sysctl_info sysctl info
# TYPE node_sysctl_info gauge
node_sysctl_info{index="0",name="kernel.seccomp.actions_avail",value="kill_process"} 1
node_sysctl_info{index="1",name="kernel.seccomp.actions_avail",value="kill_thread"} 1
node_sysctl_info{index="2",name="kernel.seccomp.actions_avail",value="trap"} 1
node_sysctl_info{index="3",name="kernel.seccomp.actions_avail",value="errno"} 1
node_sysctl_info{index="4",name="kernel.seccomp.actions_avail",value="user_notif"} 1
node_sysctl_info{index="5",name="kernel.seccomp.actions_avail",value="trace"} 1
node_sysctl_info{index="6",name="kernel.seccomp.actions_avail",value="log"} 1
node_sysctl_info{index="7",name="kernel.seccomp.actions_avail",value="allow"} 1
# HELP node_sysctl_kernel_threads_max sysctl kernel.threads-max
# TYPE node_sysctl_kernel_threads_max untyped
node_sysctl_kernel_threads_max 7801
# HELP node_tape_io_now The number of I/Os currently outstanding to this device.
# TYPE node_tape_io_now gauge
node_tape_io_now{device="st0"} 1
1 change: 1 addition & 0 deletions collector/fixtures/proc/sys/kernel/seccomp/actions_avail
@@ -0,0 +1 @@
kill_process kill_thread trap errno user_notif trace log allow