Absent Metrics Operator creates metric absence alerts atop Kubernetes

sapcc/absent-metrics-operator

Absent Metrics Operator

Terminology

An absence alert rule is an alert rule that alerts on the absence of a metric.

A PrometheusRule is a custom resource defined by the Prometheus Operator; it defines a set of alerting rules. Within the absent metrics operator documentation and source code, an AbsencePrometheusRule is a PrometheusRule resource created and managed by the absent metrics operator. It defines the absence alert rules for the metrics used in the alert rule definitions of a PrometheusRule.

Overview

The absent metrics operator is a companion operator for the Prometheus Operator.

It monitors all the PrometheusRule resources deployed across a Kubernetes cluster and creates corresponding absence alert rules for the alert rules defined in those resources.

Motivation

Consider the following alert rule definition:

alert: ImportantAlert
expr: foo_bar > 0
for: 5m
labels:
  support_group: network
  service: foo
  severity: critical
annotations:
  summary: Data center is on fire!

This alert will never fire if the metric foo_bar does not exist in Prometheus, because the expression then returns no series to compare against the threshold.

This can be avoided by using the absent() function with the or operator so the alert rule expression becomes:

absent(foo_bar) or foo_bar > 0
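To make the manual workaround concrete, here is a minimal Go sketch that rewrites an expression by hand. The helper name withAbsent is hypothetical and is not part of the operator; it only covers the single-metric case shown above.

```go
package main

import "fmt"

// withAbsent is a hypothetical helper (not part of the operator). It prefixes
// an alert expression with an absent() check for a single metric.
func withAbsent(metric, expr string) string {
	return fmt.Sprintf("absent(%s) or %s", metric, expr)
}

func main() {
	fmt.Println(withAbsent("foo_bar", "foo_bar > 0"))
	// Prints: absent(foo_bar) or foo_bar > 0
}
```

An expression that references several metrics would need one absent() clause per metric, which is where the manual approach starts to scale poorly.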

However, this gets tedious if you have hundreds of alerts deployed across the cluster. It is also prone to human error, e.g. a typo or forgetting to include the absent() function in the alert expression.

The absent metrics operator solves this problem by automatically creating the corresponding alert rules that check for and alert on metric absence.

For example, considering the alert rule mentioned above, the operator would generate the following absence alert rule:

alert: AbsentNetworkFooBar
expr: absent(foo_bar)
for: 10m
labels:
  context: absent-metrics
  severity: info
  support_group: network
  service: foo
annotations:
  summary: missing foo_bar
  description: The metric 'foo_bar' is missing. 'ImportantAlert' alert using it may not fire as intended.

Refer to the absence alert rule definition documentation for more information on how these alerts are generated and defined.
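The mapping from the original rule to the generated one can be sketched in Go as follows. This is an illustration built only from the single example above, not the operator's implementation: the Rule type, the camel and absenceRule helpers, and in particular the naming scheme (Absent + support group + metric name) are assumptions that happen to reproduce the example. The linked documentation remains the authority on how names, labels, and durations are actually derived.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule mirrors only the alert rule fields that appear in the examples above.
type Rule struct {
	Alert       string
	Expr        string
	For         string
	Labels      map[string]string
	Annotations map[string]string
}

// camel turns an ASCII snake_case string such as "foo_bar" into "FooBar".
func camel(s string) string {
	var b strings.Builder
	for _, part := range strings.Split(s, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

// absenceRule builds an absence alert rule for one metric used by orig.
// ASSUMPTION: the name scheme, the fixed "10m" duration, and the "info"
// severity are copied from the README example, not from the operator source.
func absenceRule(orig Rule, metric string) Rule {
	return Rule{
		Alert: "Absent" + camel(orig.Labels["support_group"]) + camel(metric),
		Expr:  fmt.Sprintf("absent(%s)", metric),
		For:   "10m",
		Labels: map[string]string{
			"context":       "absent-metrics",
			"severity":      "info",
			"support_group": orig.Labels["support_group"],
			"service":       orig.Labels["service"],
		},
		Annotations: map[string]string{
			"summary": "missing " + metric,
			"description": fmt.Sprintf(
				"The metric '%s' is missing. '%s' alert using it may not fire as intended.",
				metric, orig.Alert),
		},
	}
}

func main() {
	orig := Rule{
		Alert:  "ImportantAlert",
		Expr:   "foo_bar > 0",
		For:    "5m",
		Labels: map[string]string{"support_group": "network", "service": "foo", "severity": "critical"},
	}
	r := absenceRule(orig, "foo_bar")
	fmt.Println(r.Alert, r.Expr)
	// Prints: AbsentNetworkFooBar absent(foo_bar)
}
```

Note that the sketch takes the metric name as an argument; extracting metric names from an arbitrary PromQL expression is a separate problem that it does not attempt.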

Installation

You can build with make, install with make install, or build a container image with docker build.

The make install target understands the conventional environment variables for choosing install locations: DESTDIR and PREFIX.

Usage

For usage instructions, run:

absent-metrics-operator --help

In case of a false positive, the operator can be disabled for a specific alert rule or the entire PrometheusRule resource. Refer to the playbook for operators for instructions.

Metrics

Metrics are exposed at port 9659. This port has been allocated for the operator.

| Metric | Labels |
| --- | --- |
| absent_metrics_operator_successful_reconcile_time | prometheusrule_namespace, prometheusrule_name |