Application Health Monitoring

Henne Vogelsang edited this page Aug 12, 2022 · 44 revisions

We collect metrics about the usage of OBS, such as user logins and the creation of packages and projects.

An entry point to view those metrics is our Application Health Overview Dashboard at https://obs-measure.opensuse.org/.

You can log in with your GitHub account and should get the Editor role.

Our AHM stack consists of:

RabbitMQ

Metrics are sent by our application to the openSUSE RabbitMQ running at https://rabbit.opensuse.org/.

Some of those metrics are:

How to Send Metrics

# Replace $METRIC_NAME$ and $METRIC_VALUE$ with the corresponding name and value.
# Tags are useful for filtering metrics. Add them as needed, as shown below with $TAG_{1,2}$ and their respective values.
RabbitmqBus.send_to_bus('metrics', "$METRIC_NAME$,$TAG_1$=$TAG_1_VALUE$,$TAG_2$=$TAG_2_VALUE$ value=$METRIC_VALUE$")

You can find more information about the line protocol in the Influx documentation.
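As a rough illustration of the format above, the following sketch assembles such a line from a metric name, tags and a value. MetricLine, its methods and the example metric and tag names are hypothetical and not part of OBS. The only escaping it handles is the backslash-escaping of commas, equals signs and spaces in tag keys and values.

```ruby
# Hypothetical helper, not part of OBS: builds a line of the form
# "measurement,tag_1=value_1,tag_2=value_2 value=metric_value".
module MetricLine
  module_function

  # Backslash-escape commas, equals signs and spaces in tag keys and values.
  def escape_tag(value)
    value.to_s.gsub(/[,= ]/) { |char| "\\#{char}" }
  end

  # Joins the measurement name, the optional tags and the single "value" field.
  def build(measurement, value:, tags: {})
    tag_part = tags.map { |key, val| ",#{escape_tag(key)}=#{escape_tag(val)}" }.join
    "#{measurement}#{tag_part} value=#{value}"
  end
end

# The metric and tag names here are made up for illustration.
line = MetricLine.build('login', value: 1, tags: { access_point: 'webui', facility: 'api' })

# In the application, the resulting string would then be handed to the bus:
# RabbitmqBus.send_to_bus('metrics', line)
```

Building the string in one place keeps the tag escaping consistent, which matters once tag values can contain spaces or commas.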

Best Practices

Whenever possible, we extract instrumentation code to the src/api/app/instrumentations directory. We do this with concerns, which are then included in the class they instrument.

In controllers, rely on filters (all filters are listed here). In ActiveRecord models, rely on callbacks from ActiveRecord::Callbacks. In ActiveModel models, you can also rely on callbacks, but they come from ActiveModel::Callbacks. Other classes can use these callbacks too by extending ActiveModel::Callbacks.

Please also mind our general Best practices for monitoring.
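To make the idea of keeping instrumentation in its own module more concrete, here is a minimal plain-Ruby sketch that runs outside Rails. It is a sketch under stated assumptions, not how OBS implements it. PackageInstrumentation, track_create and the metric name are hypothetical, and the RabbitmqBus module below is only a stand-in that records messages. In the application, such a module would more likely be an ActiveSupport::Concern that registers a callback, instead of the explicit method call used here.

```ruby
# Stand-in for the application's RabbitmqBus so the sketch runs outside Rails.
# It only records what would have been sent.
module RabbitmqBus
  def self.sent
    @sent ||= []
  end

  def self.send_to_bus(channel, data)
    sent << [channel, data]
  end
end

# Hypothetical instrumentation module. All metric-related code lives here,
# so the instrumented class itself stays free of monitoring details.
module PackageInstrumentation
  def track_create
    RabbitmqBus.send_to_bus('metrics', 'package,action=create value=1')
  end
end

# Simplified, hypothetical model class that includes the instrumentation.
class Package
  include PackageInstrumentation

  def create
    # Persistence would happen here in a real model.
    track_create # explicit call; a Rails model would use a callback instead
    self
  end
end

Package.new.create
```

The instrumented class only gains an include line, so removing or changing the instrumentation does not touch its business logic.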

Telegraf

Telegraf fetches these metrics using the amqp_consumer input plugin and reports them to InfluxDB using the influxdb output plugin.

InfluxDB

InfluxDB stores the time series data we collect (data source InfluxDB-ahm).

Grafana

Grafana is used to create graphs to visualize the collected data.

Development Environment Setup

Instructions for setting up the development environment, including application health monitoring, can be found on Site-Reliability#development-environment.