Migrated to newer prometheus operator #9074

Zelldon · 2022-04-07T12:36:27Z

Description

Our current GKE clusters are using an outdated and deprecated Prometheus operator. We should migrate to the newer, maintained version. As part of this, we should update our setup docs.

Checking the status of the current installation shows the deprecation status as well.

$ helm status metrics
NAME: metrics
LAST DEPLOYED: Mon Jun 28 08:50:37 2021
NAMESPACE: default
STATUS: deployed
REVISION: 24
NOTES:

*** DEPRECATED ****

stable/prometheus-operator chart is deprecated.

Further development has moved to https://github.com/prometheus-community/helm-charts

The chart has been renamed kube-prometheus-stack to more clearly reflect
that it installs the kube-prometheus project stack, within which Prometheus
Operator is only one component.

The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace default get pods -l "release=metrics"

Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.

$ helm list
NAME     NAMESPACE  REVISION  UPDATED                                   STATUS    CHART                      APP VERSION
metrics  default    24        2021-06-28 08:50:37.787649576 +0200 CEST  deployed  prometheus-operator-9.3.2  0.38.1
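
For reference, a minimal sketch of what the chart swap could look like. It assumes we reuse the release name and namespace from the output above and install the renamed chart from the prometheus-community repository. It only prints the commands so the plan can be reviewed first. It does not cover preserving existing Prometheus data or cleaning up the old CRDs, which would need to be checked against the chart's migration notes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (not confirmed in this issue): the release name and namespace
# are reused from the `helm status` / `helm list` output above.
RELEASE="metrics"
NAMESPACE="default"
CHART="prometheus-community/kube-prometheus-stack"
REPO_URL="https://prometheus-community.github.io/helm-charts"

# Print the migration commands instead of running them, so the plan can be
# reviewed before anything touches the cluster.
plan_migration() {
  echo "helm repo add prometheus-community ${REPO_URL}"
  echo "helm repo update"
  echo "helm uninstall ${RELEASE} --namespace ${NAMESPACE}"
  echo "helm install ${RELEASE} ${CHART} --namespace ${NAMESPACE}"
}

plan_migration
```

Once the printed plan looks right, the commands can be run one by one against the target cluster.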

npepinpe · 2022-04-12T07:53:52Z

I would propose migrating to the managed Prometheus service from GCP instead, to reduce maintenance. This means installing Grafana on its own as a Helm chart, but that is hopefully less work to migrate.

I will for now leave it in the backlog however and we can look into it for Q3, when (hopefully) SREs are mostly through migrating the prod monitoring stack.

Zelldon · 2022-06-21T11:34:25Z

I think it would be good to tackle this sooner rather than later. Today we again ran into an issue with Prometheus: it was not able to boot because the WAL was too big, and the readiness probe caused a crash loop.

Found a good guide for migrating; it does not seem to be that hard:
https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#config-mgd-collection

Example for migrating the service monitor https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-servicemonitor
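
As a rough illustration of what the linked guide describes, a ServiceMonitor would be replaced by a PodMonitoring resource under managed collection. The sketch below writes such a manifest to a file. The resource name, the `app` label, the port name, and the file name are hypothetical placeholders, not values taken from our clusters.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: metadata.name, the app label, the port name and the
# output file name are placeholders for illustration only.
write_pod_monitoring() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: zeebe-example
spec:
  selector:
    matchLabels:
      app: zeebe-example
  endpoints:
  - port: metrics
    interval: 30s
EOF
}

write_pod_monitoring pod-monitoring.yaml
# Once managed collection is enabled on the cluster, apply it with:
#   kubectl apply -f pod-monitoring.yaml
```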

One thing to keep in mind is that this will increase our costs, so we should verify that first: https://cloud.google.com/stackdriver/pricing#mgd-prometheus-pricing-summary

Zelldon added the kind/toil and Impact: Testing labels Apr 7, 2022

Zelldon mentioned this issue Apr 7, 2022

Make new overview dashboard available in zeebe [+long running] cluster #9075

Closed

npepinpe added the area/reliability label and removed the Impact: Testing label Apr 11, 2022

npepinpe added the area/observability and team/distributed labels and removed the area/reliability label Apr 12, 2022

menski removed the team/distributed label Jul 11, 2022

npepinpe self-assigned this Aug 21, 2022

npepinpe mentioned this issue Aug 21, 2022

Upgrade GKE prometheus set up to prometheus-community/kube-prometheus-stack #10132

Merged

15 tasks

zeebe-bors-camunda bot closed this as completed in 7bfa617 Aug 30, 2022

deepthidevaki added the release/8.1.0-alpha5 label Sep 6, 2022

Zelldon added the version:8.1.0 label Oct 4, 2022
