Migrated to newer prometheus operator #9074

Closed
Zelldon opened this issue Apr 7, 2022 · 2 comments · Fixed by #10132
Assignees
Labels
area/observability Marks an issue as observability related kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@Zelldon
Member

Zelldon commented Apr 7, 2022

Description

Our current GKE clusters use an outdated and deprecated Prometheus operator. We should migrate to the newer, maintained version. As part of this, we should update our setup docs.

Checking the status of the current installation shows the deprecation status as well.

$ helm status metrics
NAME: metrics
LAST DEPLOYED: Mon Jun 28 08:50:37 2021
NAMESPACE: default
STATUS: deployed
REVISION: 24
NOTES:


*** DEPRECATED ****


  • stable/prometheus-operator chart is deprecated.
  • Further development has moved to https://github.com/prometheus-community/helm-charts
  • The chart has been renamed kube-prometheus-stack to more clearly reflect
    that it installs the kube-prometheus project stack, within which Prometheus
    Operator is only one component.

The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace default get pods -l "release=metrics"

Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.

$ helm list
NAME     NAMESPACE  REVISION  UPDATED                                   STATUS    CHART                      APP VERSION
metrics  default    24        2021-06-28 08:50:37.787649576 +0200 CEST  deployed  prometheus-operator-9.3.2  0.38.1
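
The check above can be repeated across releases. The following is a minimal sketch, assuming the plain-text `helm list` layout shown above (header row first, release name in the first column). The function name `find_deprecated_releases` is hypothetical and not part of Helm.

```shell
# Reads `helm list` text output on stdin and prints the name of every
# release whose chart is the deprecated prometheus-operator chart.
# Assumption: the chart column looks like "prometheus-operator-<version>",
# as in the output above. The new kube-prometheus-stack chart does not match.
find_deprecated_releases() {
  awk 'NR > 1 && $0 ~ /[ \t]prometheus-operator-[0-9]/ { print $1 }'
}
```

It could be fed from the live cluster with something like `helm list --all-namespaces | find_deprecated_releases`.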

@Zelldon added the kind/toil and Impact: Testing labels on Apr 7, 2022
@npepinpe added the area/reliability label and removed the Impact: Testing label on Apr 11, 2022
@npepinpe
Member

I would propose migrating to GCP's managed Prometheus service instead, to reduce maintenance. This means installing Grafana on its own as a Helm chart, but that is hopefully less work to migrate to.

For now, however, I will leave it in the backlog, and we can look into it in Q3, when (hopefully) the SREs are mostly done migrating the prod monitoring stack.

@npepinpe added the area/observability and team/distributed labels and removed the area/reliability label on Apr 12, 2022
@Zelldon
Member Author

Zelldon commented Jun 21, 2022

I guess it would be nice to tackle this sooner rather than later. Today we again ran into an issue with Prometheus: it was not able to boot because the WAL was too big, and the readiness probe caused a crash loop.

I found a good guide for migrating; it does not seem that hard:
https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#config-mgd-collection

Example for migrating the service monitor: https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-servicemonitor
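
As a rough idea of what the service monitor migration might involve: managed collection uses a PodMonitoring resource in place of a ServiceMonitor. The sketch below is only an illustration under assumptions, not our actual config. The function name `render_pod_monitoring`, the default label value, the `app.kubernetes.io/name` label key, the port name, and the 30s interval are all hypothetical, and the `apiVersion` should be checked against the linked guide.

```shell
# Prints a PodMonitoring manifest for a given app label and port name.
# Everything below is illustrative: the label key, default names, scrape
# interval, and apiVersion are assumptions, not taken from this issue.
render_pod_monitoring() {
  app_label="${1:-example-app}"   # hypothetical pod label value
  port_name="${2:-metrics}"       # hypothetical container port name
  cat <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: ${app_label}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ${app_label}
  endpoints:
  - port: ${port_name}
    interval: 30s
EOF
}
```

The output could then be reviewed and applied with something like `render_pod_monitoring my-app http | kubectl apply -f -`.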

One thing we need to keep in mind is that this will increase our costs; I guess we need to verify that first: https://cloud.google.com/stackdriver/pricing#mgd-prometheus-pricing-summary

@npepinpe self-assigned this on Aug 21, 2022
@Zelldon added the version:8.1.0 label on Oct 4, 2022

4 participants