10132: Upgrade GKE prometheus set up to prometheus-community/kube-prometheus-stack r=npepinpe a=npepinpe

## Description

This PR updates the `prometheus-values.yaml` we use to set up our monitoring stack on our GKE clusters. These are the latest values in use, adapted for the new chart.

At the same time, I've already migrated us from the old, deprecated chart to the new chart (prometheus-community/kube-prometheus-stack), and upgraded from 9.x to 16.0.0. To migrate, I did the following (based on [this issue from our SREs](https://github.com/camunda-cloud/monitoring/issues/524)):

- [x] Change the PV reclaim policy from `Delete` to `Retain`; this lets us delete the old PVC but keep the persistent volume, retaining our data.
- [x] Pre-create the PVC that the new chart expects; the chart then picks it up on creation instead of creating a new one, and we keep the old PV/data intact.
- [x] Follow these unofficial [upgrade instructions](prometheus-community/helm-charts#250 (comment)); essentially, we need to re-create the CRDs ourselves, because `helm upgrade` doesn't install or update CRDs, so we have to pick up the CRDs from the updated operator version.
- [x] Migrate from the old chart to the new chart using `helm upgrade metrics --debug --namespace default --dependency-update -f prometheus-operator-values.yml --version 10.0.0 prometheus-community/kube-prometheus-stack` (first run it with `--dry-run` to make sure the PVC and other resources will be kept).
- [x] Once done, [follow the upgrade instructions](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#upgrading-chart) for each major version upgrade as you go along, using the command above with the updated version.

This was done up to version 16.0.0, which removes the last component using deprecated APIs (kube-state-metrics). With that done, we could then upgrade the Kubernetes clusters to 1.23 without any issues.
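The migration checklist above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the exact commands that were run: the `run` wrapper, the `DRY_RUN` switch, the function names, and every argument value (PV and PVC names, size, storage class, operator tag) are hypothetical. Only the release name, namespace, values file, and chart come from the description. The `claimRef` removal in `precreate_pvc` is a standard Kubernetes step for re-binding a `Released` volume; the description does not mention it, so check whether your PV actually needs it.

```shell
#!/usr/bin/env bash
# Sketch of the migration checklist. Function names, DRY_RUN and all
# argument values are illustrative assumptions, not the exact commands used.
set -euo pipefail

RELEASE="metrics"
NAMESPACE="default"
VALUES="prometheus-operator-values.yml"
CHART="prometheus-community/kube-prometheus-stack"

# Print the command (default, DRY_RUN=1) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: keep the PV (and its data) when the old PVC is deleted.
retain_pv() {
  run kubectl patch pv "$1" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Step 2: after deleting the old PVC, pre-create the claim the new chart
# expects and bind it to the retained PV via spec.volumeName.
precreate_pvc() {
  local pv="$1" pvc="$2" size="$3" storage_class="$4" manifest
  # A Released PV still references the deleted claim; removing claimRef
  # makes it Available again so the new claim can bind to it.
  run kubectl patch pv "$pv" --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'
  manifest="apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc}
  namespace: ${NAMESPACE}
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: ${storage_class}
  volumeName: ${pv}
  resources:
    requests:
      storage: ${size}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ kubectl apply -f - <<EOF"
    printf '%s\n' "$manifest"
    echo "EOF"
  else
    printf '%s\n' "$manifest" | kubectl apply -f -
  fi
}

# Step 3: Helm does not upgrade CRDs, so apply them from the operator
# release that matches the target chart version (a tag such as v0.48.0).
update_crds() {
  local base crd
  base="https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/$1/example/prometheus-operator-crd"
  for crd in alertmanagerconfigs alertmanagers podmonitors probes \
             prometheuses prometheusrules servicemonitors thanosrulers; do
    run kubectl apply -f "${base}/monitoring.coreos.com_${crd}.yaml"
  done
}

# Steps 4-5: upgrade to one chart version; extra args (e.g. --dry-run)
# are appended so the rendered result can be reviewed before applying.
upgrade_chart() {
  local version="$1"
  shift
  run helm upgrade "$RELEASE" --debug --namespace "$NAMESPACE" --dependency-update \
    -f "$VALUES" --version "$version" "$CHART" "$@"
}
```

Source the script, review the printed commands, and only then export `DRY_RUN=0`. A possible sequence: `retain_pv <pv-name>`, delete the old claim, `precreate_pvc <pv-name> <claim-name> <size> <storage-class>`, then for each major version run `update_crds <operator-tag>` (when the chart README asks for it), `upgrade_chart <version> --dry-run`, and finally `upgrade_chart <version>`. If a CRD is too large for client-side `kubectl apply`, `kubectl apply --server-side` is an alternative.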
The next time we need to do all of this will be when upgrading to k8s 1.25, which removes more deprecated APIs. While it's possible to upgrade k8s first and then fix the Helm release, it's easier to upgrade the charts first, making sure nothing uses the deprecated APIs, and then upgrade k8s.

One last thing: we could upgrade to 17.x and remove our pinned Grafana version to upgrade Grafana to 8.x (as we have in SaaS). To do that, edit the values file, remove the pinned tag for Grafana, update the necessary CRDs as described in the chart README (linked above), and then run `helm upgrade metrics --debug --namespace default --dependency-update -f prometheus-operator-values.yml --version 17.0.0 prometheus-community/kube-prometheus-stack`.

## Related issues

closes #9074

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
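For the k8s 1.25 upgrade described above, the "make sure nothing uses the deprecated APIs" step could start with a check like the one below. It is a minimal sketch, not part of the original procedure: the function name and regex are assumptions, and it only inspects a rendered manifest piped on stdin (for example the release's own manifests), not every object in the cluster. The group/versions listed are the ones Kubernetes 1.25 stops serving.

```shell
#!/usr/bin/env bash
# Pre-flight check before moving the clusters to Kubernetes 1.25: scan a
# rendered manifest (stdin) for apiVersions that 1.25 no longer serves.
set -euo pipefail

# Group/versions removed in Kubernetes 1.25 (CronJob, EndpointSlice, Event,
# HorizontalPodAutoscaler, PodDisruptionBudget/PodSecurityPolicy, RuntimeClass).
REMOVED_IN_1_25='^[[:space:]]*apiVersion:[[:space:]]*"?(batch/v1beta1|discovery\.k8s\.io/v1beta1|events\.k8s\.io/v1beta1|autoscaling/v2beta1|policy/v1beta1|node\.k8s\.io/v1beta1)"?[[:space:]]*$'

# Print offending lines (with line numbers) and return 1, or print OK.
check_removed_apis() {
  if grep -nE "$REMOVED_IN_1_25"; then
    return 1
  fi
  echo "OK: no apiVersion removed in Kubernetes 1.25 found"
}
```

For instance, `helm get manifest metrics --namespace default | check_removed_apis` scans the current release; hooks are not part of that output and can be checked the same way with `helm get hooks`.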