Upgrade GKE prometheus set up to prometheus-community/kube-prometheus-stack #10132
Conversation
Force-pushed from d7c3e92 to 51e6f30
@@ -9,6 +9,9 @@ grafana:
       userKey: admin-user
       passwordKey: admin-password
   grafana.ini:
+    server:
+      # REPLACE THIS WITH THE ACTUAL ROOT URL
+      root_url: "http://localhost:3000"
I could use some suggestions here. You need to set the root URL correctly, as otherwise the GitHub authentication will not work. If you configure nothing, it will use localhost:3000, which it passes to GitHub as the "redirect_uri", and GitHub will reject authentication calls saying it doesn't match what's configured in the OAuth app.
We could hardcode the right value, but since we use the same values file for each cluster, this carries the danger that we overwrite the URL in one cluster or the other. Any ideas? I'd like to keep it simple so we can keep using the same file, but I don't know. Maybe I missed some config option?
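One option (a sketch only, not something tested here): keep the shared file untouched and layer a tiny per-cluster override on top, since Helm merges multiple -f files in order, with later files winning. The file name and URL below are hypothetical:

# values-cluster-a.yaml (hypothetical per-cluster override), applied after
# the shared file, e.g.:
#   helm upgrade metrics prometheus-community/kube-prometheus-stack \
#     -f prometheus-values.yaml -f values-cluster-a.yaml
grafana:
  grafana.ini:
    server:
      root_url: "https://grafana.cluster-a.example.com/"

That way the shared values file can keep a harmless default while each cluster pins its own URL at deploy time.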
Not sure whether I get it. So you have to set the ingress URL here?
Yes, here it should be the ingress URL for each ingress that we have.
So one of the issues I mentioned was this: if we leave it at localhost:3000 and someone upgrades without changing it, then we break the OAuth. But if we set it to a particular URL, then we risk breaking the OAuth for one of the two Grafana instances we have.
I'm thinking of looking into Helmfile to manage our multiple deployments, sharing the same file with some overrides (a sketch of that follows).
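A minimal Helmfile sketch of that idea, assuming a single helmfile.yaml driving both deployments; the environment names and URLs are hypothetical:

# helmfile.yaml (hypothetical sketch)
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts

environments:
  cluster-a:
    values:
      - grafanaRootUrl: "https://grafana.cluster-a.example.com/"
  cluster-b:
    values:
      - grafanaRootUrl: "https://grafana.cluster-b.example.com/"

---
releases:
  - name: metrics
    namespace: default
    chart: prometheus-community/kube-prometheus-stack
    version: 16.0.0
    values:
      - prometheus-values.yaml                # the shared file stays as-is
      - grafana:                              # inline per-environment override
          grafana.ini:
            server:
              root_url: {{ .Values.grafanaRootUrl | quote }}

Deploying would then be e.g. helmfile -e cluster-a apply, with the shared values file staying identical everywhere.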
Force-pushed from 51e6f30 to c102d36
Thank you very much for doing it 🚀 🤗
@@ -1,5 +1,5 @@
 alertmanager:
-  enabled: false
+  enabled: true
This is for your benchmark on the long-running cluster?
Yes, but in general it'll be useful in the future to have alerts :)
I would like us to do it right now, so we are in sync with SaaS and can check whether our dashboards work, etc.
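As context for what "doing it right" might involve: enabling the chart's Alertmanager only deploys it; routing alerts anywhere still needs a receiver. A hedged sketch under the chart's alertmanager.config key — the Slack webhook and channel are placeholders, not our real settings:

alertmanager:
  enabled: true
  config:
    route:
      receiver: "slack-notifications"
      group_by: ["alertname", "namespace"]
    receivers:
      - name: "slack-notifications"
        slack_configs:
          # hypothetical webhook; the real one should live in a secret, not in values
          - api_url: "https://hooks.slack.com/services/<hypothetical-webhook>"
            channel: "#monitoring"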
We had issues this week with accessing Grafana. Our medic (@saig0) was not able to connect to our instances. Idk why I was able to connect 🤷

What I did to fix it for now (hotfix):

$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
metrics default 56 2022-08-21 17:33:28.495923251 +0200 CEST deployed kube-prometheus-stack-16.0.0 0.47.1
$ helm get values metrics > values.yaml # get installed values
$ vim values.yaml # replace localhost with our Grafana instance's ingress URL
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts # add the helm repo
$ helm repo update
$ helm get manifest metrics > actual.yaml # get the actual manifest
$ helm template metrics prometheus-community/kube-prometheus-stack -f values.yaml --version 16.0.0 > afterUpgrade.yaml # do a dry-run to compare the output

Running a diff of actual.yaml against afterUpgrade.yaml shows:

536c536
< root_url = http://localhost:3000
---
> root_url = http://34.77.165.228/
37244c37244
< checksum/config: 2f61443ba5962e030d1d7a31c6e76232fb7a9f3dffcd94eb7673d4c0c5dde3c4
---
> checksum/config: e4fb69f5cf1d57bf13e53d7e87ce8c96ac084c62adfddd2b5599655c545198dc
40716c40716,41035
<
---
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/psp.yaml
> apiVersion: policy/v1beta1
> kind: PodSecurityPolicy
> metadata:
> name: metrics-kube-prometheus-st-admission
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
>
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> spec:
> privileged: false
> # Required to prevent escalations to root.
> # allowPrivilegeEscalation: false
> # This is redundant with non-root + disallow privilege escalation,
> # but we can provide it for defense in depth.
> #requiredDropCapabilities:
> # - ALL
> # Allow core volume types.
> volumes:
> - 'configMap'
> - 'emptyDir'
> - 'projected'
> - 'secret'
> - 'downwardAPI'
> - 'persistentVolumeClaim'
> hostNetwork: false
> hostIPC: false
> hostPID: false
> runAsUser:
> # Permits the container to run with root privileges as well.
> rule: 'RunAsAny'
> seLinux:
> # This policy assumes the nodes are using AppArmor rather than SELinux.
> rule: 'RunAsAny'
> supplementalGroups:
> rule: 'MustRunAs'
> ranges:
> # Forbid adding the root group.
> - min: 0
> max: 65535
> fsGroup:
> rule: 'MustRunAs'
> ranges:
> # Forbid adding the root group.
> - min: 0
> max: 65535
> readOnlyRootFilesystem: false
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/serviceaccount.yaml
> apiVersion: v1
> kind: ServiceAccount
> metadata:
> name: metrics-kube-prometheus-st-admission
> namespace: default
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/clusterrole.yaml
> apiVersion: rbac.authorization.k8s.io/v1
> kind: ClusterRole
> metadata:
> name: metrics-kube-prometheus-st-admission
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> rules:
> - apiGroups:
> - admissionregistration.k8s.io
> resources:
> - validatingwebhookconfigurations
> - mutatingwebhookconfigurations
> verbs:
> - get
> - update
> - apiGroups: ['policy']
> resources: ['podsecuritypolicies']
> verbs: ['use']
> resourceNames:
> - metrics-kube-prometheus-st-admission
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/clusterrolebinding.yaml
> apiVersion: rbac.authorization.k8s.io/v1
> kind: ClusterRoleBinding
> metadata:
> name: metrics-kube-prometheus-st-admission
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> roleRef:
> apiGroup: rbac.authorization.k8s.io
> kind: ClusterRole
> name: metrics-kube-prometheus-st-admission
> subjects:
> - kind: ServiceAccount
> name: metrics-kube-prometheus-st-admission
> namespace: default
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/role.yaml
> apiVersion: rbac.authorization.k8s.io/v1
> kind: Role
> metadata:
> name: metrics-kube-prometheus-st-admission
> namespace: default
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> rules:
> - apiGroups:
> - ""
> resources:
> - secrets
> verbs:
> - get
> - create
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/rolebinding.yaml
> apiVersion: rbac.authorization.k8s.io/v1
> kind: RoleBinding
> metadata:
> name: metrics-kube-prometheus-st-admission
> namespace: default
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> roleRef:
> apiGroup: rbac.authorization.k8s.io
> kind: Role
> name: metrics-kube-prometheus-st-admission
> subjects:
> - kind: ServiceAccount
> name: metrics-kube-prometheus-st-admission
> namespace: default
> ---
> # Source: kube-prometheus-stack/charts/grafana/templates/tests/test.yaml
> apiVersion: v1
> kind: Pod
> metadata:
> name: metrics-grafana-test
> labels:
> helm.sh/chart: grafana-6.9.1
> app.kubernetes.io/name: grafana
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "7.4.5"
> app.kubernetes.io/managed-by: Helm
> annotations:
> "helm.sh/hook": test-success
> namespace: default
> spec:
> serviceAccountName: metrics-grafana-test
> containers:
> - name: metrics-test
> image: "bats/bats:v1.1.0"
> imagePullPolicy: "IfNotPresent"
> command: ["/opt/bats/bin/bats", "-t", "/tests/run.sh"]
> volumeMounts:
> - mountPath: /tests
> name: tests
> readOnly: true
> volumes:
> - name: tests
> configMap:
> name: metrics-grafana-test
> restartPolicy: Never
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml
> apiVersion: batch/v1
> kind: Job
> metadata:
> name: metrics-kube-prometheus-st-admission-create
> namespace: default
> annotations:
> "helm.sh/hook": pre-install,pre-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission-create
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> spec:
> template:
> metadata:
> name: metrics-kube-prometheus-st-admission-create
> labels:
> app: kube-prometheus-stack-admission-create
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> spec:
> containers:
> - name: create
> image: k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.2.0
> imagePullPolicy: IfNotPresent
> args:
> - create
> - --host=metrics-kube-prometheus-st-operator,metrics-kube-prometheus-st-operator.default.svc
> - --namespace=default
> - --secret-name=metrics-kube-prometheus-st-admission
> resources:
> {}
> restartPolicy: OnFailure
> serviceAccountName: metrics-kube-prometheus-st-admission
> securityContext:
> runAsGroup: 2000
> runAsNonRoot: true
> runAsUser: 2000
> ---
> # Source: kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-patchWebhook.yaml
> apiVersion: batch/v1
> kind: Job
> metadata:
> name: metrics-kube-prometheus-st-admission-patch
> namespace: default
> annotations:
> "helm.sh/hook": post-install,post-upgrade
> "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
> labels:
> app: kube-prometheus-stack-admission-patch
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> spec:
> template:
> metadata:
> name: metrics-kube-prometheus-st-admission-patch
> labels:
> app: kube-prometheus-stack-admission-patch
> app.kubernetes.io/managed-by: Helm
> app.kubernetes.io/instance: metrics
> app.kubernetes.io/version: "16.0.0"
> app.kubernetes.io/part-of: kube-prometheus-stack
> chart: kube-prometheus-stack-16.0.0
> release: "metrics"
> heritage: "Helm"
> spec:
> containers:
> - name: patch
> image: k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.2.0
> imagePullPolicy: IfNotPresent
> args:
> - patch
> - --webhook-name=metrics-kube-prometheus-st-admission
> - --namespace=default
> - --secret-name=metrics-kube-prometheus-st-admission
> - --patch-failure-policy=Fail
> resources:
> {}
> restartPolicy: OnFailure
> serviceAccountName: metrics-kube-prometheus-st-admission
> securityContext:
> runAsGroup: 2000
> runAsNonRoot: true
> runAsUser: 2000

I think it is ok to ignore the webhooks; they are potentially used during the upgrade.

$ helm upgrade metrics prometheus-community/kube-prometheus-stack -f values.yaml --version 16.0.0
W0830 10:33:31.971785 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:32.829207 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:43.925216 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.594887 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.616641 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.673887 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.761684 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.782592 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.830896 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.913698 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.934810 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:44.982406 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.071017 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.091741 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.180680 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.267297 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.288054 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.334801 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.444645 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.466939 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.517075 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.606472 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.628519 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:33:45.683109 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:34:04.751433 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:34:05.552517 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0830 10:34:16.316840 175112 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Release "metrics" has been upgraded. Happy Helming!
NAME: metrics
LAST DEPLOYED: Tue Aug 30 10:33:24 2022
NAMESPACE: default
STATUS: deployed
REVISION: 57
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace default get pods -l "release=metrics"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
Looks like we need to configure a GitHub OAuth application: https://grafana.com/docs/grafana/v9.0/setup-grafana/configure-security/configure-authentication/github/ Did you do that @npepinpe?!
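For reference, a hedged sketch of what that would look like in our chart values, following the Grafana docs linked above — the client ID, secret, and organization below are placeholders, not our actual settings:

grafana:
  grafana.ini:
    server:
      # must match the callback URL registered on the GitHub OAuth app
      root_url: "https://<our-grafana-ingress>/"
    auth.github:
      enabled: true
      allow_sign_up: true
      client_id: "<github-oauth-app-client-id>"
      client_secret: "<github-oauth-app-client-secret>"
      scopes: "user:email,read:org"
      auth_url: "https://github.com/login/oauth/authorize"
      token_url: "https://github.com/login/oauth/access_token"
      api_url: "https://api.github.com/user"
      allowed_organizations: "<our-github-org>"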
@Zelldon the grafana instance is still setting that
Maybe this helps us here? https://grafana.com/tutorials/run-grafana-behind-a-proxy/
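Related to that tutorial: Grafana's ini format supports %(key)s interpolation, so another hedged option for the shared-file problem is to derive root_url from the domain key and override only the domain per cluster via an environment variable. A sketch — the IP is the one from the diff above, and whether we want this is an open question:

grafana:
  grafana.ini:
    server:
      domain: localhost                       # placeholder; overridden per cluster
      root_url: "%(protocol)s://%(domain)s/"  # interpolates the [server] domain key
  env:
    # Grafana maps GF_<SECTION>_<KEY> env vars onto grafana.ini settings,
    # so this overrides [server] domain without touching the shared file
    GF_SERVER_DOMAIN: "34.77.165.228"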
It's related to the comment I mentioned about setting the root URL. I thought it was fixed since I tested it 🤔
Summary: the problem was that the pods were stuck in init because the PV was still claimed by the previous pod, which is why my upgrade didn't work.
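For future reference — an assumption about the mechanics, not necessarily what was done here — a retained PV that stays bound to a deleted claim can usually be freed by clearing its claimRef so a new PVC can bind it:

# unbind.yaml -- hedged sketch; <pv-name> is hypothetical
#   kubectl get pv                                 # look for STATUS "Released"
#   kubectl patch pv <pv-name> --patch-file unbind.yaml
spec:
  claimRef: null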
👍
bors merge
Build succeeded: |
Description
This PR updates the prometheus-values.yaml we use to set up our monitoring stack on our GKE clusters. These are the latest values used, adapted for the new chart.

At the same time, I've already migrated us from the old deprecated chart to the new chart (prometheus-community/kube-prometheus-stack) and upgraded from 9.x to 16.0.0. In order to migrate, I did the following (based on this issue from our SREs):
- Set the persistent volumes' reclaim policy to retain instead of delete; this allows us to delete the old PVC but keep the persistent volume, retaining our data (a sketch of this step follows the list).
- Update the CRDs manually, since helm upgrade doesn't install CRDs; we need to pick up the CRDs from the updated operator version.
- Run helm upgrade metrics --debug --namespace default --dependency-update -f prometheus-operator-values.yml --version 10.0.0 prometheus-community/kube-prometheus-stack (first run with a --dry-run to ensure the PVC and so on will be kept).
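A hedged sketch of the reclaim-policy step — the PV name is hypothetical and has to be looked up first:

# retain-patch.yaml -- flips a PV's reclaim policy so deleting the PVC
# no longer deletes the underlying disk; look up the PV name with:
#   kubectl get pvc -n default    # the VOLUME column names the PV
# then apply with:
#   kubectl patch pv <pv-name> --patch-file retain-patch.yaml
spec:
  persistentVolumeReclaimPolicy: Retain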
With that done, we could then upgrade the Kubernetes clusters to 1.23 without any issues. The next time we need to do all of this will be when upgrading to k8s 1.25, which removes further APIs. While it's possible to upgrade k8s first and then fix the Helm release, it's easier to first upgrade the charts to make sure nothing is using the deprecated APIs, and then upgrade k8s.
One last thing: we could upgrade to 17.x and remove our pinned version of Grafana to upgrade Grafana to 8.x (like we have in SaaS). To do that, just edit the values file, remove the pinned tag for Grafana, update the necessary CRDs as described in the chart readme (link is above), and then run helm upgrade metrics --debug --namespace default --dependency-update -f prometheus-operator-values.yml --version 17.0.0 prometheus-community/kube-prometheus-stack.

Related issues
closes #9074
Definition of Done
Not all items need to be done, depending on the issue and the pull request.
Code changes:
The appropriate labels were added (e.g. backport stable/1.3) to the PR; in case that fails you need to create backports manually.

Testing:
Documentation:
Please refer to our review guidelines.