[GEP-19] Migrate monitoring stack to prometheus-operator
#9065
Labels
area/dev-productivity
Developer productivity related (how to improve development)
area/ipcei
IPCEI (Important Project of Common European Interest)
area/monitoring
Monitoring (including availability monitoring and alerting) related
kind/enhancement
Enhancement, improvement, extension
kind/epic
Large multi-story topic
Comments
gardener-prow (bot) added the area/dev-productivity, area/monitoring, and kind/enhancement labels on Jan 23, 2024.
rfranzke added the kind/epic and area/ipcei labels on May 23, 2024.
All tasks have been completed.
@rfranzke: Closing this issue.
How to categorize this issue?
/area dev-productivity monitoring
/kind enhancement
What would you like to be added:
The monitoring stack should be migrated from the current custom-built Helm charts to the `prometheus-operator`, as proposed in GEP-19.

Why is this needed:
GEP-19 was accepted and merged a long while ago, so we should strive to complete its implementation. In addition, the garden cluster (managed via `gardener-operator`, ref #7016) does not have a monitoring stack yet. Finally, this increases development productivity by cleaning up technical debt and improving the code.

Tasks:
- `prometheus-operator`
- `prometheus-operator` in garden and seed clusters #9067
- `gardener-operator`: [GEP-19] Introduce `prometheus-operator` in garden and seed clusters #9067
- `gardenlet`: [GEP-19] Introduce `prometheus-operator` in garden and seed clusters #9067
- `kube-state-metrics` #9179
- `prometheus-operator` #9188
- `prometheus-cache` and `alertmanager-seed` `ManagedResource`s #9189
- `prometheus-seed` `ClusterRoleBinding` #9193
- `networkingv1.Ingress` instead of `networkingv1beta1.Ingress` #9299
- `prometheus-cache` and `alertmanager-seed` `ManagedResource`s #9189
- `replicas=1` for seed alertmanager #9298
- `Garden` controller #9301
- `blackbox-exporter` deployments into `Garden` controller #9543
- `Garden` controller #9606
- `alertmanager-shoot` when Gardener `>= 1.90` #9335
- `PersistentVolume`s #9338
- `gardenlet`
- `provider-alicloud`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-alicloud#720
- `provider-aws`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-aws#946
- `provider-azure`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-azure#853
- `provider-gcp`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-gcp#754
- `provider-openstack`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-openstack#766
- `provider-equinix-metal`: [GEP-19] Adapt monitoring configuration gardener-extension-provider-equinix-metal#307
- `networking-calico`: [GEP-19] Adapt monitoring configuration gardener-extension-networking-calico#394
- `networking-cilium`: [GEP-19] Adapt monitoring configuration gardener-extension-networking-cilium#307
- `shoot-cert-service`: [GEP-19] Adapt monitoring configuration gardener-extension-shoot-cert-service#257
- `shoot-oidc-service`: [GEP-19] Adapt monitoring configuration gardener-extension-shoot-oidc-service#193
- `shoot-lakom-service`: [GEP-19] Adapt monitoring configuration gardener-extension-shoot-lakom-service#87
- `shoot-networking-problemdetector`: [GEP-19] Adapt monitoring configuration gardener-extension-shoot-networking-problemdetector#142
- `shoot-rsyslog-relp`: [GEP-19] Adapt monitoring configuration gardener-extension-shoot-rsyslog-relp#99
- `registry-cache`: [GEP-19] Switch to the new contract of providing monitoring configuration gardener-extension-registry-cache#187
- `gardener-resource-manager` for new `Prometheus` and `Alertmanager` resources #9163
- Consider deployment of admission webhook server (abandoned for now due to other, more important topics)
- `ConfigMap`s dynamically #9624

General notes for the migration (taken from #6319):
- `pvc` and its `pv` and set `persistentVolumeReclaimPolicy=Retain`.
- `pvc`.
- `volumeClaimTemplate` that references the `pv` with `volumeName=<existing-pv>`
- `additionalScrapeConfig`. This will allow us to switch to the `prometheus-operator` without creating `PodMonitors` and `ServiceMonitors` for each component and instead do that migration step by step.
- `additionalScrapeConfig`. This will allow extensions time to migrate as well.
- `PrometheusRules`.
- `additionalScrapeConfig` can be migrated to `PodMonitors` and `ServiceMonitors`.
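
The volume-related notes above (set `persistentVolumeReclaimPolicy=Retain`, then reference the existing `pv` through `volumeName` in a `volumeClaimTemplate`) can be sketched roughly as follows. This is a minimal illustration, not the code used in the linked PRs: the helper names, the example PV name, the `10Gi` size, and the `ReadWriteOnce` access mode are all assumptions made for the example. It only builds the two JSON fragments involved.

```python
import json


def retain_patch() -> dict:
    """Patch body that makes a PersistentVolume survive deletion of its claim."""
    return {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}


def storage_with_existing_pv(pv_name: str, size: str) -> dict:
    """Build a `storage` section whose claim template is pinned to one PV.

    Assumptions for illustration: `size` should match the capacity of the
    existing PV, and the access mode is taken to be ReadWriteOnce.
    """
    return {
        "volumeClaimTemplate": {
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                # volumeName binds the generated claim to the retained PV
                "volumeName": pv_name,
                "resources": {"requests": {"storage": size}},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical PV name and size, used only to show the resulting shape.
    print(json.dumps(retain_patch()))
    print(json.dumps(storage_with_existing_pv("pv-prometheus-0", "10Gi"), indent=2))
```

The first fragment could be applied with something like `kubectl patch pv <existing-pv> -p '<json>'`; the second would sit under `spec.storage` of the new `Prometheus` resource. Because `volumeName` pins the claim to a single volume, this shape presumably only makes sense for one replica.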