WIP: OCPBUGS-18282: Reject reserved labels used as external labels #2097
base: master
Conversation
@slashpai: This pull request references Jira Issue OCPBUGS-18282, which is valid. The bug has been moved to the POST state. 3 validations were run on this bug.

Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: slashpai. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
prometheus and prometheus_replica are reserved labels. If a user configures either of them under externalLabels in the cluster-monitoring-config ConfigMap, warn the user about this and set the operator status to Degraded.

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
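For context, here is what such a misconfiguration looks like in the cluster-monitoring-config ConfigMap; the label value is an invented example, and prometheusK8s.externalLabels is the knob referenced above:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        # "prometheus" is reserved by the platform; with this change the
        # operator reports Degraded instead of silently applying it.
        prometheus: my-custom-value
```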
```diff
@@ -736,6 +736,9 @@ func (o *Operator) sync(ctx context.Context, key string) error {
 	} else if config.HasInconsistentAlertmanagerConfigurations() {
 		degradedConditionMessage = client.UserAlermanagerConfigMisconfiguredMessage
 		degradedConditionReason = client.UserAlermanagerConfigMisconfiguredReason
+	} else if config.HasPrometheusReservedExternalLabelsConfigured() {
```
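For illustration, a minimal sketch of how a check like HasPrometheusReservedExternalLabelsConfigured could be implemented; the Config and PrometheusK8sConfig shapes here are assumptions, not the operator's actual types:

```go
package main

import "fmt"

// Minimal assumed shapes; the operator's real types differ.
type PrometheusK8sConfig struct {
	ExternalLabels map[string]string
}

type Config struct {
	PrometheusK8sConfig *PrometheusK8sConfig
}

// reservedLabels are owned by the platform and must not be overridden.
var reservedLabels = []string{"prometheus", "prometheus_replica"}

// HasPrometheusReservedExternalLabelsConfigured reports whether the
// user-supplied externalLabels reuse a reserved label name.
func (c *Config) HasPrometheusReservedExternalLabelsConfigured() bool {
	if c.PrometheusK8sConfig == nil {
		return false
	}
	for _, name := range reservedLabels {
		if _, ok := c.PrometheusK8sConfig.ExternalLabels[name]; ok {
			return true
		}
	}
	return false
}

func main() {
	cfg := &Config{PrometheusK8sConfig: &PrometheusK8sConfig{
		ExternalLabels: map[string]string{"prometheus": "oops"},
	}}
	fmt.Println(cfg.HasPrometheusReservedExternalLabelsConfigured()) // true
}
```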
How about integrating this into the PrometheusTask? Make the whole task fail if the "wrong" labels are set? That way we "never" apply the faulty config and the user has to step in.

Or maybe we can add such checks while loading the config (once we have a CRD, I'd expect a CR containing these labels to be rejected).
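A rough sketch of that suggestion, assuming a task type with a Run method (the operator's real task interface may differ):

```go
package main

import (
	"context"
	"fmt"
)

// PrometheusTask is an assumed stand-in for the operator's real task type.
type PrometheusTask struct {
	externalLabels map[string]string
}

// Run fails fast when a reserved label is overridden, so the faulty
// configuration is never applied to the Prometheus resources.
func (t *PrometheusTask) Run(ctx context.Context) error {
	for _, name := range []string{"prometheus", "prometheus_replica"} {
		if _, ok := t.externalLabels[name]; ok {
			return fmt.Errorf("external label %q is reserved and cannot be overridden", name)
		}
	}
	// ...continue reconciling the Prometheus resources as before.
	return nil
}

func main() {
	t := &PrometheusTask{externalLabels: map[string]string{"prometheus": "oops"}}
	fmt.Println(t.Run(context.Background())) // prints the reserved-label error
}
```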
Hmm, I'm not sure. IIUC, if we fail this in a task the visibility is the same, i.e. the operator reports it. The only difference is that any pending config changes never get applied? Since this doesn't break the CMO setup entirely, I'm not sure which is the better approach.

One orthogonal issue though: this condition reporting being one big else-if chain, doesn't it potentially hide issues? E.g. if no storage is configured and the forbidden external labels are set, we only ever see the storageNotConfiguredMessage, and CannotUseReservedExternalLabelsMessage only pops up after storage was configured. Shouldn't we collect all conditions and report them all at once?
Yes, but this won't prevent the prometheus task from applying the faulty labels (which in some setups may break the delivery of alerts). Adding the check at the beginning of the prometheus task would prevent that. Then again, now that I think about it, doing so may also block legitimate/needed prometheus syncs...

By the way, the SetRollOutDone below sets Degraded=false, so no "default" alert will be triggered for this.
> One orthogonal issue though: this condition reporting being one big else-if chain, doesn't it potentially hide issues? E.g. if no storage is configured and the forbidden external labels are set, we only ever see the storageNotConfiguredMessage, and CannotUseReservedExternalLabelsMessage only pops up after storage was configured. Shouldn't we collect all conditions and report them all at once?
Good point. We could group them under some "umbrella" reason and join the messages; it's worth a ticket so we can discuss this.
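A sketch of that grouping idea; the condition constants and the reportDegraded helper below are invented for illustration, not the operator's actual names:

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed condition constants; the operator's real names differ.
const (
	storageNotConfiguredReason             = "StorageNotConfigured"
	storageNotConfiguredMessage            = "Prometheus has no persistent storage configured."
	cannotUseReservedExternalLabelsReason  = "CannotUseReservedExternalLabels"
	cannotUseReservedExternalLabelsMessage = "externalLabels overrides a reserved label."
)

// reportDegraded is a stand-in for the operator's status reporting.
func reportDegraded(reason, message string) {
	fmt.Printf("Degraded=true reason=%s message=%q\n", reason, message)
}

func syncConditions(storageConfigured, reservedLabelsSet bool) {
	var reasons, messages []string
	if !storageConfigured {
		reasons = append(reasons, storageNotConfiguredReason)
		messages = append(messages, storageNotConfiguredMessage)
	}
	if reservedLabelsSet {
		reasons = append(reasons, cannotUseReservedExternalLabelsReason)
		messages = append(messages, cannotUseReservedExternalLabelsMessage)
	}
	if len(reasons) > 0 {
		// One umbrella condition carrying every current problem at once,
		// instead of surfacing only the first branch of an else-if chain.
		reportDegraded(strings.Join(reasons, ","), strings.Join(messages, " "))
	}
}

func main() {
	syncConditions(false, true) // both problems reported together
}
```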
@slashpai: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.

/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.

/lifecycle stale