[Bug] GrafanaAlertRuleGroup changes when applied #1492

Closed
AlexEndris opened this issue Apr 15, 2024 · 6 comments
Labels
grafana-upstream: Issues non-operator related, should be logged in the grafana product repo

Comments

@AlexEndris

Describe the bug
When a GrafanaAlertRuleGroup is deployed to a k8s cluster with ArgoCD, defaults are applied to the resource (see the diff in the screenshot), so the resource changes and is flagged as "OutOfSync" by ArgoCD.

Version
v5.8.1

To Reproduce
Apply a GrafanaAlertRuleGroup that has been exported by Grafana:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaAlertRuleGroup
metadata:
  name: grafanaalertrulegroup-sample
spec:
  folderRef: test-folder
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  interval: 5m
  rules:
    - uid: cdi3wts6ubqbkc
      title: Test
      condition: C
      data:
        - refId: A
          relativeTimeRange:
            from: 600
            to: 0
          datasourceUid: Prometheus
          model:
            disableTextWrap: false
            editorMode: builder
            expr: alertmanager_alerts{instance="10.244.1.21:9093"}
            fullMetaSearch: false
            includeNullMetadata: true
            instant: true
            intervalMs: 1000
            legendFormat: __auto
            maxDataPoints: 43200
            range: false
            refId: A
            useBackend: false
        - refId: B
          relativeTimeRange:
            from: 600
            to: 0
          datasourceUid: __expr__
          model:
            conditions:
                - evaluator:
                    params: []
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                        - B
                  reducer:
                    params: []
                    type: last
                  type: query
            datasource:
                type: __expr__
                uid: __expr__
            expression: A
            intervalMs: 1000
            maxDataPoints: 43200
            reducer: last
            refId: B
            type: reduce
        - refId: C
          relativeTimeRange:
            from: 600
            to: 0
          datasourceUid: __expr__
          model:
            conditions:
                - evaluator:
                    params:
                        - 0
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                        - C
                  reducer:
                    params: []
                    type: last
                  type: query
            datasource:
                type: __expr__
                uid: __expr__
            expression: B
            intervalMs: 1000
            maxDataPoints: 43200
            refId: C
            type: threshold
      noDataState: NoData
      execErrState: Error
      for: 5m
      annotations:
        description: ""
        runbook_url: ""
        summary: ""
      labels:
        "": ""
      isPaused: false

Expected behavior
Defaults such as to: 0 in relativeTimeRange, isPaused: false, and interval: 5m remain unchanged after the resource is applied.

Actual behavior
to: 0 and isPaused: false vanish, presumably because they are defaults, and interval: 5m is rewritten to interval: 5m0s.

Suspect component/Location where the bug might be occurring
Unknown

Screenshots
Left is the changed resource, right is the actual resource as a file:
[screenshot of the diff between the two]

@AlexEndris added the bug and needs triage labels on Apr 15, 2024
@pb82
Collaborator

pb82 commented Apr 16, 2024

@AlexEndris it sounds like the Operator removes fields that are not in the spec, and ArgoCD then applies them again. In that case, wouldn't it make sense to remove those fields from your manifest (and use the 5m0s format for the interval)?

@pb82 added the triage/needs-information label and removed the needs triage label on Apr 16, 2024
@AlexEndris
Author

@pb82 Thank you for your reply. My point was that the YAML is a verbatim copy & paste of what Grafana itself exports in version 10.4.1. I assumed the two representations would be the same.

@theSuess
Member

The duration fields in the Kubernetes custom resource use a different duration type than the Grafana export, which is why the saved resource differs from the Grafana export.

Does this cause any problems or is it just a cosmetic issue?

@AlexEndris
Author

@theSuess It actually causes ArgoCD to detect a difference and to try to sync the state constantly, overwriting the manifest over and over. We had to adjust the export manually to stop this.

@theSuess
Member

I see. I don't think there's anything we can do about this from the operator's perspective (the data representation is handled differently in k8s vs. Grafana).

I'll close this issue and check with the alerting team whether we can add a separate, operator-compatible CR export format to Grafana.

@theSuess added the grafana-upstream label and removed the bug and triage/needs-information labels on Apr 30, 2024
@AlexEndris
Author

@theSuess Thank you for the consideration! That would definitely reduce how much of this has to be changed manually. ❤️
