[Bug] Dashboards are constantly deleted and re-created #686

Closed
johanderss opened this issue Feb 15, 2022 · 10 comments
Labels
bug Something isn't working triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@johanderss
Contributor

johanderss commented Feb 15, 2022

Describe the bug
When using the --namespaces arg, dashboards keep being deleted and re-created during reconciliation.

Version
4.1.1

To Reproduce

  1. Add the --namespaces arg when running the operator. E.g. --namespaces=ns1,ns2
  2. Dashboards are deleted and re-created during reconciliation

Expected behavior
Dashboards are created only once

Suspect component/Location where the bug might be occurring
My feeling is there might be something in the grafanadashboard_controller.go reconciliation methods causing this. It looks like a separate controller is created for each watched namespace and existing dashboards not found in the current namespace are deleted. Maybe there's some error in the comparison.

Log excerpt

2022-02-15T13:22:08.898Z	INFO	running periodic dashboard resync
2022-02-15T13:22:09.622Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GrafanaDashboard","namespace":"applications","name":"loki-api-dashboard","uid":"252a1da8-5cba-4083-af55-cc7595a81528","apiVersion":"integreatly.org/v1alpha1","resourceVersion":"84609452"}, "reason": "Success", "message": "dashboard applications/loki-api-dashboard successfully submitted"}
2022-02-15T13:22:09.622Z	INFO	dashboard successfully submitted	{"name": "loki-api-dashboard", "namespace": "applications"}
2022-02-15T13:22:09.861Z	INFO	dashboard successfully submitted	{"name": "trace-metrics-dashboard", "namespace": "applications"}
2022-02-15T13:22:09.861Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GrafanaDashboard","namespace":"applications","name":"trace-metrics-dashboard","uid":"602ad786-b61f-4eed-b18b-d17d2813b76e","apiVersion":"integreatly.org/v1alpha1","resourceVersion":"84609449"}, "reason": "Success", "message": "dashboard applications/trace-metrics-dashboard successfully submitted"}
2022-02-15T13:22:09.989Z	INFO	dashboard successfully submitted	{"name": "jaeger-dashboard", "namespace": "applications"}
2022-02-15T13:22:09.989Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GrafanaDashboard","namespace":"applications","name":"jaeger-dashboard","uid":"f4448a85-6389-4366-b088-9141dac39496","apiVersion":"integreatly.org/v1alpha1","resourceVersion":"84609450"}, "reason": "Success", "message": "dashboard applications/jaeger-dashboard successfully submitted"}
2022-02-15T13:22:10.251Z	INFO	running periodic notificationchannel resync
2022-02-15T13:22:10.338Z	INFO	dashboard successfully submitted	{"name": "analytics-dashboard", "namespace": "applications"}
2022-02-15T13:22:10.447Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GrafanaDashboard","namespace":"applications","name":"logs-dashboard","uid":"2a3fa28f-2e77-4c2b-98dc-1a7c5310ec5f","apiVersion":"integreatly.org/v1alpha1","resourceVersion":"84609453"}, "reason": "Success", "message": "dashboard applications/logs-dashboard successfully submitted"}

2022-02-15T13:22:11.609Z	INFO	running periodic dashboard resync
2022-02-15T13:22:12.198Z	INFO	delete result was Dashboard Loki API deleted
2022-02-15T13:22:12.367Z	INFO	delete result was Dashboard Trace metrics deleted
2022-02-15T13:22:12.446Z	INFO	delete result was Dashboard Jaeger deleted
2022-02-15T13:22:12.766Z	INFO	delete result was Dashboard Logs deleted
2022-02-15T13:22:12.964Z	INFO	running periodic notificationchannel resync

Runtime (please complete the following information):

  • OS: bitnami/grafana-operator:4.1.1-debian-10-r35 Docker image
  • Grafana Operator Version: 4.1.1
  • Environment: Kubernetes
  • Deployment type: Deployed
  • Other:
    • Grafana Operator Helm Chart version: 2.2.3
    • Grafana version: 8.3.6

Additional context
I'm using the Bitnami Helm Chart to install the operator but the arguments added seem consistent with the Operator documentation so I don't think there's an issue in the chart.

One of the watched namespaces currently contains no dashboards but I don't know whether or not that is a prerequisite.

A possible workaround is to scan for dashboards in all namespaces instead using the --scan-all argument.
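
For readers unfamiliar with the two arguments, here is a minimal Go sketch of how a comma-separated --namespaces value and a boolean --scan-all could be parsed. It is an illustration only: watchConfig and parseWatchArgs are hypothetical names, and this is not claimed to be the operator's actual flag handling.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// watchConfig is a hypothetical summary of what the operator is asked to watch.
type watchConfig struct {
	ScanAll    bool
	Namespaces []string
}

// parseWatchArgs parses the two arguments discussed in this issue.
// Hypothetical helper: the flag names come from the issue, everything else is assumed.
func parseWatchArgs(args []string) (watchConfig, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	scanAll := fs.Bool("scan-all", false, "watch dashboards in all namespaces")
	namespaces := fs.String("namespaces", "", "comma-separated namespaces to watch")
	if err := fs.Parse(args); err != nil {
		return watchConfig{}, err
	}

	cfg := watchConfig{ScanAll: *scanAll}
	// Split the comma-separated value and drop empty entries.
	for _, ns := range strings.Split(*namespaces, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			cfg.Namespaces = append(cfg.Namespaces, ns)
		}
	}
	return cfg, nil
}

func main() {
	for _, args := range [][]string{
		{"--namespaces=ns1,ns2"},
		{"--scan-all=True"},
	} {
		cfg, err := parseWatchArgs(args)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%v -> scanAll=%t namespaces=%v\n", args, cfg.ScanAll, cfg.Namespaces)
	}
}
```

The point of the sketch is only that --namespaces yields an explicit list of namespaces to watch, while --scan-all yields no list at all, which is why the two modes can take different code paths during reconciliation.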

Below are the deployment.yaml-files generated by Helm using --namespaces and --scan-all respectively:
deployment-namespaces.txt
deployment-scan-all.txt

@johanderss johanderss added bug Something isn't working needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 15, 2022
@NissesSenap
Collaborator

Yeah, I managed to reproduce this using our image. The funny thing is that I also see restarts of the Grafana instance. This might have something to do with the grafanadatasource, but I'm unsure.

But after switching to - --scan-all=True, it works as intended.

@NissesSenap
Collaborator

I need to run, but I think that when specific namespaces are defined, their dashboards don't all get collected into a single list.
Instead they are processed per namespace, so dashboards from the other namespaces end up in dashboardsToDelete and the operator removes them. Then they are automatically re-added.

https://github.com/grafana-operator/grafana-operator/blob/1b94fe6dc41907a4a3ffae3eca4cfe5fab859e35/controllers/grafanadashboard/grafanadashboard_controller.go#L242-L264
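
To make the suspected failure mode concrete, here is a minimal Go sketch. It assumes the behaviour described above (known dashboards compared against one namespace at a time) and is not the operator's code: dashboardRef, staleAgainst, perNamespaceStale and mergedStale are hypothetical names, and "empty-ns" is a made-up namespace standing in for the watched namespace without dashboards.

```go
package main

import "fmt"

// dashboardRef is a hypothetical stand-in for a dashboard the operator tracks.
type dashboardRef struct {
	Namespace string
	Name      string
}

// staleAgainst returns every known dashboard that is missing from seen.
func staleAgainst(known, seen []dashboardRef) []dashboardRef {
	present := make(map[dashboardRef]bool, len(seen))
	for _, d := range seen {
		present[d] = true
	}
	var stale []dashboardRef
	for _, d := range known {
		if !present[d] {
			stale = append(stale, d)
		}
	}
	return stale
}

// perNamespaceStale models the suspected bug: the known dashboards are
// compared against each namespace's list separately, so a dashboard looks
// stale from the point of view of every namespace it does not live in.
func perNamespaceStale(known []dashboardRef, byNamespace map[string][]dashboardRef) []dashboardRef {
	var stale []dashboardRef
	for _, list := range byNamespace {
		stale = append(stale, staleAgainst(known, list)...)
	}
	return stale
}

// mergedStale collects all namespaces into a single list before comparing.
func mergedStale(known []dashboardRef, byNamespace map[string][]dashboardRef) []dashboardRef {
	var all []dashboardRef
	for _, list := range byNamespace {
		all = append(all, list...)
	}
	return staleAgainst(known, all)
}

func main() {
	known := []dashboardRef{
		{"applications", "loki-api-dashboard"},
		{"applications", "jaeger-dashboard"},
	}
	byNamespace := map[string][]dashboardRef{
		"applications": known,
		"empty-ns":     nil, // hypothetical watched namespace with no dashboards
	}
	fmt.Println("per-namespace compare marks for deletion:", len(perNamespaceStale(known, byNamespace)))
	fmt.Println("merged compare marks for deletion:", len(mergedStale(known, byNamespace)))
}
```

In this toy setup the per-namespace comparison flags both dashboards for deletion while the merged comparison flags none, which would match the delete-then-resubmit pattern in the log excerpt if the assumption holds.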

@robshelly or @HubertStefanski is this something that you might have time to look into? If not, I will try to look into it during the weekend, but I'm unsure if I will get the time.

@pb82 pb82 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 1, 2022
@JCL38-ORANGE

Hello,

Do you plan a new operator release soon to deliver PR #690?
Thanks and regards,
Jean-Christophe.

@NissesSenap
Collaborator

Sorry for the delay @JCL38-ORANGE, hopefully I will be able to cut a new release tomorrow: #734.

@JCL38-ORANGE

Hello,
It seems that the problem is still present, and occurs more frequently, with the 4.3.0 operator release:

Dashboard Changed
This dashboard has been modified by another session

Did I miss something?
Thanks and regards,
Jean-Christophe.

@vosmax
Contributor

vosmax commented May 5, 2022

@JCL38-ORANGE @NissesSenap @addreas I am experiencing the same issue with v4.3.0.
It looks like PR #690 doesn't solve the issue.
A bunch of dashboards are continuously being re-created with the message dashboard successfully submitted.
The operator is started with the arg - --scan-all

@addreas
Contributor

addreas commented May 5, 2022

This is consistent with what I have observed as well, no longer deleting and recreating, but still submitting and causing new "versions" to be created. Haven't had time to look deeper into it yet, unfortunately.

Perhaps this could be a new issue, since the behaviour changed after #690?

Edit: I ended up digging a bit and opened a PR with a fix, I think. If you are daring enough, an image for testing is here: ghcr.io/addreas/grafana-operator:v4.3.0-690-bugfix
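
As background on why repeated submissions matter: each submit of a dashboard can create a new "version" in Grafana even when nothing changed. One common way to avoid that is to remember a hash of the last submitted content and skip the submit when it matches. The Go sketch below shows that general technique only. It is not the change made in the PR mentioned above, and contentHash and needsSubmit are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns a hex-encoded SHA-256 digest of the dashboard JSON.
func contentHash(dashboardJSON string) string {
	sum := sha256.Sum256([]byte(dashboardJSON))
	return hex.EncodeToString(sum[:])
}

// needsSubmit reports whether the dashboard content differs from what was
// last submitted. An empty lastSubmittedHash means "never submitted".
func needsSubmit(dashboardJSON, lastSubmittedHash string) bool {
	return contentHash(dashboardJSON) != lastSubmittedHash
}

func main() {
	dashboard := `{"title":"Logs"}`

	// First resync: nothing recorded yet, so the dashboard is submitted.
	lastHash := ""
	fmt.Println("first resync submits:", needsSubmit(dashboard, lastHash))
	lastHash = contentHash(dashboard)

	// Later resync with identical content: the submit is skipped.
	fmt.Println("unchanged resync submits:", needsSubmit(dashboard, lastHash))

	// Content changed: the dashboard is submitted again.
	fmt.Println("changed resync submits:", needsSubmit(`{"title":"Logs v2"}`, lastHash))
}
```

With a check like this, a periodic resync that finds nothing changed would log only the resync line and no dashboard successfully submitted messages.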

@addreas addreas mentioned this issue May 5, 2022
@NissesSenap
Collaborator

Awesome @addreas! I will verify this tomorrow. And if it looks good, I will merge it and cut a new release straight away.

@vosmax
Contributor

vosmax commented May 5, 2022

@addreas I've tested your image and it works like a charm. Well done! Thank you for the quick reaction!

2022-05-05T16:58:40.675Z INFO running periodic dashboard resync
2022-05-05T16:58:48.307Z INFO running periodic notificationchannel resync
2022-05-05T16:58:49.903Z INFO running periodic dashboard resync
2022-05-05T16:58:58.307Z INFO running periodic notificationchannel resync
2022-05-05T16:58:58.860Z INFO running periodic dashboard resync
2022-05-05T16:59:08.306Z INFO running periodic notificationchannel resync

@vosmax
Contributor

vosmax commented May 16, 2022

I have bad news, @addreas.
I've done a deployment to another environment and the issue appeared again.
Even - --namespaces= doesn't help; some dashboards are continuously being re-created.
I double-checked that I'm on v4.4.0.

2022-05-16T10:20:05.269Z	INFO	running periodic dashboard resync
2022-05-16T10:20:05.484Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:05.699Z	INFO	running periodic dashboard resync
2022-05-16T10:20:05.921Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:06.147Z	INFO	running periodic dashboard resync
2022-05-16T10:20:06.218Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:06.354Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:06.441Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:07.251Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:07.421Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:07.857Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:07.975Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:08.067Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:08.204Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:08.416Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:08.582Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:08.637Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:08.845Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:08.887Z	INFO	running periodic dashboard resync
2022-05-16T10:20:09.104Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:09.323Z	INFO	running periodic dashboard resync
2022-05-16T10:20:09.541Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:09.549Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:09.676Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:09.759Z	INFO	running periodic dashboard resync
2022-05-16T10:20:09.937Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:10.065Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:10.306Z	INFO	Dashboard ****** has been deleted via grafana console. Recreating.
2022-05-16T10:20:10.436Z	INFO	dashboard successfully submitted	{"name": "******", "namespace": "******"}
2022-05-16T10:20:10.715Z	INFO	running periodic notificationchannel resync
2022-05-16T10:20:10.855Z	INFO	Dashboard flash-sales-service has been deleted via grafana console. Recreating.

Update:
Manually deleting the dashboards from the Grafana UI helped.
Steps:

  1. Delete the dashboard from the Grafana UI
  2. The operator re-creates the dashboard based on the custom resources
  3. No more Dashboard ****** has been deleted via grafana console. Recreating. messages
