Grafana operator endless (in loop) get dashboards when using url #683

Closed
Halytskyi opened this issue Feb 13, 2022 · 6 comments
Labels: bug, triage/accepted, triage/needs-information

Halytskyi commented Feb 13, 2022

Describe the bug
I store my dashboards in a local git server (Gitea). To load them into Grafana, I use a simple manifest like:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: haproxy-instance-summary
  labels:
    app: grafana
spec:
  url: https://git.mydomain.com/haproxy-instance-summary.json

Everything works: after the dashboards are downloaded, I see no new messages in the Grafana Operator logs. But when I checked the Gitea logs, I noticed that the Grafana Operator sends endless requests to fetch the dashboards. I don't think this is expected behavior, and it puts unnecessary load on git.
It also adds unnecessary CPU load:

[Screenshot: Screen Shot 2022-02-13 at 12 18 52 PM]

It also adds memory usage (+ ~120 MiB) and traffic (+ ~450 kB/s):

[Screenshot: Screen Shot 2022-02-13 at 12 17 09 PM]

And this is for just 37 dashboards. For anyone who keeps dashboards on external resources, this is a big problem, at least in terms of traffic consumption.

Gitea logs showing the requests:

2022/02/13 18:17:00 Started GET /oleh/grafana-dashboards/raw/branch/master/monitoring/dashboards-rules-alerts/kubernetes-cr-workload.json for 10.153.0.1:38172
2022/02/13 18:17:00 ...orm@v1.2.5/engine.go:1139:Get() [I] [SQL] SELECT `id`, `lower_name`, `name`, `full_name`, `email`, `keep_email_private`, `email_notifications_preference`, `passwd`, `passwd_hash_algo`, `must_change_password`, `login_type`, `login_source`, `login_name`, `type`, `location`, `website`, `rands`, `salt`, `language`, `description`, `created_unix`, `updated_unix`, `last_login_unix`, `last_repo_visibility`, `max_repo_creation`, `is_active`, `is_admin`, `is_restricted`, `allow_git_hook`, `allow_import_local`, `allow_create_organization`, `prohibit_login`, `avatar`, `avatar_email`, `use_custom_avatar`, `num_followers`, `num_following`, `num_stars`, `num_repos`, `num_teams`, `num_members`, `visibility`, `repo_admin_change_team_access`, `diff_view_style`, `theme`, `keep_activity_private` FROM `user` WHERE `lower_name`=? LIMIT 1 [oleh] - 4.97708ms
2022/02/13 18:17:00 ...orm@v1.2.5/engine.go:1139:Get() [I] [SQL] SELECT `id`, `owner_id`, `owner_name`, `lower_name`, `name`, `description`, `website`, `original_service_type`, `original_url`, `default_branch`, `num_watches`, `num_stars`, `num_forks`, `num_issues`, `num_closed_issues`, `num_pulls`, `num_closed_pulls`, `num_milestones`, `num_closed_milestones`, `num_projects`, `num_closed_projects`, `is_private`, `is_empty`, `is_archived`, `is_mirror`, `status`, `is_fork`, `fork_id`, `is_template`, `template_id`, `size`, `is_fsck_enabled`, `close_issues_via_commit_in_any_branch`, `topics`, `trust_model`, `avatar`, `created_unix`, `updated_unix` FROM `repository` WHERE `owner_id`=? AND `lower_name`=? LIMIT 1 [1 grafana-dashboards] - 4.368625ms
2022/02/13 18:17:00 ...ls/repo/repo_unit.go:226:getUnitsByRepoID() [I] [SQL] SELECT `id`, `repo_id`, `type`, `config`, `created_unix` FROM `repo_unit` WHERE (repo_id = ?) [4] - 3.928649ms
2022/02/13 18:17:00 ...s/repo/pushmirror.go:102:GetPushMirrorsByRepoID() [I] [SQL] SELECT `id`, `repo_id`, `remote_name`, `interval`, `created_unix`, `last_update`, `last_error` FROM `push_mirror` WHERE (repo_id=?) [4] - 3.914207ms
2022/02/13 18:17:00 ...ules/context/repo.go:480:RepoAssignment() [I] [SQL] SELECT count(*) FROM `release` WHERE repo_id=? AND is_draft=? [4 false] - 3.217961ms
2022/02/13 18:17:00 ...ules/context/repo.go:487:RepoAssignment() [I] [SQL] SELECT count(*) FROM `release` WHERE repo_id=? AND is_draft=? AND is_tag=? [4 false false] - 3.656478ms
2022/02/13 18:17:00 Completed GET /oleh/grafana-dashboards/raw/branch/master/monitoring/dashboards-rules-alerts/kubernetes-cr-workload.json 200 OK in 38.658112ms
2022/02/13 18:17:00 Started GET /oleh/grafana-dashboards/raw/branch/master/monitoring/dashboards-rules-alerts/kubernetes-apiserver.json for 10.153.0.1:38172
2022/02/13 18:17:00 ...orm@v1.2.5/engine.go:1139:Get() [I] [SQL] SELECT `id`, `lower_name`, `name`, `full_name`, `email`, `keep_email_private`, `email_notifications_preference`, `passwd`, `passwd_hash_algo`, `must_change_password`, `login_type`, `login_source`, `login_name`, `type`, `location`, `website`, `rands`, `salt`, `language`, `description`, `created_unix`, `updated_unix`, `last_login_unix`, `last_repo_visibility`, `max_repo_creation`, `is_active`, `is_admin`, `is_restricted`, `allow_git_hook`, `allow_import_local`, `allow_create_organization`, `prohibit_login`, `avatar`, `avatar_email`, `use_custom_avatar`, `num_followers`, `num_following`, `num_stars`, `num_repos`, `num_teams`, `num_members`, `visibility`, `repo_admin_change_team_access`, `diff_view_style`, `theme`, `keep_activity_private` FROM `user` WHERE `lower_name`=? LIMIT 1 [oleh] - 4.136699ms
2022/02/13 18:17:00 ...orm@v1.2.5/engine.go:1139:Get() [I] [SQL] SELECT `id`, `owner_id`, `owner_name`, `lower_name`, `name`, `description`, `website`, `original_service_type`, `original_url`, `default_branch`, `num_watches`, `num_stars`, `num_forks`, `num_issues`, `num_closed_issues`, `num_pulls`, `num_closed_pulls`, `num_milestones`, `num_closed_milestones`, `num_projects`, `num_closed_projects`, `is_private`, `is_empty`, `is_archived`, `is_mirror`, `status`, `is_fork`, `fork_id`, `is_template`, `template_id`, `size`, `is_fsck_enabled`, `close_issues_via_commit_in_any_branch`, `topics`, `trust_model`, `avatar`, `created_unix`, `updated_unix` FROM `repository` WHERE `owner_id`=? AND `lower_name`=? LIMIT 1 [1 grafana-dashboards] - 3.855659ms
2022/02/13 18:17:00 ...ls/repo/repo_unit.go:226:getUnitsByRepoID() [I] [SQL] SELECT `id`, `repo_id`, `type`, `config`, `created_unix` FROM `repo_unit` WHERE (repo_id = ?) [4] - 3.600365ms
2022/02/13 18:17:00 ...s/repo/pushmirror.go:102:GetPushMirrorsByRepoID() [I] [SQL] SELECT `id`, `repo_id`, `remote_name`, `interval`, `created_unix`, `last_update`, `last_error` FROM `push_mirror` WHERE (repo_id=?) [4] - 3.131367ms
2022/02/13 18:17:00 ...ules/context/repo.go:480:RepoAssignment() [I] [SQL] SELECT count(*) FROM `release` WHERE repo_id=? AND is_draft=? [4 false] - 3.633094ms
2022/02/13 18:17:00 ...ules/context/repo.go:487:RepoAssignment() [I] [SQL] SELECT count(*) FROM `release` WHERE repo_id=? AND is_draft=? AND is_tag=? [4 false false] - 3.85562ms
2022/02/13 18:17:00 Completed GET /oleh/grafana-dashboards/raw/branch/master/monitoring/dashboards-rules-alerts/kubernetes-apiserver.json 200 OK in 36.999237ms

Redeploying and restarting the Grafana Operator didn't help.
I tried different Grafana versions: 7.x (7.5.11, 7.5.15) and 8.x (8.3.4, 8.3.6).

Runtime (please complete the following information):

  • OS: Linux
  • Grafana Operator Version v4.1.1
  • Environment: Kubernetes
  • Deployment type: deployed
@Halytskyi Halytskyi added bug Something isn't working needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 13, 2022

Halytskyi commented Feb 13, 2022

When I enabled debug logs for the operator, I saw that it runs the "periodic dashboard resync" every 10 seconds.

2022-02-13T18:07:28.190Z	INFO	running periodic dashboard resync
2022-02-13T18:07:29.577Z	INFO	running periodic notificationchannel resync
2022-02-13T18:07:33.551Z	DEBUG	action-runner	(    0)    SUCCESS update grafana service
2022-02-13T18:07:33.569Z	DEBUG	action-runner	(    1)    SUCCESS update grafana data persistentVolumeClaim
2022-02-13T18:07:33.595Z	DEBUG	action-runner	(    2)    SUCCESS update grafana service account
2022-02-13T18:07:33.615Z	DEBUG	action-runner	(    3)    SUCCESS update grafana config
2022-02-13T18:07:33.615Z	DEBUG	action-runner	(    4)    SUCCESS plugins unchanged
2022-02-13T18:07:33.651Z	DEBUG	action-runner	(    5)    SUCCESS update grafana deployment
2022-02-13T18:07:33.651Z	DEBUG	action-runner	found value for GF_SECURITY_ADMIN_USER in secret grafana-credentials
2022-02-13T18:07:33.651Z	DEBUG	action-runner	found value for GF_SECURITY_ADMIN_PASSWORD in secret grafana-credentials
2022-02-13T18:07:33.651Z	DEBUG	action-runner	(    6)    SUCCESS looking for admin credentials in secret grafana-credentials
2022-02-13T18:07:33.651Z	DEBUG	action-runner	(    7)    SUCCESS check deployment readiness
2022-02-13T18:07:33.651Z	DEBUG	grafana-controller	desired cluster state met
2022-02-13T18:07:38.191Z	INFO	running periodic dashboard resync
2022-02-13T18:07:39.577Z	INFO	running periodic notificationchannel resync
2022-02-13T18:07:43.687Z	DEBUG	action-runner	(    0)    SUCCESS update grafana service
2022-02-13T18:07:43.709Z	DEBUG	action-runner	(    1)    SUCCESS update grafana data persistentVolumeClaim
2022-02-13T18:07:43.748Z	DEBUG	action-runner	(    2)    SUCCESS update grafana service account
2022-02-13T18:07:43.777Z	DEBUG	action-runner	(    3)    SUCCESS update grafana config
2022-02-13T18:07:43.777Z	DEBUG	action-runner	(    4)    SUCCESS plugins unchanged
2022-02-13T18:07:43.817Z	DEBUG	action-runner	(    5)    SUCCESS update grafana deployment
2022-02-13T18:07:43.817Z	DEBUG	action-runner	found value for GF_SECURITY_ADMIN_USER in secret grafana-credentials
2022-02-13T18:07:43.817Z	DEBUG	action-runner	found value for GF_SECURITY_ADMIN_PASSWORD in secret grafana-credentials
2022-02-13T18:07:43.817Z	DEBUG	action-runner	(    6)    SUCCESS looking for admin credentials in secret grafana-credentials
2022-02-13T18:07:43.817Z	DEBUG	action-runner	(    7)    SUCCESS check deployment readiness
2022-02-13T18:07:43.818Z	DEBUG	grafana-controller	desired cluster state met

Why do we need that so often? Is it possible to disable the "dashboard resync", or to increase its interval to minutes or hours, to reduce the load (and traffic) on git? With a few dashboards this is not a big problem, but with more than 100 it could be.

Update: it seems this sync interval is hardcoded :(

./grafana-operator/controllers/config/controller_config.go:	RequeueDelay                            = time.Second * 10

Any thoughts?

@NissesSenap (Collaborator)

In some ways this might be related to: #597
@Halytskyi we don't have a solution for this out of the box.

If you want a workaround, I would recommend not using this feature and instead importing your dashboards with the standard YAML.


pb82 commented Feb 15, 2022

@Halytskyi Ok, so there shouldn't be any reason to download the dashboard after the first time it succeeds. It could be because we're unable to determine if the dashboard is already in Grafana. We might need to at least specify the name in addition to the URL. Needs some investigation.
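
The idea of not downloading again after the first success could look roughly like the sketch below. It assumes the operator can remember, per dashboard name, which URL it last fetched successfully; the types `fetched` and `dashboardSyncer` and the `resolve` method are hypothetical and do not reflect the operator's actual code.

```go
package main

import "fmt"

// fetched records the URL that was last downloaded successfully and its content.
type fetched struct {
	url     string
	content string
}

// dashboardSyncer is a hypothetical stand-in for the part of the reconcile
// loop that resolves a dashboard's JSON from spec.url.
type dashboardSyncer struct {
	fetch func(url string) (string, error) // performs the actual download
	known map[string]fetched               // keyed by dashboard name
}

// resolve returns the dashboard JSON, downloading only when this name has
// not been fetched yet or its URL has changed since the last success.
func (s *dashboardSyncer) resolve(name, url string) (string, error) {
	if prev, ok := s.known[name]; ok && prev.url == url {
		return prev.content, nil
	}
	body, err := s.fetch(url)
	if err != nil {
		return "", err
	}
	s.known[name] = fetched{url: url, content: body}
	return body, nil
}

func main() {
	calls := 0
	s := &dashboardSyncer{
		fetch: func(url string) (string, error) {
			calls++ // count real downloads
			return `{"title":"demo"}`, nil
		},
		known: map[string]fetched{},
	}
	// Simulate three resync cycles for the same dashboard.
	for i := 0; i < 3; i++ {
		if _, err := s.resolve("haproxy-instance-summary",
			"https://git.mydomain.com/haproxy-instance-summary.json"); err != nil {
			fmt.Println("fetch failed:", err)
			return
		}
	}
	fmt.Println("downloads:", calls)
}
```

The trade-off is that a change to the file behind an unchanged URL would go unnoticed, so a real fix would probably need some refresh policy on top of this.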

@pb82 pb82 added triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 15, 2022
@robshelly robshelly self-assigned this Feb 15, 2022

addreas commented Feb 16, 2022

I think #689 might help with this, since I ended up adding some caching functionality at the same time as the rate limiting.

@NissesSenap (Collaborator)

@Halytskyi I will close this since I think this should be solved now.


Halytskyi commented Sep 21, 2022

@NissesSenap, this issue is still not fixed.
I tried with grafana-operator 4.6.0 and Grafana 7.5.16.
