
Use a field name selected cache #687

Closed
wants to merge 1 commit

Conversation


@qinqon qinqon commented Jan 29, 2021

Is this a BUG FIX or a FEATURE?:

Uncomment only one, leave it on its own line:

/kind bug

/kind enhancement

What this PR does / why we need it:
The controller-runtime cache is missing the ability to filter by labels or
fields [1]. This means that every kubernetes-nmstate-handler reads all the
nodes and nodenetworkstates on every period. This clearly does not scale:
kubernetes-nmstate-handler runs as a daemonset, so there is one handler
running on every node, and the bigger the cluster, the bigger the problem.

This change replaces the default controller-runtime cache with an
implementation that can be configured with field selectors per resource.
This way we can filter by "metadata.name" using the node name for "node"
and "nodenetworkstate", so only one instance of each is fetched.

There is a PR at controller-runtime with a possible solution [2]; if something
like that makes it into controller-runtime, we will switch to it.

[1] kubernetes-sigs/controller-runtime#244
[2] kubernetes-sigs/controller-runtime#1404
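The idea can be sketched with a minimal, stdlib-only Go simulation (this is not the actual controller-runtime API; all type and function names below are hypothetical): a cache that applies a per-kind "metadata.name" selector when listing, so a handler only ever sees its own Node and NodeNetworkState, while unfiltered kinds pass through untouched.

```go
package main

import "fmt"

// Object stands in for a cached Kubernetes resource (hypothetical, for illustration).
type Object struct {
	Kind string
	Name string
}

// filteredCache mimics a cache whose listing is narrowed per resource kind
// by a "metadata.name" field selector, as the PR description proposes.
type filteredCache struct {
	selectors map[string]string // kind -> required metadata.name ("" means no filter)
	store     []Object
}

// List returns only the objects of the given kind that match its selector,
// so a handler on node01 never sees node02's Node or NodeNetworkState.
func (c *filteredCache) List(kind string) []Object {
	var out []Object
	want := c.selectors[kind]
	for _, o := range c.store {
		if o.Kind != kind {
			continue
		}
		if want == "" || o.Name == want {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	c := &filteredCache{
		selectors: map[string]string{
			"Node":             "node01", // metadata.name=<this handler's node>
			"NodeNetworkState": "node01",
		},
		store: []Object{
			{"Node", "node01"}, {"Node", "node02"},
			{"NodeNetworkState", "node01"}, {"NodeNetworkState", "node02"},
			{"NodeNetworkConfigurationPolicy", "linux-bridge"}, // no selector: unfiltered
		},
	}
	fmt.Println(len(c.List("Node")), len(c.List("NodeNetworkState")),
		len(c.List("NodeNetworkConfigurationPolicy"))) // 1 1 1
}
```

In the real change the selector is applied to the informer's list/watch options, so it is evaluated server-side by the apiserver; that is what makes the query and event volume drop, not just the local filtering shown here.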

Special notes for your reviewer:

Release note:

Filter cache at handler by node name in case of "nodes" and "nodenetworkstates"

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XL labels Jan 29, 2021
@kubevirt-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign qinqon
You can assign the PR to them by writing /assign @qinqon in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@qinqon
Member Author

qinqon commented Jan 29, 2021

/hold

We still have to verify that the number of apiserver queries really drops.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 29, 2021
@qinqon qinqon force-pushed the filter-cache branch 2 times, most recently from 3ae5e08 to 99ab801 Compare February 1, 2021 09:32
@qinqon
Member Author

qinqon commented Feb 1, 2021

/retest

@qinqon
Member Author

qinqon commented Feb 4, 2021

From the printed events we can verify that the handler at node01 receives only the events related to that node (plus unfiltered resources such as the "linux-bridge" policy):

[ellorent@localhost kubernetes-nmstate]$ ./cluster/kubectl.sh logs -n nmstate nmstate-handler-vmwlf |grep event
{"level":"info","ts":1612444571.6819208,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444571.682038,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444571.6939173,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444572.6410956,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444572.656799,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444600.5195234,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444600.519618,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444600.5198877,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444694.9091215,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444694.9715192,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444695.025449,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444695.1237388,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444698.1988435,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444699.215245,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444699.2582269,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444700.280619,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444701.3504474,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444702.1633139,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444702.2546232,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444702.3680952,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444702.4069588,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444708.6236877,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node01","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444708.6363127,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444708.6610396,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}

And the same for the node02 handler:

[ellorent@localhost kubernetes-nmstate]$ ./cluster/kubectl.sh logs -n nmstate nmstate-handler-q4p4b |grep event
{"level":"info","ts":1612444570.7046072,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444570.704893,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444570.7057064,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444572.0696883,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444572.1323745,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444591.860165,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444591.860235,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444591.860277,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"node02","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444694.9093442,"logger":"controller-runtime.source.EventHandler.OnAdd","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444694.9713833,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444695.0265727,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444695.121923,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444698.199609,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444699.215747,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444699.2589383,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444700.2792325,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
{"level":"info","ts":1612444701.3514907,"logger":"controller-runtime.source.EventHandler.OnUpdate","msg":"event received","name":"linux-bridge","kind":"&TypeMeta{Kind:,APIVersion:,}"}
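The per-node filtering above can also be checked mechanically instead of by eye: count the "name" field across a handler's JSON log lines and confirm that the only Node/NodeNetworkState name present is the handler's own node. A hypothetical helper for that, using the log format shown above:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// countEventNames tallies the "name" field of each JSON log line, so we can
// confirm a handler only receives events for its own node (plus unfiltered
// resources such as policies). Lines that are not valid JSON are skipped.
func countEventNames(logs string) map[string]int {
	counts := map[string]int{}
	for _, line := range strings.Split(strings.TrimSpace(logs), "\n") {
		var entry struct {
			Name string `json:"name"`
		}
		if err := json.Unmarshal([]byte(line), &entry); err == nil && entry.Name != "" {
			counts[entry.Name]++
		}
	}
	return counts
}

func main() {
	// Sample trimmed from the handler output above.
	sample := `{"level":"info","msg":"event received","name":"node01"}
{"level":"info","msg":"event received","name":"node01"}
{"level":"info","msg":"event received","name":"linux-bridge"}`
	fmt.Println(countEventNames(sample)) // node01: 2, linux-bridge: 1, no other node names
}
```

Piped through something like `./cluster/kubectl.sh logs ... | grep event`, an empty count for every foreign node name is exactly the scaling property the PR is after.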

The controller-runtime cache is missing the ability to filter by labels or
fields [1]. This means that every kubernetes-nmstate-handler reads all the
nodes and nodenetworkstates on every period. This clearly does not scale:
kubernetes-nmstate-handler runs as a daemonset, so there is one handler
running on every node, and the bigger the cluster, the bigger the problem.

This change replaces the default controller-runtime cache with an
implementation that can be configured with field selectors per resource.
This way we can filter by "metadata.name" using the node name for "node"
and "nodenetworkstate", so only one instance of each is fetched.

[1] kubernetes-sigs/controller-runtime#244

Signed-off-by: Quique Llorente <ellorent@redhat.com>
@qinqon
Member Author

qinqon commented Mar 1, 2021

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2021
@kubevirt-bot
Collaborator

@qinqon: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-nmstate-e2e-handler-k8s-future 99bf43f link /test pull-kubernetes-nmstate-e2e-handler-k8s-future
pull-kubernetes-nmstate-e2e-handler-k8s 99bf43f link /test pull-kubernetes-nmstate-e2e-handler-k8s

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 8, 2021
@kubevirt-bot
Collaborator

@qinqon: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@qinqon qinqon closed this May 10, 2021