pkg/kubelet: improve the node informer sync check #99336
/kind regression
cc @adisky
Given the problem shown in #99336 (comment), I'm marking this as release blocking for 1.21. In the meantime we are discussing it and trying to fix it.
/triage accepted
flake #98856
Just to correct my statement above: static pods do go through kubelet pod admission (mirror pods do not), but by definition static pods need to work well with the kubelet's local view of the node, so I think we do not have any risks. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: derekwaynecarr, neolit123
/lgtm thanks for all the iterations and care
Thank you to all who helped with review and the discussion. Apparently there will be another 1.18 PATCH release, so this can be backported there too.
cherry picks: ^ need review / LGTM / approval. All patches tested with a local cluster.
Picks reviewed; they need approval from kubelet owners and then release branch acks.
…9336-origin-release-1.20 Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
…9336-origin-release-1.19 Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
…9336-origin-release-1.18 Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
…9336-origin-release-1.21 Automated cherry pick of #99336: pkg/kubelet: improve the node informer sync check
What this PR does / why we need it:
GetNode() is called in a lot of places, including a hot loop in
fastStatusUpdateOnce. Having a poll inside GetNode() delays
the kubelet's /readyz status=200 report.
If a client is available, attempt to wait for the sync to happen
before starting the list/watch for pods at the apiserver.
This is done to avoid caching of Node objects.
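
The idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the actual kubelet code: `nodeCache`, `getNode`, `waitForNodeSync`, and `startPodSource` are hypothetical names, and returning an error from `getNode` before the sync is an assumption made for the sketch. It shows the wait for the node sync happening once, in the background, before the pod list/watch is started, so that the frequently called getter never polls.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// node is a hypothetical stand-in for the API Node object.
type node struct{ name string }

var errNotSynced = errors.New("node cache has not synced yet")

// nodeCache is a hypothetical stand-in for an informer-backed node lister.
type nodeCache struct {
	mu     sync.RWMutex
	synced bool
	cached *node
}

// hasSynced reports whether the cache has received its first node object.
func (c *nodeCache) hasSynced() bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.synced
}

// markSynced stores the node and flags the cache as synced.
func (c *nodeCache) markSynced(n *node) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cached, c.synced = n, true
}

// getNode returns immediately and never polls, so it is cheap in hot loops.
// Returning an error before the sync is an assumption of this sketch.
func (c *nodeCache) getNode() (*node, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if !c.synced {
		return nil, errNotSynced
	}
	return c.cached, nil
}

// waitForNodeSync checks hasSynced every period until it returns true
// or the timeout elapses. It reports whether the sync was observed.
func waitForNodeSync(hasSynced func() bool, period, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if hasSynced() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(period)
	}
}

// startPodSource waits for the node sync in a goroutine and only then calls
// startWatch (a placeholder for starting the pod list/watch). The returned
// channel receives true if the watch was started.
func startPodSource(hasSynced func() bool, period, timeout time.Duration, startWatch func()) <-chan bool {
	done := make(chan bool, 1)
	go func() {
		ok := waitForNodeSync(hasSynced, period, timeout)
		if ok {
			startWatch()
		}
		done <- ok
	}()
	return done
}

func main() {
	c := &nodeCache{}

	// Simulate the informer delivering the node a little later.
	go func() {
		time.Sleep(20 * time.Millisecond)
		c.markSynced(&node{name: "node-1"})
	}()

	if _, err := c.getNode(); err != nil {
		fmt.Println("before sync:", err)
	}

	started := <-startPodSource(c.hasSynced, 5*time.Millisecond, time.Second, func() {
		fmt.Println("starting pod list/watch")
	})
	if n, err := c.getNode(); err == nil {
		fmt.Println("watch started:", started, "node:", n.name)
	}
}
```

Because the wait runs in its own goroutine, callers of `getNode` are never blocked by it, which is the property the description is after for the `/readyz` path.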
Some test data for a kubeadm setup that manages the kube-apiserver as a static pod,
waiting for the kube-apiserver and kubelet to report 200 at /healthz
(this change restores the old fast timing / behavior).
Which issue(s) this PR fixes:
xref kubernetes/kubeadm#2395
xref kubernetes/kubelet#23
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: