kube-proxy consider endpoint readiness to delete UDP stale conntrack entries #106163
Conversation
/sig network

ok, the test reproduces the issue

/assign @thockin @danwinship
```go
epReady := 0
for _, ep := range epList {
	if ep.IsReady() {
		epReady++
	}
}
```
So... this doesn't take ProxyTerminatingEndpoints into account...

Consider the case where an externalTrafficPolicy: Local service has a single Serving-Terminating endpoint. Connections come in to that endpoint's node and are accepted and processed by the terminating pod. Then a new endpoint starts up and becomes Ready. Given the code here, that would be interpreted as "the service went from 0 endpoints to non-0 endpoints", and so the node with the Serving-Terminating endpoint would flush all conntrack entries for the service, breaking the existing connections to the Serving-Terminating pod.
(Also, this patch changes the rules for staleServices, but there are terminating-endpoints problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint became non-ready, but now we don't delete them until the pod is fully deleted...)
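To make the first concern concrete, here is a minimal, self-contained sketch (the Endpoint interface below is a hypothetical, stripped-down stand-in for kube-proxy's endpoint abstraction, not the real type) of how counting only Ready endpoints reads as zero while a Serving-Terminating endpoint is still handling traffic:

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for the methods kube-proxy's
// endpoint abstraction exposes; for illustration only.
type Endpoint interface {
	IsReady() bool       // Ready condition is true
	IsServing() bool     // Serving condition is true (even if terminating)
	IsTerminating() bool // the backing pod is being deleted
}

type ep struct{ ready, serving, terminating bool }

func (e ep) IsReady() bool       { return e.ready }
func (e ep) IsServing() bool     { return e.serving }
func (e ep) IsTerminating() bool { return e.terminating }

func main() {
	// A single Serving-Terminating endpoint that is still accepting traffic.
	epList := []Endpoint{ep{ready: false, serving: true, terminating: true}}

	epReady := 0
	for _, e := range epList {
		if e.IsReady() {
			epReady++
		}
	}

	// epReady is 0 here, so when a new endpoint later becomes Ready the
	// service looks like it went from 0 to non-0 endpoints, and the node
	// would flush conntrack entries that still belong to live traffic.
	fmt.Println("ready endpoints:", epReady)
}
```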
> So... this doesn't take ProxyTerminatingEndpoints into account...

this is a regression that needs to be backported, and ProxyTerminatingEndpoints is an alpha feature (no backports for alpha are allowed). Also, after the analysis you did in your related PR, I don't think it is easy to solve both problems at the same time 😅
> breaking the existing connections to the Serving-Terminating pod.

it is UDP, in the sense that it is not breaking the connection per se, since it is connectionless; the new packet will create a new entry by traversing the iptables rules, which should still exist... it is less performant because the packet has to go through the iptables chains again, but not a big deal (at least I can't see how this can break something, UDP is unreliable)
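For context on what "clearing the entries" means here: kube-proxy shells out to the conntrack binary to delete stale flows. A minimal sketch of that call, assuming the conntrack-tools CLI is installed on the node (the real kube-proxy helper lives in its conntrack util package and handles the "no entries found" exit status more carefully):

```go
package main

import (
	"fmt"
	"os/exec"
)

// clearUDPEntriesForIP sketches the deletion of UDP conntrack entries whose
// original destination is the given service IP. Note: conntrack exits
// non-zero when no entries match, which a production caller should tolerate.
func clearUDPEntriesForIP(serviceIP string) error {
	out, err := exec.Command("conntrack", "-D", "--orig-dst", serviceIP, "-p", "udp").CombinedOutput()
	if err != nil {
		return fmt.Errorf("conntrack -D failed: %v (output: %s)", err, out)
	}
	return nil
}

func main() {
	// Illustrative service IP; once the entry is gone, the next UDP packet
	// traverses the iptables rules again and a fresh entry is created.
	if err := clearUDPEntriesForIP("10.96.0.10"); err != nil {
		fmt.Println(err)
	}
}
```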
> (Also, this patch changes the rules for staleServices, but there are terminating endpoints problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint became non-ready, but now we don't delete them until the pod is fully deleted...)

that is fixed by the Equal change to take Ready into consideration: https://github.com/kubernetes/kubernetes/pull/106163/files#r743336980
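A rough, self-contained sketch of the idea behind that Equal change (the struct below is a hypothetical stand-in for kube-proxy's endpoint info type, not the actual diff): once readiness is part of endpoint equality, a Ready-to-NotReady flip registers as a change, so its conntrack entries can be cleaned up right away instead of only when the pod is deleted.

```go
package main

import "fmt"

// BaseEndpointInfo is a hypothetical, stripped-down stand-in; only the
// fields needed for this sketch are included.
type BaseEndpointInfo struct {
	Endpoint string // "IP:port"
	IsLocal  bool
	Ready    bool
}

// Equal treats readiness as part of endpoint identity, so an endpoint that
// flips from Ready to NotReady no longer compares equal to its old self.
func (e *BaseEndpointInfo) Equal(other *BaseEndpointInfo) bool {
	return e.Endpoint == other.Endpoint &&
		e.IsLocal == other.IsLocal &&
		e.Ready == other.Ready
}

func main() {
	old := &BaseEndpointInfo{Endpoint: "10.0.0.5:53", Ready: true}
	cur := &BaseEndpointInfo{Endpoint: "10.0.0.5:53", Ready: false}
	fmt.Println(old.Equal(cur)) // false: readiness changed, endpoint is stale
}
```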
> it is UDP, in the sense that it is not breaking the connection per se, since it is connectionless

UDP is connectionless at L4, but not necessarily at L7. That's the main reason UDP conntrack records exist. E.g., anything using Datagram TLS (like QUIC / HTTP/3) won't survive being switched to a different endpoint mid-communication, because the new endpoint won't have the encryption key it needs.
👀 I can't argue against that, but it seems we are going to have some fun soon 😄
force-pushed from af41c0b to 61837ff
/approve
just some typos. (I would have let the "endpoints"/"endpoint" slide, but the "Ff" would annoy me 🙂)
The logic to detect stale endpoints was not taking endpoint readiness into account. We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts to forward it

Add an e2e test to cover the regression.
force-pushed from b7c76de to 909925b
@aojea: The following tests failed, say /retest to rerun all failed tests.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
so it was that commit, @danwinship. This is ready to merge and backport; I squashed the e2e test with the kube-proxy changes and removed the commit that was causing issues with the e2e framework.
/remove-area kubeadm

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/triage accepted
…3-upstream-release-1.22 Automated cherry pick of #106163: kube-proxy: fix stale detection logic
/kind bug
/kind regression
What this PR does / why we need it:
The init container makes the server pod not ready; however, the endpoint slices are still created, it is just that the endpoint's Ready condition is false.
If the kube-proxy conntrack logic doesn't check readiness, it will delete the conntrack entries for the UDP server as soon as the endpoint slice is created; however, the iptables rules will not be installed until at least one endpoint is ready. If some traffic arrives between the moment kube-proxy clears the entries (on seeing the endpoint slice) and the moment it installs the corresponding iptables rules (once the endpoint becomes ready), a conntrack entry will be generated that blackholes subsequent traffic.
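To illustrate that window, here is a sketch of the kind of server pod the e2e test relies on (names, image tags, and arguments are illustrative, not the test's exact spec): while the init container sleeps, the EndpointSlice for the service already exists, but the endpoint's Ready condition is false.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// serverPod returns a pod whose init container delays readiness: the
// EndpointSlice for the service is created right away, but the endpoint's
// Ready condition stays false until the init container finishes.
func serverPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "udp-server",
			Labels: map[string]string{"app": "udp-server"},
		},
		Spec: corev1.PodSpec{
			InitContainers: []corev1.Container{{
				Name:    "delay",
				Image:   "busybox",
				Command: []string{"sleep", "30"}, // pod is not Ready during this window
			}},
			Containers: []corev1.Container{{
				Name:  "server",
				Image: "registry.k8s.io/e2e-test-images/agnhost:2.39",
				Args:  []string{"serve-hostname", "--udp", "--http=false", "--port", "80"},
			}},
		},
	}
}

func main() {
	fmt.Println(serverPod().Name)
}
```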
Fixes #105657