
kube-proxy consider endpoint readiness to delete UDP stale conntrack entries #106163

Merged
merged 1 commit into from Nov 8, 2021

Conversation


@aojea aojea commented Nov 4, 2021

/kind bug
/kind regression

What this PR does / why we need it:

  1. Create a UDP Service.
  2. A client Pod sends traffic to the UDP Service.
  3. Create a UDP server associated with the Service created in 1., with an init container that sleeps for some time.

While the init container runs, the server Pod is not ready; however, the EndpointSlices are created, it is just
that the endpoint's Ready condition is false.
If the kube-proxy conntrack logic doesn't check readiness, it deletes the conntrack entries for the UDP server
as soon as the EndpointSlice has been created; however, the iptables rules are not installed until at least one
endpoint is ready. If some traffic arrives between the moment kube-proxy clears the entries (it sees the
EndpointSlice) and the moment it installs the corresponding iptables rules (the endpoint is ready), a conntrack
entry will be generated, blackholing subsequent traffic.
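The race described above disappears once kube-proxy checks endpoint readiness before flushing conntrack entries. A minimal Go sketch of that idea — the types and names here are illustrative, not the real kube-proxy API:

```go
package main

import "fmt"

// endpoint is an illustrative stand-in for a kube-proxy endpoint;
// only readiness matters for this sketch.
type endpoint struct {
	ip    string
	ready bool
}

// hasReadyEndpoint reports whether at least one endpoint is ready.
// Conntrack entries for a UDP service should only be flushed once this
// becomes true, because that is also when the iptables rules appear.
func hasReadyEndpoint(eps []endpoint) bool {
	for _, ep := range eps {
		if ep.ready {
			return true
		}
	}
	return false
}

func main() {
	// Server pod still running its init container: slice exists, not ready.
	unready := []endpoint{{ip: "10.0.0.5", ready: false}}
	// Init container finished: the endpoint flips to ready.
	ready := []endpoint{{ip: "10.0.0.5", ready: true}}

	fmt.Println(hasReadyEndpoint(unready)) // do not flush yet
	fmt.Println(hasReadyEndpoint(ready))   // safe to flush now
}
```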

Fixes #105657

Fix a 1.22 kube-proxy regression on UDP Services: the logic to detect stale connections was not considering whether the endpoint was ready.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 4, 2021

aojea commented Nov 4, 2021

/sig network

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/network Categorizes an issue or PR as relevant to SIG Network. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 4, 2021

aojea commented Nov 5, 2021

ok, the test reproduces the issue

Kubernetes e2e suite: [sig-network] Conntrack should be able to preserve UDP traffic when initial unready endpoints get ready (1m40s)
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/conntrack.go:293
Nov 5 00:08:20.758: Failed to connect to backend pod
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:113

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2021
@aojea aojea changed the title [WIP] kube-proxy consider endpoint readiness to delete UDP stale conntrack entries kube-proxy consider endpoint readiness to delete UDP stale conntrack entries Nov 5, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2021

aojea commented Nov 5, 2021

/assign @thockin @danwinship

pkg/proxy/endpoints.go Show resolved Hide resolved
epReady := 0
for _, ep := range epList {
	if ep.IsReady() {
		epReady++
	}
}
Contributor

So... this doesn't take ProxyTerminatingEndpoints into account...

Consider the case where an externalTrafficPolicy: Local service has a single Serving-Terminating endpoint. Connections come in to that endpoint's node and are accepted and processed by the terminating pod. Then a new endpoint starts up and becomes Ready. Given the code here, that would be interpreted as "the service went from 0 endpoints to non-0 endpoints", and so the node with the Serving-Terminating endpoint would flush all conntrack entries for the service, breaking the existing connections to the Serving-Terminating pod.

(Also, this patch changes the rules for staleServices, but there are terminating-endpoint problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint became non-ready, but now we don't delete them until the pod is fully deleted...)
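The scenario in this review comment can be sketched as follows; the condition fields mirror the EndpointSlice Ready/Serving/Terminating conditions, but the types and names are hypothetical, not the actual kube-proxy code:

```go
package main

import "fmt"

// conditions is a hypothetical stand-in for the EndpointSlice endpoint
// conditions (Ready / Serving / Terminating).
type conditions struct {
	ready       bool
	serving     bool
	terminating bool
}

// countReady counts only fully Ready endpoints, as the patch does;
// Serving-Terminating endpoints are ignored.
func countReady(eps []conditions) int {
	n := 0
	for _, ep := range eps {
		if ep.ready {
			n++
		}
	}
	return n
}

func main() {
	// A Serving-Terminating endpoint is still accepting traffic under
	// externalTrafficPolicy: Local with ProxyTerminatingEndpoints...
	old := []conditions{{ready: false, serving: true, terminating: true}}
	// ...but countReady sees zero endpoints, so when a new Ready endpoint
	// appears the service looks like a 0 -> non-0 transition and its
	// conntrack entries get flushed, disrupting the existing flows.
	updated := append(old, conditions{ready: true, serving: true})
	fmt.Println(countReady(old), countReady(updated))
}
```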

Member Author

@aojea aojea Nov 5, 2021

So... this doesn't take ProxyTerminatingEndpoints into account...

this is a regression that needs to be backported, and ProxyTerminatingEndpoints is an alpha feature (no backports for alpha features are allowed). Also, after the analysis you did in your related PR, I don't think it is easy to solve both problems at the same time 😅

breaking the existing connections to the Serving-Terminating pod.

it is UDP, so it is not breaking the connection per se, since it is connectionless; the new packet will create a new entry based on the iptables rules, which should still exist ... it is less performant because the packet is processed through the iptables chain again, but not a big deal (at least I can't see how this can break something, UDP is unreliable)

(Also, this patch changes the rules for staleServices, but there are terminating endpoints problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint become non-ready, but now we don't delete them until the pod is fully deleted...)

that is fixed by the Equal change that takes Ready into consideration: https://github.com/kubernetes/kubernetes/pull/106163/files#r743336980
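The Equal change being referred to can be sketched like this: comparing readiness as part of endpoint equality means a Ready -> NotReady flip is detected as a change, so its conntrack entries are cleaned up. The types here are illustrative, not the real kube-proxy Endpoint interface:

```go
package main

import "fmt"

// endpoint is an illustrative stand-in for a kube-proxy endpoint.
type endpoint struct {
	ip    string
	ready bool
}

// equal treats two endpoints as the same only if both the address and
// the readiness match; comparing only the address would hide a
// Ready -> NotReady transition from the stale-endpoint detection.
func equal(a, b endpoint) bool {
	return a.ip == b.ip && a.ready == b.ready
}

func main() {
	before := endpoint{ip: "10.0.0.5", ready: true}
	after := endpoint{ip: "10.0.0.5", ready: false}
	// Same address, different readiness: seen as a changed endpoint.
	fmt.Println(equal(before, after))
}
```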

Contributor

it is UDP, in the sense that is not breaking the connection per se, since it is connectionless

UDP is connectionless at L4, but not necessarily at L7. That's the main reason UDP conntrack records exist. E.g., anything using Datagram TLS (like QUIC / HTTP/3) won't survive being switched to a different endpoint mid-communication, because the new endpoint won't have the encryption key it needs.

Member Author

👀 I can't argue against that, but seems we are going to have some fun soon 😄

Contributor

@danwinship danwinship left a comment

/approve
just some typos. (I would have let the "endpoints"/"endpoint" slide, but the "Ff" would annoy me 🙂)

pkg/proxy/endpoints.go Outdated Show resolved Hide resolved
pkg/proxy/endpoints.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Nov 5, 2021
The logic to detect stale endpoints was not taking endpoint
readiness into account.

We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts
to forward it.

Add an e2e test to cover the regression
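The two cases listed in the commit message can be expressed as two small predicates. This is a sketch under assumed inputs, not the actual kube-proxy functions:

```go
package main

import "fmt"

// endpointBecameStale covers the first case: an endpoint that was
// receiving traffic is removed or replaced, so its conntrack entries
// must be deleted.
func endpointBecameStale(wasPresent, isPresent bool) bool {
	return wasPresent && !isPresent
}

// serviceBecameActive covers the second case: a service that was
// receiving but not forwarding traffic (zero ready endpoints) starts
// forwarding, so any blackhole entries created meanwhile must be flushed.
func serviceBecameActive(oldReady, newReady int) bool {
	return oldReady == 0 && newReady > 0
}

func main() {
	fmt.Println(endpointBecameStale(true, false)) // endpoint removed
	fmt.Println(serviceBecameActive(0, 1))        // first ready endpoint
	fmt.Println(serviceBecameActive(1, 2))        // already forwarding
}
```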
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2021
@k8s-ci-robot
Contributor

k8s-ci-robot commented Nov 5, 2021

@aojea: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-conformance-image-test b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-conformance-image-test
pull-kubernetes-e2e-gce-csi-serial b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-csi-serial
pull-kubernetes-e2e-gce-storage-slow b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-e2e-gce-storage-snapshot b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-storage-snapshot
pull-kubernetes-local-e2e b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-local-e2e
pull-kubernetes-e2e-gce-network-proxy-grpc b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-network-proxy-grpc
pull-kubernetes-e2e-gce-network-proxy-http-connect b7c76dee710fa513c09e26c08944507c2f9a7fba link true /test pull-kubernetes-e2e-gce-network-proxy-http-connect
pull-kubernetes-e2e-gci-gce-ipvs b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gci-gce-ipvs
pull-kubernetes-e2e-aks-engine-azure-file-windows-dockershim b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-azure-file-windows-dockershim
pull-publishing-bot-validate b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-publishing-bot-validate
pull-kubernetes-e2e-capz-conformance b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-conformance
pull-kubernetes-files-remake b7c76dee710fa513c09e26c08944507c2f9a7fba link true /test pull-kubernetes-files-remake
pull-kubernetes-e2e-capz-azure-disk-vmss b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-disk-vmss
pull-kubernetes-e2e-aks-engine-azure-disk-windows-dockershim b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-azure-disk-windows-dockershim
check-dependency-stats b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test check-dependency-stats
pull-kubernetes-e2e-capz-azure-file-vmss b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-file-vmss
pull-kubernetes-e2e-gce-alpha-features b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-alpha-features
pull-kubernetes-e2e-aks-engine-windows-containerd b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-windows-containerd
pull-kubernetes-e2e-capz-azure-file b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-file
pull-kubernetes-e2e-capz-azure-disk b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-disk

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2021
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@aojea
Member Author

aojea commented Nov 5, 2021

a different one this time?

Kubernetes e2e suite: [sig-network] Proxy version v1 should proxy logs on node with explicit kubelet port using proxy subresource

can it be related to fc85bb2?

so, it was that commit, @danwinship. This is ready to merge and backport; I squashed the e2e test with the kube-proxy changes and removed the commit that was causing issues with the e2e framework

@enj enj added this to Needs Triage in SIG Auth Old Nov 6, 2021
@neolit123
Member

/remove-area kubeadm
/remove-sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot removed area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Nov 8, 2021
@danwinship
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0940dd6 into kubernetes:master Nov 8, 2021
SIG Auth Old automation moved this from Needs Triage to Closed / Done Nov 8, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 8, 2021
@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 9, 2021
k8s-ci-robot added a commit that referenced this pull request Nov 12, 2021
…3-upstream-release-1.22

Automated cherry pick of #106163: kube-proxy: fix stale detection logic
Successfully merging this pull request may close these issues.

Wrong udp conntrack entries are populated after a pod is killed using --force
7 participants