
kube-proxy consider endpoint readiness to delete UDP stale conntrack entries #106163

Merged
merged 1 commit into from Nov 8, 2021

Conversation


@aojea aojea commented Nov 4, 2021

/kind bug
/kind regression

What this PR does / why we need it:

  1. Create a UDP Service.
  2. A client Pod sends traffic to the UDP Service.
  3. Create a UDP server associated with the Service created in 1., with an init container that sleeps for some time.

While the init container runs, the server Pod is not ready; however, the EndpointSlices are created, it is just
that the endpoint's Ready condition is false.
If the kube-proxy conntrack logic doesn't check readiness, it deletes the conntrack entries for the UDP server
as soon as the EndpointSlice has been created; however, the iptables rules are not installed until at least one
endpoint is ready. If some traffic arrives between the moment kube-proxy clears the entries (it sees the
EndpointSlice) and the moment it installs the corresponding iptables rules (the endpoint is ready), a conntrack
entry will be generated, blackholing subsequent traffic.
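The race described above disappears once kube-proxy checks endpoint readiness before flushing conntrack entries. A minimal Go sketch of that idea — the types and names here are illustrative, not the real kube-proxy API:

```go
package main

import "fmt"

// endpoint is an illustrative stand-in for a kube-proxy endpoint;
// only readiness matters for this sketch.
type endpoint struct {
	ip    string
	ready bool
}

// hasReadyEndpoint reports whether at least one endpoint is ready.
// Conntrack entries for a UDP service should only be flushed once this
// becomes true, because that is also when the iptables rules appear.
func hasReadyEndpoint(eps []endpoint) bool {
	for _, ep := range eps {
		if ep.ready {
			return true
		}
	}
	return false
}

func main() {
	// Server pod still running its init container: slice exists, not ready.
	unready := []endpoint{{ip: "10.0.0.5", ready: false}}
	// Init container finished: the endpoint flips to ready.
	ready := []endpoint{{ip: "10.0.0.5", ready: true}}

	fmt.Println(hasReadyEndpoint(unready)) // do not flush yet
	fmt.Println(hasReadyEndpoint(ready))   // safe to flush now
}
```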

Fixes #105657

Fix a 1.22 kube-proxy regression on UDP Services: the logic to detect stale connections was not considering whether the endpoint was ready.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 4, 2021

aojea commented Nov 4, 2021

/sig network

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/network Categorizes an issue or PR as relevant to SIG Network. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 4, 2021

aojea commented Nov 5, 2021

ok, the test reproduces the issue

Kubernetes e2e suite: [sig-network] Conntrack should be able to preserve UDP traffic when initial unready endpoints get ready (1m40s)
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/conntrack.go:293
Nov 5 00:08:20.758: Failed to connect to backend pod
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/onsi/ginkgo/internal/leafnodes/runner.go:113

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2021
@aojea aojea changed the title [WIP] kube-proxy consider endpoint readiness to delete UDP stale conntrack entries kube-proxy consider endpoint readiness to delete UDP stale conntrack entries Nov 5, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2021

aojea commented Nov 5, 2021

/assign @thockin @danwinship

pkg/proxy/endpoints.go Show resolved Hide resolved
epReady := 0
for _, ep := range epList {
	if ep.IsReady() {
		epReady++
	}
}
Contributor

So... this doesn't take ProxyTerminatingEndpoints into account...

Consider the case where an externalTrafficPolicy: Local service has a single Serving-Terminating endpoint. Connections come in to that endpoint's node and are accepted and processed by the terminating pod. Then a new endpoint starts up and becomes Ready. Given the code here, that would be interpreted as "the service went from 0 endpoints to non-0 endpoints", and so the node with the Serving-Terminating endpoint would flush all conntrack entries for the service, breaking the existing connections to the Serving-Terminating pod.

(Also, this patch changes the rules for staleServices, but there are terminating-endpoint problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint became non-ready, but now we don't delete them until the pod is fully deleted...)
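The scenario in this review comment can be sketched as follows; the condition fields mirror the EndpointSlice Ready/Serving/Terminating conditions, but the types and names are hypothetical, not the actual kube-proxy code:

```go
package main

import "fmt"

// conditions is a hypothetical stand-in for the EndpointSlice endpoint
// conditions (Ready / Serving / Terminating).
type conditions struct {
	ready       bool
	serving     bool
	terminating bool
}

// countReady counts only fully Ready endpoints, as the patch does;
// Serving-Terminating endpoints are ignored.
func countReady(eps []conditions) int {
	n := 0
	for _, ep := range eps {
		if ep.ready {
			n++
		}
	}
	return n
}

func main() {
	// A Serving-Terminating endpoint is still accepting traffic under
	// externalTrafficPolicy: Local with ProxyTerminatingEndpoints...
	old := []conditions{{ready: false, serving: true, terminating: true}}
	// ...but countReady sees zero endpoints, so when a new Ready endpoint
	// appears the service looks like a 0 -> non-0 transition and its
	// conntrack entries get flushed, disrupting the existing flows.
	updated := append(old, conditions{ready: true, serving: true})
	fmt.Println(countReady(old), countReady(updated))
}
```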

Member Author

@aojea aojea Nov 5, 2021

So... this doesn't take ProxyTerminatingEndpoints into account...

this is a regression that needs to be backported, and ProxyTerminatingEndpoints is an alpha feature (no backports for alpha features are allowed). Also, after the analysis you did in your related PR, I don't think it is easy to solve both problems at the same time 😅

breaking the existing connections to the Serving-Terminating pod.

it is UDP, so it is not breaking the connection per se, since it is connectionless; the new packet will create a new entry based on the iptables rules, which should still exist ... it is less performant because the packet is processed through the iptables chain again, but not a big deal (at least I can't see how this can break something, UDP is unreliable)

(Also, this patch changes the rules for staleServices, but there are terminating endpoints problems with staleEndpoints too; we used to delete conntrack entries to endpoints as soon as the endpoint become non-ready, but now we don't delete them until the pod is fully deleted...)

that is fixed by the Equal change that takes Ready into consideration: https://github.com/kubernetes/kubernetes/pull/106163/files#r743336980
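The Equal change being referred to can be sketched like this: comparing readiness as part of endpoint equality means a Ready -> NotReady flip is detected as a change, so its conntrack entries are cleaned up. The types here are illustrative, not the real kube-proxy Endpoint interface:

```go
package main

import "fmt"

// endpoint is an illustrative stand-in for a kube-proxy endpoint.
type endpoint struct {
	ip    string
	ready bool
}

// equal treats two endpoints as the same only if both the address and
// the readiness match; comparing only the address would hide a
// Ready -> NotReady transition from the stale-endpoint detection.
func equal(a, b endpoint) bool {
	return a.ip == b.ip && a.ready == b.ready
}

func main() {
	before := endpoint{ip: "10.0.0.5", ready: true}
	after := endpoint{ip: "10.0.0.5", ready: false}
	// Same address, different readiness: seen as a changed endpoint.
	fmt.Println(equal(before, after))
}
```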

Contributor

it is UDP, in the sense that is not breaking the connection per se, since it is connectionless

UDP is connectionless at L4, but not necessarily at L7. That's the main reason UDP conntrack records exist. E.g., anything using Datagram TLS (like QUIC / HTTP/3) won't survive being switched to a different endpoint mid-communication, because the new endpoint won't have the encryption key it needs.

Member Author

👀 I can't argue against that, but seems we are going to have some fun soon 😄

Contributor

@danwinship danwinship left a comment

/approve
just some typos. (I would have let the "endpoints"/"endpoint" slide, but the "Ff" would annoy me 🙂)

pkg/proxy/endpoints.go Outdated Show resolved Hide resolved
pkg/proxy/endpoints.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Nov 5, 2021
The logic to detect stale endpoints was not taking endpoint
readiness into account.

We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts
to forward it.

Add an e2e test to cover the regression
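The two cases listed in the commit message can be expressed as two small predicates. This is a sketch under assumed inputs, not the actual kube-proxy functions:

```go
package main

import "fmt"

// endpointBecameStale covers the first case: an endpoint that was
// receiving traffic is removed or replaced, so its conntrack entries
// must be deleted.
func endpointBecameStale(wasPresent, isPresent bool) bool {
	return wasPresent && !isPresent
}

// serviceBecameActive covers the second case: a service that was
// receiving but not forwarding traffic (zero ready endpoints) starts
// forwarding, so any blackhole entries created meanwhile must be flushed.
func serviceBecameActive(oldReady, newReady int) bool {
	return oldReady == 0 && newReady > 0
}

func main() {
	fmt.Println(endpointBecameStale(true, false)) // endpoint removed
	fmt.Println(serviceBecameActive(0, 1))        // first ready endpoint
	fmt.Println(serviceBecameActive(1, 2))        // already forwarding
}
```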
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2021
@k8s-ci-robot
Contributor

k8s-ci-robot commented Nov 5, 2021

@aojea: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-conformance-image-test b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-conformance-image-test
pull-kubernetes-e2e-gce-csi-serial b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-csi-serial
pull-kubernetes-e2e-gce-storage-slow b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-e2e-gce-storage-snapshot b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-storage-snapshot
pull-kubernetes-local-e2e b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-local-e2e
pull-kubernetes-e2e-gce-network-proxy-grpc b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-network-proxy-grpc
pull-kubernetes-e2e-gce-network-proxy-http-connect b7c76dee710fa513c09e26c08944507c2f9a7fba link true /test pull-kubernetes-e2e-gce-network-proxy-http-connect
pull-kubernetes-e2e-gci-gce-ipvs b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gci-gce-ipvs
pull-kubernetes-e2e-aks-engine-azure-file-windows-dockershim b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-azure-file-windows-dockershim
pull-publishing-bot-validate b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-publishing-bot-validate
pull-kubernetes-e2e-capz-conformance b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-conformance
pull-kubernetes-files-remake b7c76dee710fa513c09e26c08944507c2f9a7fba link true /test pull-kubernetes-files-remake
pull-kubernetes-e2e-capz-azure-disk-vmss b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-disk-vmss
pull-kubernetes-e2e-aks-engine-azure-disk-windows-dockershim b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-azure-disk-windows-dockershim
check-dependency-stats b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test check-dependency-stats
pull-kubernetes-e2e-capz-azure-file-vmss b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-file-vmss
pull-kubernetes-e2e-gce-alpha-features b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-gce-alpha-features
pull-kubernetes-e2e-aks-engine-windows-containerd b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-aks-engine-windows-containerd
pull-kubernetes-e2e-capz-azure-file b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-file
pull-kubernetes-e2e-capz-azure-disk b7c76dee710fa513c09e26c08944507c2f9a7fba link false /test pull-kubernetes-e2e-capz-azure-disk

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2021
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@aojea
Member Author

aojea commented Nov 5, 2021

a different one this time?

Kubernetes e2e suite: [sig-network] Proxy version v1 should proxy logs on node with explicit kubelet port using proxy subresource

can it be related to fc85bb2?

so, it was that commit, @danwinship. This is ready to merge and backport; I squashed the e2e test with the kube-proxy changes and removed the commit that was causing issues with the e2e framework

@enj enj added this to Needs Triage in SIG Auth Old Nov 6, 2021
@neolit123
Member

/remove-area kubeadm
/remove-sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot removed area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Nov 8, 2021
@danwinship
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0940dd6 into kubernetes:master Nov 8, 2021
SIG Auth Old automation moved this from Needs Triage to Closed / Done Nov 8, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 8, 2021
@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 9, 2021
k8s-ci-robot added a commit that referenced this pull request Nov 12, 2021
…3-upstream-release-1.22

Automated cherry pick of #106163: kube-proxy: fix stale detection logic
Successfully merging this pull request may close these issues.

Wrong udp conntrack entries are populated after a pod is killed using --force
7 participants