Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HNS Load Balancer Healthchecks for ExternalTrafficPolicy: Local #99287

Merged
merged 1 commit into from Mar 17, 2022

Conversation

anfernee
Copy link
Member

@anfernee anfernee commented Feb 21, 2021

What type of PR is this?
/kind bug

What this PR does / why we need it:
In GCE, the current externalTrafficPolicy: Local logic does not work because the nodes that run the pods do not setup load balancer ports. This means that the GCLB does not understand which nodes are serving the pods that can accept traffic. Since all report unhealthy it'll direct traffic to any node. This PR configures the health check ports so that GCLB knows which nodes can handle the traffic.

See #62046 for details.
This work is based on #96998

Which issue(s) this PR fixes:

Fixes #62046

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

externalTrafficPolicy: local works on Windows hosts with DSR support.

Release Note

Add 2 new options for kube-proxy running in winkernel mode. 
`--forward-healthcheck-vip`, if specified as true, health check traffic whose destination is service VIP will be forwarded to kube-proxy's healthcheck service. `--root-hnsendpoint-name` specifies the name of the hns endpoint for the root network namespace.
This option enables the pass-through load balancers like Google's GCLB to correctly health check the backend services. Without this change, the health check packets is dropped, and Windows node will be considered to be unhealthy by those load balancers.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/test sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 21, 2021
@anfernee anfernee force-pushed the clientip branch 2 times, most recently from b4e8c9b to 98a3948 Compare February 21, 2021 23:12
@k8s-ci-robot k8s-ci-robot added area/provider/gcp Issues or PRs related to gcp provider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. labels Feb 21, 2021
@jeremyje
Copy link
Contributor

/sig windows
/priority important-soon

@k8s-ci-robot k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 22, 2021
@anfernee
Copy link
Member Author

/retest

@anfernee
Copy link
Member Author

@madhanrm @elweb9858 PTAL.

if gceGatewayEndpoint != nil {
hnsHealthCheckLoadBalancer, err := hns.getLoadBalancer(
[]endpointsInfo{*gceGatewayEndpoint},
loadBalancerFlags{isDSR: svcInfo.preserveDIP || proxier.isDSR, useMUX: svcInfo.preserveDIP, preserveDIP: svcInfo.preserveDIP},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the flags here be synced with the flags above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I am not sure what this annotation means here: preserveDIP := service.Annotations["preserve-destination"] == "true", they are copied over.

The rest is not necessarily synced, because the only targeting client is GCLB health check, the requirement for the service itself doesn't apply to health check.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2021
@anfernee anfernee force-pushed the clientip branch 2 times, most recently from 4a102df to 2857436 Compare March 2, 2021 00:35
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2021
@marosset marosset added this to Reviewed - Needs Approval From Other SIGs in SIG-Windows Mar 5, 2021
@thockin
Copy link
Member

thockin commented Jan 29, 2022

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 29, 2022
@anfernee
Copy link
Member Author

anfernee commented Feb 2, 2022

Thanks everyone for all the help for this PR. Finally almost there. Special thanks for the original author of this PR @jeremyje

@anfernee
Copy link
Member Author

anfernee commented Feb 5, 2022

When can we merge this PR?

@jayunit100
Copy link
Member

/lgtm
/approve

@jayunit100
Copy link
Member

Agree it's fine -> Let's merge and keep an eye out

@jayunit100
Copy link
Member

Double checking that you got the earlier note: we should make sure our externalTrafficPolicy.... tests are covering stuff like this somehow, eventually, on the sig-win side

@anfernee
Copy link
Member Author

anfernee commented Feb 7, 2022

Double checking that you got the earlier note: we should make sure our externalTrafficPolicy.... tests are covering stuff like this somehow, eventually, on the sig-win side

I am not exactly familiar with the test setup but I'll check it out. If the test runs on GCE, then we can enable the option there. Existing e2e service test should be able to catch it.

@jsturtevant
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2022
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@anfernee
Copy link
Member Author

/retest

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

1 similar comment
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

This change adds 2 options for windows:
--forward-healthcheck-vip: If true forward service VIP for health check
port
--root-hnsendpoint-name: The name of the hns endpoint name for root
namespace attached to l2bridge, default is cbr0

When --forward-healthcheck-vip is set as true and winkernel is used,
kube-proxy will add an hns load balancer to forward health check request
that was sent to lb_vip:healthcheck_port to the node_ip:healthcheck_port.
Without this forwarding, the health check from google load balancer will
fail, and it will stop forwarding traffic to the windows node.

This change fixes the following 2 cases for service:
- `externalTrafficPolicy: Cluster` (default option): healthcheck_port is
10256 for all services. Without this fix, all traffic won't be directly
forwarded to windows node. It will always go through a linux node and
get forwarded to windows from there.
- `externalTrafficPolicy: Local`: different healthcheck_port for each
service that is configured as local. Without this fix, this feature
won't work on windows node at all. This feature preserves client ip
that tries to connect to their application running in windows pod.

Change-Id: If4513e72900101ef70d86b91155e56a1f8c79719
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 12, 2022
@anfernee
Copy link
Member Author

Fixed pkg/generated/openapi/zz_generated.openapi.go. Pushed again.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anfernee, aojea, ibabou, jayunit100, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/code-generation sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Mar 12, 2022
@anfernee
Copy link
Member Author

This PR requires LGTM again @jayunit100 @jsturtevant

@jayunit100
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2022
@anfernee
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 41b29e6 into kubernetes:master Mar 17, 2022
SIG-Windows automation moved this from In Review (v1.24) to Done (v1.24) Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/provider/gcp Issues or PRs related to gcp provider area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG-Windows
  
Done (v1.24)
Development

Successfully merging this pull request may close these issues.

Local traffic policy not working on Windows