
Fix Load Balancer Health Checks #724

Merged: 3 commits into master from bbass/LBAAS-3249/health on May 27, 2024

Conversation

bbassingthwaite (Contributor)

Our current health check implementation is flawed. The external load balancer should not be responsible for health checking the pods, since Kubernetes is already doing that. It should instead be health checking the worker nodes themselves. The behaviour also depends on the `externalTrafficPolicy` set on the Service itself.

If `externalTrafficPolicy=Local`, a health check node port is created to serve health checks for the node. If there are one or more active pods on the node, the health check returns a 200 and indicates how many healthy pods are running. If there are no active pods, it returns a 503 and fails the health checks. It is necessary for the LB to health check this dedicated endpoint to ensure we respect the lifecycle of the worker node and the pods running on it.

If `externalTrafficPolicy=Cluster`, we should health check kube-proxy running on the node. This is controlled by `--healthz-bind-address`, which defaults to `0.0.0.0:10256`. Health checking kube-proxy allows us to follow the lifecycle of the node: whether it is up and ready to serve traffic, or whether we should stop routing traffic to it because it is tainted, is being removed due to scaling down, or is being removed for maintenance. In this case, kube-proxy is responsible for understanding the health of each individual pod and managing which pods should receive traffic. With the current behaviour, a single unhealthy pod can incorrectly mark an entire worker node as unhealthy. If/when we make the switch to full Cilium, we will need to make sure we set `--kube-proxy-replacement-healthz-bind-address`.

This change delegates pod healthiness to Kubernetes, as should have been the case all along. The load balancer will instead health check the Kubernetes components themselves to ensure they are up and ready to process traffic, so that we can stop sending traffic to nodes that are no longer healthy or are being drained for any number of reasons. Pod unhealthiness will no longer mark nodes as unhealthy. This should also fix a number of edge cases customers are experiencing when using cluster autoscaling or pod autoscalers.
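
To make the intended behaviour concrete, here is a rough sketch (illustrative only, not code from this PR; the package name, helper name, constant, and use of `k8s.io/api/core/v1` are assumptions) of how the LB health check target could be chosen per policy:

```go
package example

import corev1 "k8s.io/api/core/v1"

// Default port of kube-proxy's --healthz-bind-address (0.0.0.0:10256).
const kubeProxyHealthzPort int32 = 10256

// healthCheckPort is a hypothetical helper that picks the node port the
// external load balancer should probe for a given Service.
func healthCheckPort(svc *corev1.Service) int32 {
	if svc.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal {
		// Local: probe the dedicated health check node port, which serves a
		// 200 while the node has ready endpoints for this Service and a 503
		// otherwise.
		return svc.Spec.HealthCheckNodePort
	}
	// Cluster: probe kube-proxy's healthz endpoint, which reflects whether
	// the node itself is ready to route traffic.
	return kubeProxyHealthzPort
}
```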

@timoreimann (Collaborator) left a comment

This looks good to me in general.

A few quick points:

  • We should probably do a bit of extra QA/testing to make sure this behaves as expected end-to-end for both the Local and Cluster policies. I can help with that.
  • Should we add a quick section to our implementation details documentation describing how health checking is configured for each policy (and perhaps the reasoning why, so that customers don't wonder why they can't configure it on their own anymore)?
  • Could you please already add a changelog item highlighting the change, especially as far as it concerns the "interface changes" (annotations being dropped in favor of a static and better LB health check configuration)?
  • We should check in with the corresponding team internally to make sure the changes here will be reflected in our production documentation page as well.

cloud-controller-manager/do/loadbalancers.go (review thread resolved)
@timoreimann (Collaborator)

Forgot one point: can you please also update the annotations documentation accordingly?

@gottwald (Contributor) left a comment

Overall this looks awesome. Thank you, Braden.
The only thing I have slight concerns about is the removal of annotations without a deprecation period.

cloud-controller-manager/do/loadbalancers.go (review thread resolved)
@timoreimann (Collaborator) left a comment

PR needs a rebase now.

```diff
@@ -2184,327 +2184,63 @@ func Test_buildHealthCheck(t *testing.T) {
 		errMsgPrefix string
 	}{
 		{
 			name: "tcp health check",
```
Collaborator:

Should we bring some/all of the tests back now that we still support it (through the fallback annotation)?

Contributor Author:

I have added them back

When `ExternalTrafficPolicy=Local`, the configured health check node port will be used, which indicates whether the node has active pods.
In both scenarios, the change will have a positive effect on load balancing behaviour: during lifecycle changes within the cluster, such as
node autoscaling, node taints, and pods going up and down, we will not send traffic to components that are not in a state to serve traffic.
* A new annotation, `service.beta.kubernetes.io/do-loadbalancer-revert-to-old-health-check`, has been added to
Collaborator:

Could you please add the new annotation to the annotations documentation?

Contributor Author:

added!
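
For illustration, a minimal sketch (not from this PR; the package name, helper name, and parsing behaviour are assumptions) of how a controller could read the fallback annotation off a Service:

```go
package example

import (
	"strconv"

	corev1 "k8s.io/api/core/v1"
)

// Annotation name as introduced in this PR.
const annoRevertToOldHealthCheck = "service.beta.kubernetes.io/do-loadbalancer-revert-to-old-health-check"

// useLegacyHealthCheck is a hypothetical helper returning true when the
// Service opts back into the old per-pod health check behaviour.
func useLegacyHealthCheck(svc *corev1.Service) bool {
	v, ok := svc.Annotations[annoRevertToOldHealthCheck]
	if !ok {
		return false
	}
	// Assumption: values are parsed as booleans and anything unparsable
	// falls back to the new (node-level) health checks.
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return false
	}
	return enabled
}
```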

@timoreimann (Collaborator) left a comment

LGTM. Please squash-merge once we're ready.

@bbassingthwaite merged commit 47d22d7 into master on May 27, 2024
3 checks passed
@bbassingthwaite deleted the bbass/LBAAS-3249/health branch on May 27, 2024 at 21:24