KEP-4631: LoadBalancer Service Status Improvements, initial proposal #4632

Draft: danwinship wants to merge 1 commit into base: master

Conversation

danwinship
Contributor

  • One-line PR description: Initial proposal for KEP-4631
  • Other comments:
While updating the e2e load balancer tests after the final removal of
in-tree cloud providers, we have run into three problems:

  1. The tests have hard-coded timeouts (that sometimes differ per
     cloud provider) for deciding how long to wait for the cloud
     provider to update the service. It would make much more sense for
     the cloud provider to just provide information about its status
     on the Service object, so the tests could just monitor that.

  2. The tests recognize that not all cloud providers can implement
     all load balancer features, but in the past this was handled by
     hard-coding the information into the individual tests. (e.g.,
     `e2eskipper.SkipUnlessProviderIs("gce", "gke", "aws")`) These
     skip rules no longer work in the providerless tree, and this
     approach doesn't scale anyway. OTOH, we don't want to have to
     provide a separate `Feature:` tag for each load balancer
     subfeature, or have each cloud provider have to maintain their
     own set of `-ginkgo.skip` rules. It would be better if the e2e
     tests themselves could just figure out, somehow, whether they
     were running under a cloud provider that intends to implement the
     feature they are testing, or a cloud provider that doesn't.

  3. In some cases, because the existing tests were only run on
     certain clouds, it is not clear what the expected semantics are
     on other clouds. For example, since `IPMode: Proxy` load
     balancers can't preserve the client source IP in the way that
     `ExternalTrafficPolicy: Local` expects, should they refuse to
     provision a load balancer at all, or should they provision a load
     balancer that fails to preserve the source IP?

This KEP proposes new additions to `service.Status.LoadBalancer` and
`service.Status.Conditions` to allow cloud providers to better
communicate the status of load balancer support and provisioning, and
new guidelines on how cloud providers should handle load balancers for
services that they cannot fully support.

/assign @aojea @thockin

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 13, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 13, 2024
@danwinship
Contributor Author

/sig cloud-provider
/sig testing

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 13, 2024
@danwinship danwinship changed the title KEP-4631: LoadBalancer Service Static Improvements, initial proposal KEP-4631: LoadBalancer Service Status Improvements, initial proposal May 13, 2024
approach doesn't scale anyway. OTOH, we don't want to have to
provide a separate `Feature:` tag for each load balancer
subfeature, or have each cloud provider have to maintain their
own set of `-ginkgo.skip` rules. It would be better if the e2e
Member

I don't like "smart tests" that have different execution flows depending on what... The current e2e tests for load balancers are not good tests, as they try to test a lot of different things in one execution. A test should assert on a feature and a behavior; we may need to break the existing tests down to see if we still need to do this.


3. In some cases, because the existing tests were only run on
certain clouds, it is not clear what the expected semantics are
on other clouds. For example, since `IPMode: Proxy` load
Member

@thockin this was an interesting finding during the implementation of this mode in cloud-provider-kind; we need to try to flesh out more details before going to GA

Contributor Author

The problem isn't with the IPMode KEP; the distinction between "proxy mode" and "VIP mode" already existed before that; it's just that before IPMode made it explicit, it was controlled implicitly by whether the LB set Hostname or IP.

(Which is to say, even if we dropped IPMode, the problem would still exist. For example, with the default cloud-provider-aws load balancers.)

Member

To this specific feature: IMO, eTP=Local means what it says. It's not an awesomely designed feature because it presumes too much, but it says "if you are forwarding external traffic, only choose a local endpoint". Proxy-ish LB impls can't really retain the client IP, regardless of eTP, that doesn't change the meaning.

Looking at API docs for it, I think we can clarify:

     // externalTrafficPolicy describes how nodes distribute service traffic they
     // receive on one of the Service's "externally-facing" addresses (NodePorts,
     // ExternalIPs, and LoadBalancer IPs). If set to "Local", the proxy will configure
     // the service in a way that assumes that external load balancers will take care
     // of balancing the service traffic between nodes, and so each node will deliver
     // traffic only to the node-local endpoints of the service, without masquerading
-    // the client source IP. (Traffic mistakenly sent to a node with no endpoints will
+    // the source IP. (Traffic mistakenly sent to a node with no endpoints will
     // be dropped.) The default value, "Cluster", uses the standard behavior of
     // routing to all endpoints evenly (possibly modified by topology and other
     // features). Note that traffic sent to an External IP or LoadBalancer IP from
     // within the cluster will always get "Cluster" semantics, but clients sending to
     // a NodePort from within the cluster may need to take traffic policy into account
     // when picking a node.

For a Proxy-ish LB the source IP is the proxy itself. eTP=Local should still preserve that. Whether it is useful or not is a question for end users.
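
As an illustration of the routing rule in the doc comment above, here is a minimal Go sketch of how a node might pick candidate endpoints for external traffic. The `Endpoint` type and the function name are hypothetical and this is not how kube-proxy is actually structured; it only encodes "Local means node-local endpoints only, Cluster means all endpoints".

```go
package main

// Endpoint is a simplified, hypothetical view of one service endpoint.
type Endpoint struct {
	IP   string
	Node string // name of the node the endpoint runs on
}

// EndpointsForExternalTraffic returns the endpoints that the given node may
// deliver externally-sourced traffic to. With policy "Local" only endpoints
// on that node qualify, and an empty result means the traffic is dropped.
// Any other policy value is treated as "Cluster" and returns every endpoint.
func EndpointsForExternalTraffic(policy, node string, all []Endpoint) []Endpoint {
	if policy != "Local" {
		return all
	}
	var local []Endpoint
	for _, ep := range all {
		if ep.Node == node {
			local = append(local, ep)
		}
	}
	return local
}
```

Nothing in this selection depends on whether the packet's source IP is the original client or a proxy, which is the distinction the comment draws.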

Comment on lines +139 to +150
- Allow cloud providers to indicate that they are working on
provisioning load balancer infrastructure, so that
users/operators/tests can distinguish the case of "it is taking a
while for the cloud to provision the load balancer" from "the cloud
has failed to provision a load balancer" and "there is no cloud
provider so load balancers don't work".
Member

❤️

provision a particular `LoadBalancer` service, and why.

- Allow cloud providers to indicate when they have provided an
"imperfect" load balancer that the user may or may not consider to
Member

kind of degraded mode?

cc: @bowei

so this feels "un-Kubernetes-like", though at the same time, it's not
like the current Kubernetes networking configuration situation is
really great, and there has been some discussion of trying to provide
more explicit and well-defined cluster networking configuration in the
Member

yeah, this is in the "train has left the station" section :)

Would it be better to add a `Conditions` field to
`v1.LoadBalancerIngress` so that we can specify conditions per element
of `.Status.LoadBalancer.Ingress`? IOW, should it be possible for a
load balancer to express that it has multiple IPs in different states?
Member

I think we should only consider final states, and not partial ones


- The Service has `AllocateLoadBalancerNodePorts: false`, but the
cloud only supports NodePort-based load balancing.

- The Service is `ExternalTrafficPolicy: Local` but the cloud cannot
Member

I think this is not really about IP preservation but has everything to do with not implementing hCNP or some equivalent mechanism. Minor wording change requested

Contributor Author

> eTP=Local means what it says

lol, I thought that in the whole "what things should be traffic policy vs what things should be topology" debate (around PreferLocal) we had agreed that eTP=Local doesn't mean what it says, because the actual intent of the feature is "preserve client source IP", not "route traffic in a particular way"; the routing is purely a side effect of making it possible to implement "preserve client source IP". (IOW an implementation of eTP:Local that preserves client IP while doing routing in an unexpected way would be compliant, but an implementation that routes "correctly" but loses client IP is not.)

> I think this is not really about IP preservation but has everything to do with not implementing hCNP or some equivalent mechanism.

What I was saying in that example was, if client IP preservation is considered a mandatory-to-implement aspect of eTP:Local, then there's no point in proxy-ish LBs implementing eTP:Local at all, and thus no reason for them to implement HCNP. (Whereas for VIP-ish LBs, there is really no good argument for not implementing HCNP.)

(If we don't think that client IP preservation is required for eTP:Local, then we should just assume all LBs will implement HCNP, and they're just buggy if they don't.)
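
For readers less familiar with HCNP (`healthCheckNodePort`), the mechanism can be sketched as follows: each node answers the load balancer's health probe with success only if it has local endpoints for the service, so that the load balancer stops sending traffic to nodes that would drop it. This is a simplified, hypothetical Go sketch; the function names are made up and the real kube-proxy health check server returns more detail.

```go
package main

import "net/http"

// HealthCheckStatus returns the HTTP status a node reports for a service:
// 200 when it has at least one local endpoint, 503 otherwise.
func HealthCheckStatus(localEndpoints int) int {
	if localEndpoints > 0 {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// NewHealthCheckHandler builds a handler for the health check port.
// localEndpoints is a hypothetical callback that reports how many endpoints
// of the service currently run on this node.
func NewHealthCheckHandler(localEndpoints func() int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(HealthCheckStatus(localEndpoints()))
	})
}
```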

keps/sig-network/4631-loadbalancerstatus/README.md (outdated)
supports single-stack load balancers, so it would only be able to
serve clients of one IP family.

- The Service is `ExternalTrafficPolicy: Local` but the cloud cannot
Member

I think this one is not a warning - IPMode already indicates this, right?

Contributor Author

I think actually no. Azure has this trick where the entire cloud network is aware of the load balancer NAT state, so the LB can DNAT packets to a NodePort without masquerading them, and then the reply packet will get un-DNAT-ed correctly even though logically speaking it doesn't pass through the LB.

<<[/UNRESOLVED]>>
```

#### The `LoadBalancerServing` Condition
Member

Can we arrange it so that, as much as possible, the behavior and names are the same between Services and Gateways?

update, even if the value of `LoadBalancerProvisioning` remains
`False`.

#### Terminating Condition
Member

I'm not convinced this is useful - what does a user do with this information?

indicating that the load balancer for the service is already
available.

3. The cloud provider sets `LoadBalancerProvisioning=True`,
Member

What about (true, true) ? Conditions are best when they are orthogonal. These are not. We have to spec it, since both fields COULD be set at the same time.

IOW, don't let this become a state-machine API. It's a collection of roughly independent observations.

I see it more clearly below, so now this feels duplicative

Contributor Author

This is actually specified below; (true, true) means it is both serving and provisioning, i.e., it is continuing to serve while reprovisioning for an update.

> IOW, don't let this become a state-machine API. It's a collection of roughly independent observations.

I feel like we definitely want the "provisioning" observation. I feel like the "serving" observation is also useful? Did you have some other idea for conditions?
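
To make the "independent observations" reading concrete, here is a small Go sketch that treats the two conditions as separate booleans and describes each combination. Only the (true, true) wording comes from this thread; the other three descriptions are a plain reading of the condition names and are assumptions, as are the type and function names.

```go
package main

// LBObservation holds the two conditions as independent booleans.
// A missing or "Unknown" condition would need its own handling, which
// this sketch leaves out.
type LBObservation struct {
	Provisioning bool // LoadBalancerProvisioning is True
	Serving      bool // LoadBalancerServing is True
}

// Describe returns a human-readable summary of an observation pair.
func Describe(o LBObservation) string {
	switch {
	case o.Serving && o.Provisioning:
		return "serving while reprovisioning for an update"
	case o.Serving:
		return "serving"
	case o.Provisioning:
		return "provisioning, not yet serving"
	default:
		return "neither provisioning nor serving"
	}
}
```

Because every combination has a defined meaning, a consumer never has to track which state came before, which is what keeps this from becoming a state-machine API.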

@k8s-ci-robot k8s-ci-robot removed the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 23, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 24, 2024
@danwinship
Contributor Author

Updates:

  • Updated for review comments, started filling in the rest of the template ("Test Plan", "Graduation", "Version Skew Strategy", PRR, etc.)
  • Removed UNRESOLVED auto-skipping: Antonio agrees it's reasonable to have "tri-state" tests (pass/fail/skip) as long as the semantics are explicit. In some cases this will require rewriting the tests to put the "objectionable" bits at the start, so we can always skip immediately after the initial LB provisioning if the cloud doesn't support the feature.
  • The version skew section made me think about the behavior when a cloud provider is updated and finds itself in a cluster with pre-existing "bad" load balancers, and whether we should maybe add an explicit "unsupported" or "broken" condition.
  • Added a summary section to the top of the "Expected Behavior When the Cloud Provider Doesn't Know That It Can't Implement a Load Balancer" section, which I think is the big open question before this becomes implementable.
  • Added detailed notes to the e2e Test Plan section clarifying what skips will be needed for which tests. Also, while writing out the Test Plan section, I realized that to avoid regressing coverage, we're basically going to have to fork the LB tests, so we can have one hacky GCE-and-kind-only set, and one KEP-4631 set, which will initially be Alpha, but which will eventually replace the hacky ones.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.