Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore support for env-based API server discovery #1000

Closed
olix0r opened this issue Sep 6, 2022 · 12 comments · Fixed by #1001
Closed

Restore support for env-based API server discovery #1000

olix0r opened this issue Sep 6, 2022 · 12 comments · Fixed by #1001
Labels
bug Something isn't working

Comments

@olix0r
Copy link
Contributor

olix0r commented Sep 6, 2022

Current and expected behavior

In Linkerd, we have a report that dd0b258 breaks discovery for some configurations of Azure clusters. I'm not 100% clear on the details, but it sounds like they can't rely on the default kubernetes service.

Given that client-go still honors these environment variables, this manifests as only the Rust controllers being unable to communicate with the API server.

Possible solution

In the interest of compatibility (i.e., with client-go's behavior), I think it would be best to restore support for the environment variables.

If we really don't want to continue to support the environment variables, we should probably engage client-go about removing support for the environment variables so that there's really no expectation that they can be used.

Additional context

No response

Environment

...

Configuration and features

...

Affected crates

kube-client

Would you like to work on fixing this bug?

maybe

@olix0r olix0r added the bug Something isn't working label Sep 6, 2022
@olix0r olix0r changed the title Restore support for Restore support for environment-based API server discovery Sep 6, 2022
@olix0r olix0r changed the title Restore support for environment-based API server discovery Restore support for env-based API server discovery Sep 6, 2022
@olix0r
Copy link
Contributor Author

olix0r commented Sep 6, 2022

From recent AKS release notes https://github.com/Azure/AKS/releases/tag/2022-07-17:

The annotation kubernetes.azure.com/set-kube-service-host-fqdn can now be added to pods to set the KUBERNETES_SERVICE_HOST variable to the domain name of the API server instead of the in-cluster service IP. This is useful in cases where the cluster egress is via a layer 7 firewall, like Azure Firewall with Application Rules.

AKS clusters with egress firewalls basically require the use of the environment variables for discovery :(

@nightkr
Copy link
Member

nightkr commented Sep 6, 2022

Seems like the important section is https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic#required-outbound-network-rules-and-fqdns-for-aks-clusters. This seems like a pretty weird way to handle things on their end, but I can see other use cases for letting users override the address. And as you say, it's probably worth at least trying to stay aligned with client-go to minimize confusion.

I'd be +1 on reverting #876, but waiting for @clux's input since he seemed to have pretty strong opinions on the topic.

@olix0r
Copy link
Contributor Author

olix0r commented Sep 6, 2022

I opened an issue on AKS Azure/AKS#3183, but that's probably a longer conversation.

I'd be willing to get a PR together if we can make it into the next kube release :)

@clux
Copy link
Member

clux commented Sep 6, 2022

Well, the problem is that the old method also has problems, so not sure there's many great solutions here for us.

And as long as client-go has this TODO, it's not super tempting to switch our default way back to what is also broken.

What about some way to opt-into to old behaviour like some KUBE_LEGACY_CLUSTER_IP evar for AKS?

@nightkr
Copy link
Member

nightkr commented Sep 6, 2022

The AWS thing looks like an error on our side, Rust's IpAddr also seems to accept "bare" IPv6 addresses. I think a reasonable heurestic would be to apply [{host}]:{port} if host.contains(':'), otherwise stick to {host}:{port}.

The rustls situation is sadder, but outside of our control (and an opt-in feature).

@olix0r
Copy link
Contributor Author

olix0r commented Sep 6, 2022

@clux Thanks. This is helpful context.

What about some way to opt-into to old behaviour like some KUBE_LEGACY_CLUSTER_IP evar for AKS?

Something like this is probably a good middle-ground. In Linkerd, we could support an install flag that enables this.

Alternatively, we could do something more clever (but maybe too clever): only honor KUBERNETES_SERVICE_HOST when it is not an IP. E.g.

fn kube_dns() -> String {
    if let Ok(h) = std::env::var("KUBERNETES_SERVICE_HOST") {
        if !h.is_empty() && h.parse::<std::net::IpAddr>().is_err() {
            return h
        }
    }
    "kubernetes.default.svc".to_string()
}

@nightkr
Copy link
Member

nightkr commented Sep 6, 2022

Alternatively, we could do something more clever (but maybe too clever): only honor KUBERNETES_SERVICE_HOST when it is not an IP. E.g.

This feels waaaay too weird and magical IMO.

Maybe if we emit a loud warning when ignoring the address, but that would give 99% of users a warning for a non-issue that they have no control over.

@olix0r
Copy link
Contributor Author

olix0r commented Sep 6, 2022

I also opened kubernetes/kubernetes#112263 to discuss changing client-go's behavior.

@clux
Copy link
Member

clux commented Sep 6, 2022

too magical

Yeah, same. Given this is legacy behaviour, only affects a smaller number of users, and can be reasonably opted in by Linkerd, it is preferable for us to have it be an explicit opt-in in our codebase about what this is meant to do - and that it is not intended to stick around (unless client-go does a 180 at some point).

Logging a message for using non-standard Kubernetes behaviour when activating this code path is probably a good idea as well, but it can be less loud when it's opt-in.

(The IPv6 issue could also be fixed, but when we tried that fix it, we ran into cert validation issues, so making it opt-in avoids us spending time on that.)

@clux
Copy link
Member

clux commented Sep 6, 2022

Those upstream issues are great. Thanks a lot!

olix0r added a commit to olix0r/kube-rs that referenced this issue Sep 7, 2022
The `Config::from_cluster_env` constructor is misleadingly named: it
doesn't use the environment, it uses the default cluster configurations.

This change deprecates the `Config::from_cluster_env` constructor in
favor of `Config::load_in_cluster`. An additional constructor,
`Config::load_in_cluster_from_legacy_env`, uses the
`KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment
variables to match client-go's behavior.

This changes does NOT alter the default inferred configuration in any
way. It simply allows users to opt-in to using the old behavior.

Related to kubernetes/kubernetes#112263
Closes kube-rs#1000

Signed-off-by: Oliver Gould <ver@buoyant.io>
olix0r added a commit to olix0r/kube-rs that referenced this issue Sep 7, 2022
The `Config::from_cluster_env` constructor is misleadingly named: it
doesn't use the environment, it uses the default cluster configurations.

This change deprecates the `Config::from_cluster_env` constructor in
favor of `Config::load_in_cluster`. An additional constructor,
`Config::load_in_cluster_from_legacy_env`, uses the
`KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment
variables to match client-go's behavior.

This changes does NOT alter the default inferred configuration in any
way. It simply allows users to opt-in to using the old behavior.

Related to kubernetes/kubernetes#112263
Closes kube-rs#1000

Signed-off-by: Oliver Gould <ver@buoyant.io>
@olix0r
Copy link
Contributor Author

olix0r commented Sep 7, 2022

#1001 adds a separate config constructor; but kubernetes/kubernetes#112263 (comment) suggests that client-go's behavior may be "correct." If that's the case, #1001 at least fixes host/port formatting for IPv6; and we should probably try to fix the webpki problem in Rustls...

https://github.com/rustls/webpki/ is a recent fork of briansmith/webpki that will be maintained by the rustls folks. We may have a path forward to fix IP validation in that repo. But, from our experience in Linkerd, there are more than enough quirks in rustls that make it a poor choice for supporting a wide variety of cluster configurations.

@clux clux closed this as completed in #1001 Sep 9, 2022
clux pushed a commit that referenced this issue Sep 9, 2022
* client: Expose a Config constructor to support legacy configurations

The `Config::from_cluster_env` constructor is misleadingly named: it
doesn't use the environment, it uses the default cluster configurations.

This change deprecates the `Config::from_cluster_env` constructor in
favor of `Config::load_in_cluster`. An additional constructor,
`Config::load_in_cluster_from_legacy_env`, uses the
`KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment
variables to match client-go's behavior.

This changes does NOT alter the default inferred configuration in any
way. It simply allows users to opt-in to using the old behavior.

Related to kubernetes/kubernetes#112263
Closes #1000

Signed-off-by: Oliver Gould <ver@buoyant.io>

* constify "https" scheme to make accidental "http" harder

Signed-off-by: Oliver Gould <ver@buoyant.io>

* Restore `Config::from_cluster_env` naming

Add `Config::from_cluster_dns` to support the current behavior.

Signed-off-by: Oliver Gould <ver@buoyant.io>

* Disable the in-cluster rustls test

Signed-off-by: Oliver Gould <ver@buoyant.io>

* fix typo

Signed-off-by: Oliver Gould <ver@buoyant.io>

* client: Make discovery conditional on the TLS impl

When `rustls-tls` is enabled, the `kubernetes.default.svc` DNS name is
used. Otherwise, the `KUBERNETES_SERVICE_{HOST,PORT}` environment
variables are used.

Signed-off-by: Oliver Gould <ver@buoyant.io>

* Review feedback

* Make `Config::incluster_env` and `Config::incluster_dns` public
  regardless of what features are enabled.
* Restrict visibility for `pub` helpers that are not actually publicly
  exported.

Signed-off-by: Oliver Gould <ver@buoyant.io>

* Add URI-formatting tests

Signed-off-by: Oliver Gould <ver@buoyant.io>

* fmt

Signed-off-by: Oliver Gould <ver@buoyant.io>

Signed-off-by: Oliver Gould <ver@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants