Upgrading from 2.10.2 -> 2.11.0 - Kube Client "error trying to connect: tls handshake eof" in Policy Controller #7098
Thanks @sigurdfalk. This sounds similar to #7011, where we observed that the policy controller only works with a strict subset of the ECDSA algorithms specified in the TLS 1.3 RFC:

Can you share an example of a certificate that did not work?
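Not from the thread, but a quick local way to check which ECDSA signature algorithm and curve a certificate actually uses (a sketch assuming `openssl` is on the PATH; the file names are made up for the demo):

```shell
# Generate a throwaway P-256 key and self-signed cert, then dump the
# signature algorithm and curve -- the fields that matter for the
# "strict subset of ECDSA algorithms" restriction mentioned above.
openssl ecparam -name prime256v1 -genkey -noout -out demo.key
openssl req -x509 -new -key demo.key -subj "/CN=demo" -days 1 -out demo.crt
openssl x509 -in demo.crt -noout -text | grep -E 'Signature Algorithm|ASN1 OID|NIST CURVE'
# typically shows "ecdsa-with-SHA256" and "prime256v1" / "NIST CURVE: P-256"
```

The same `openssl x509 -text` inspection works on a certificate pulled out of a Kubernetes secret.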
@olix0r Thanks for the response. We figured that the Rust TLS HTTP client used by the Policy Controller doesn't support ECDSA private keys. We first tried:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-policy-validator
  namespace: linkerd
spec:
  secretName: linkerd-policy-validator-k8s-tls
  duration: 24h
  renewBefore: 1h
  issuerRef:
    name: webhook-issuer
    kind: Issuer
  commonName: linkerd-policy-validator.linkerd.svc
  dnsNames:
    - linkerd-policy-validator.linkerd.svc
  isCA: false
  privateKey:
    algorithm: ECDSA
  usages:
    - server auth
```

We then tried using:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-policy-validator
  namespace: ${local.linkerd_namespace}
spec:
  secretName: linkerd-policy-validator-k8s-tls
  duration: 24h
  renewBefore: 1h
  issuerRef:
    name: webhook-issuer
    kind: Issuer
  commonName: linkerd-policy-validator.linkerd.svc
  dnsNames:
    - linkerd-policy-validator.linkerd.svc
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
```
Interesting! I think this issue is related to rustls/rustls#332 and kube-rs/kube#542. We see:
But rust's crypto libraries do not currently support PEM-formatted ECDSA private keys. See more discussion here. For the time being we'll have to require that webhook credentials are RSA (or that ECDSA keys are provided in DER format, though I doubt cert-manager supports that out of the box). We'll probably want to follow up on djc's suggestion to implement a standalone PEM decoder for these cases.
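The encoding difference can be seen locally (a sketch assuming `openssl`; file names are arbitrary):

```shell
# cert-manager's default ECDSA output is a SEC1 "traditional" PEM,
# which the Rust TLS stack rejects; re-encoding the same key as
# PKCS#8 changes the PEM header to the one it can parse.
openssl ecparam -name prime256v1 -genkey -noout -out sec1.pem
head -1 sec1.pem    # -----BEGIN EC PRIVATE KEY-----  (SEC1)
openssl pkcs8 -topk8 -nocrypt -in sec1.pem -out pkcs8.pem
head -1 pkcs8.pem   # -----BEGIN PRIVATE KEY-----     (PKCS#8)
```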
Actually... If I configure my cert-manager certificate with:

```yaml
privateKey:
  algorithm: ECDSA
  encoding: PKCS8
```

I get credentials with:

Which might work. We'll give this a try later, or if you can try it and report back, that might save us some time ;)
From some very brief testing, this seems to work. We should update the docs at https://linkerd.io/2.11/tasks/automatically-rotating-webhook-tls-credentials/ (https://github.com/linkerd/website) -- which doesn't even document the policy controller webhook config at the moment -- to reflect this.
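For reference, a full cert-manager Certificate with the PKCS#8 encoding that worked in testing would look something like this (a sketch; the names mirror the earlier example in this thread):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-policy-validator
  namespace: linkerd
spec:
  secretName: linkerd-policy-validator-k8s-tls
  duration: 24h
  renewBefore: 1h
  issuerRef:
    name: webhook-issuer
    kind: Issuer
  commonName: linkerd-policy-validator.linkerd.svc
  dnsNames:
    - linkerd-policy-validator.linkerd.svc
  isCA: false
  privateKey:
    algorithm: ECDSA
    encoding: PKCS8
  usages:
    - server auth
```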
Docs updated: linkerd/website#1221
@sigurdfalk The updated docs at https://linkerd.io/2.11/tasks/automatically-rotating-webhook-tls-credentials/#issuing-certificates-and-writing-them-to-secrets should get you a working cluster with 2.11. It would be great if you could confirm that this all works as expected.
@olix0r That's great, thank you! I'm gonna try verifying this tomorrow 🙏🏻
@olix0r The policy controller now accepts the cert with the PKCS8 encoding. I did a fresh install of 2.11.0. New dump of errors:

And we still see the requests coming in to our API Server, which responds with status 200 but indicates that the connection was closed early.

Not sure if it's relevant, but we have an Azure Firewall between the cluster and the API Server. This has never been an issue before, though.
@sigurdfalk the new policy controller is written in Rust using kube-rs and rustls. Can you share the output of:
I'm curious about the parameters used in the API server's CA certificate... We've definitely seen Linkerd 2.11 working fine in AKS...
This is probably pedantic, but this is actually from the destination controller, not the policy controller (though both run in the same pod). So, I agree that probably confirms that it's not a firewall issue. You could try installing 2.11 with TRACE-level logging enabled.
@olix0r We really appreciate Linkerd and are happy to keep debugging this 🙂 Cert output is:
The logs got really big when enabling TRACE, so it's a bit hard for me, lacking knowledge of the Rust application, to make too much sense of them. But I'll keep digging. I dumped all logs from application start in the attached file (1 second window):
@sigurdfalk Thanks! That certificate looks basically normal, as do the logs. We'll do some digging on a working cluster and see if we can come up with any differences.

@olix0r did you have any luck in your testing?

@sigurdfalk Sorry, I don't think we have any leads on this issue yet. It's still on our radar, though.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
@olix0r we no longer have this issue.

@sigurdfalk Excellent. Thanks for confirming!
Bug Report

What is the issue?

We are having issues upgrading from 2.10.2 -> 2.11.0. It seems to be related to the new Policy Controller. The controller fails during startup, apparently when the Kube Client is trying to watch resources: "kube::client: failed with error error trying to connect: tls handshake eof". Logs:

We are able to track the request to our API Server, which seems to respond with status 200, but indicating that "Connection closed early":

linkerd check output

We didn't save the output from linkerd check and have rolled back to 2.10.2 for now. However, when we ran the check while we were having the issues, it reported all ✅.

Environment

Additional context

Linkerd 2.10.2 has been running in the same cluster without any issues. We had some trouble creating a certificate for the policy controller using cert-manager, as the Rust TLS HTTP client used apparently doesn't support ECDSA. We solved this by switching to RSA, as discussed in this thread on Linkerd Slack.