Mac: My cluster isn't trusted #89

Closed
garyanaplan opened this issue Nov 14, 2019 · 20 comments · Fixed by #1044
Labels: bug (Something isn't working), config (Kube config related), help wanted (Not immediately prioritised, please help!), macos (macos specific issues)

Comments

@garyanaplan

garyanaplan commented Nov 14, 2019

kube-rs: 0.17.1
I'm trying to interact with a GCP hosted cluster. The cluster certificate is self-signed. When I start my application I see errors like this:

Error: Error { inner: Error(Hyper(Error(Connect, Custom { kind: Other, error: Error { code: -67843, message: "The certificate was not trusted." } })), "https://<redacted cluster IP>/api/v1/persistentvolumes?")

Error executing request }

If I update my client OS and tell it to trust the certificate then the problem disappears, so I guess the problem is related to the library not realising that it needs to process the cluster certificate somehow. I had a trawl around in the source, but couldn't see anything obviously wrong. There seemed to be some calls to add_root_certificate, but I wasn't sure if they were being called or if I needed to configure my client somehow or...?

Wish I could file something more useful, but maybe that's enough detail for someone to point me towards a solution.

(BTW: I can't employ my certificate work-around in real life, that was just to help understand the problem.)

@garyanaplan
Author

Some proper digging around led me to a solution that "works". If I comment out my convenient call to load_kube_config(), then I can build up a Configuration as follows:

    // let cfg = config::incluster_config().or_else(|_| config::load_kube_config())?;
    let (builder, loader) = config::create_client_builder(config::ConfigOptions::default())?;
    // The cluster certificate is read and parsed here, but not yet handed to the builder.
    let mut buf = Vec::new();
    File::open("cluster.crt")?.read_to_end(&mut buf)?;
    let cert = Certificate::from_pem(&buf)?;
    // Workaround: skip certificate verification entirely (insecure).
    let cfg = config::Configuration::new(
        loader.cluster.server,
        builder.danger_accept_invalid_certs(true).build()?,
    );

and then proceed from there. It's nasty but I can work with it for now.

@clux
Member

clux commented Nov 14, 2019

Oh, interesting. There's presumably a flag in your ~/.kube/config for accepting invalid certs as well, right?

I think that flag, if it exists, should map to the one exposed by reqwest, but it does not seem to be the case atm 🤔

@garyanaplan
Author

garyanaplan commented Nov 15, 2019

No (although thank you for the tip about insecure-skip-tls-verify: true). I think (and I'm not an expert in this) that it works by distributing certificate authority data. My kubernetes config looks something like:

- cluster:
    certificate-authority-data: <cluster CA>
    server: https://<cluster ip>
  name: <cluster name>

My understanding is that kubectl uses certificate-authority-data, a base64-encoded PEM (similar to curl --cacert), to verify the server's certificate when it opens a TLS connection. This appears to be the bit that's missing.
I tried some more experimenting and used reqwest's add_root_certificate() with a copy of certificate-authority-data, but I still get the untrusted error.
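
For readers unfamiliar with the field, here is a minimal, standard-library-only sketch of what "base64 encoded PEM" means for certificate-authority-data. The function name decode_base64 is hypothetical and is not part of kube-rs or reqwest; a real client would more likely use an existing base64 crate. The sample value in the usage note encodes only the first line of a PEM certificate, not a real CA.

```rust
/// Decode standard (RFC 4648) base64, skipping ASCII whitespace.
/// Returns None if a character outside the alphabet is found.
/// Hypothetical helper for illustration only.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits = 0; // number of valid bits currently held in `acc`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of the data
            b' ' | b'\n' | b'\r' | b'\t' => continue,
            _ => return None,
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}
```

Calling decode_base64("LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t") yields the bytes of "-----BEGIN CERTIFICATE-----", which is why kubeconfig CA data typically starts with "LS0tLS1CRUdJTi".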

@clux clux added bug Something isn't working config Kube config related help wanted Not immediately prioritised, please help! labels Nov 20, 2019
@clux
Member

clux commented Nov 20, 2019

Okay. It sounds like this is incorrectly wired up at the very least. Sorry, I'm not really able to help at the moment.

If you are able to help diagnose / find a fix for it, I'm happy to take PRs for it. I'm not really in possession of a cluster atm to test this out properly.

@garyanaplan
Author

garyanaplan commented Nov 21, 2019

No problem. I thought I'd bring it to the attention of a wider audience in case a known solution was out there. It looks like it may be a problem in reqwest, since even adding the root certificate manually to build the connection didn't work.
I'll do more investigating when I have time. If I can figure out a fix, I'll file a PR here (or with reqwest). Either way, I'll aim to close the issue by end of December.

@davidB
Contributor

davidB commented Dec 22, 2019

I investigated this issue because two people reported it against my kubectl plugin:

I set up a minimal GKE cluster and a small reqwest app (I'll shut the cluster down in a few days, if you want to try): GKE access failed via reqwest

What I found:

  • The issue occurs with at least GKE (Google) and EKS (AWS)

  • The issue doesn't occur with GKE clusters set up a few months ago

  • The issue only affects clients on macOS (Catalina), not Linux

  • The issue occurs with both the native-tls and rustls features (rustls was restored 2 days ago in reqwest)

  • The issue doesn't occur with kubectl, golang, or curl

  • The issue also occurs in Safari (after registering and trusting the root certificate in the OS keychain)

    [Screenshot on Imgur]

I guess the cause is:

  • The native-tls and rustls features of reqwest seem to delegate part of the certificate check to the macOS security-framework (via hyper-tls)

  • curl and golang don't delegate to the OS

  • On macOS Catalina, some additional rules are applied (see "Requirements for trusted certificates in iOS 13 and macOS 10.15" on Apple Support). At least one of the rules fails:

    Additionally, all TLS server certificates issued after July 1, 2019 (as indicated in the NotBefore field of the certificate) must follow these guidelines:
    ...

    • TLS server certificates must have a validity period of 825 days or fewer (as expressed in the NotBefore and NotAfter fields of the certificate).

    because GKE generates API server certificates with too long a validity period: 5 years (see the screenshot) > 825 days
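
To make the quoted rule concrete, here is a rough standard-library-only sketch of the lifetime check. The names days_from_civil, MAX_VALIDITY_DAYS and exceeds_lifetime_limit are hypothetical, dates are simplified to (year, month, day) tuples, and this is not how security-framework actually implements the check; it only restates the arithmetic of the rule above.

```rust
/// Days since 1970-01-01 for a Gregorian calendar date
/// (the widely used "days from civil" computation).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // count Jan/Feb as part of the previous year
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year within the 400-year era
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day within the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the era
    era * 146_097 + doe - 719_468
}

/// Limit quoted from the Apple support article.
const MAX_VALIDITY_DAYS: i64 = 825;

/// True if a certificate issued after July 1, 2019 is valid for more than
/// 825 days. Assumption: day granularity is enough for illustration.
fn exceeds_lifetime_limit(not_before: (i64, i64, i64), not_after: (i64, i64, i64)) -> bool {
    let start = days_from_civil(not_before.0, not_before.1, not_before.2);
    let end = days_from_civil(not_after.0, not_after.1, not_after.2);
    let cutoff = days_from_civil(2019, 7, 1);
    start > cutoff && end - start > MAX_VALIDITY_DAYS
}
```

With these definitions, a certificate valid from 2019-12-01 to 2024-12-01 (five years, 1827 days) is flagged, while one issued before the cutoff is not, which matches the observation that older clusters were unaffected.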

I see a few (bad) workarounds (but no quick solution):

  • disable SSL verification on macOS ( => loss of security)
  • ask users to use certificates compliant with Catalina's policies ( => loss of users; I don't know how to do it on GKE)
  • find out how to avoid security-framework with hyper-tls
  • Other ideas?

I can work on a PR if we agree on a solution / workaround (working on TLS is outside my skills).

EDIT: I modified my comment (above), because I'm not sure whether rustls uses security-framework or whether it's a side effect of reqwest, hyper-tls or ???

@davidB
Contributor

davidB commented Dec 22, 2019

Another alternative: use curl (via its crate) instead of reqwest:

--- with curl ---
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/\"","reason":"Forbidden","details":{},"code":403}
--- with reqwest ---
Error: reqwest::Error { kind: Request, url: "https://35.232.6.83/", source: hyper::Error(Connect, Error { code: -67843, message: "The certificate was not trusted." }) }

with code like

// curl -i -v https://35.232.6.83 --cacert ./cert3.x509.crt
async fn main_with_curl() -> Result<(), Box<dyn std::error::Error>> {
    use curl::easy::Easy;
    use std::io::{stdout, Write};

    // Write the API server's response to stdout, trusting only the given CA file
    let mut easy = Easy::new();
    let cacert = std::path::Path::new("ca.x509.crt");
    std::fs::write(cacert, CACERT_PEM)?;
    easy.cainfo(cacert)?;
    easy.url(SERVERAPI_URL)?;
    easy.write_function(|data| {
        stdout().write_all(data).unwrap();
        Ok(data.len())
    })?;
    easy.perform()?;
    Ok(())
}

@clux
Member

clux commented Dec 24, 2019

Yeah, we've run into quite a few catalina issues in general, didn't realize this bug was related to that. Appreciate all the digging you've done here @davidB - lots of really helpful info. Have merged your temporary hack, and will try to make a release with it.

@clux
Member

clux commented Dec 24, 2019

Ugh, kind of needed a new reqwest version, might leave it in master for seanmonstar/reqwest#740 for now - they are meant to "release this week".

@ahmetb

ahmetb commented Dec 30, 2019

FWIW the GKE team is aware of the issue, and it's likely to be addressed. However, it's fair to assume that there will be a bunch of GKE clusters around with these certificates deemed invalid for at least a year.

@clux
Member

clux commented Dec 31, 2019

> FWIW GKE team is aware of the issue, and it's likely to be addressed. However, it's fair to assume that there will be a bunch of GKE clusters around with these certificates deemed as invalid for at least a year.

Appreciate that. There are also long cert expiries on my EKS clusters, so I don't think this is a GKE-only problem; it sounds primarily like a mac-forcing-everyone-to-change problem.

Have released the hacky fix in kube 0.23.0.

I'm not sure if this helps the original issue with self-signed certificates though..

@clux
Member

clux commented Jan 31, 2020

Am considering reverting this hack in the next version of kube for a few reasons:

Will probably just not port the fix to rustls and eventually drop it in native-tls as well.

Mitigations

Suppose you've got teleport auth, or something else that gives you k8s certs with too long a lifetime: you can just tell your system configuration to trust these certs regardless. Here is how on macOS (where the issue was seen):

PORT=3026
DIR=teleport-certs
mkdir -p $DIR

# add your hosts here
HOSTS=(
  teleport-x.somehost
)
for HOST in ${HOSTS[@]}; do
  echo -n | openssl s_client -connect $HOST:$PORT \
      | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > $DIR/$HOST.crt
  echo "Adding $HOST certificate to keychain"
  security add-trusted-cert -p ssl -r trustAsRoot -k ~/Library/Keychains/login.keychain-db $DIR/$HOST.crt
done

Adapt as necessary. yq on ~/.kube/config can probably also do the magic more generally. This was just a script that was floating around at $DAYJOB.
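
For anyone who would rather do the extraction step in Rust than with sed, the following is a rough standard-library-only sketch of what the `/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p` range does to the openssl s_client output. The function name extract_pem_certs is hypothetical, and unlike sed it returns each block as a separate string.

```rust
/// Collect every PEM certificate block found in `text`.
/// Lines outside BEGIN/END CERTIFICATE markers are dropped, roughly
/// mirroring the sed range expression used in the script above.
/// Hypothetical helper for illustration only.
fn extract_pem_certs(text: &str) -> Vec<String> {
    let mut certs = Vec::new();
    let mut current: Option<String> = None; // block being collected, if any
    for line in text.lines() {
        if line.contains("-BEGIN CERTIFICATE-") {
            current = Some(String::new());
        }
        if let Some(block) = current.as_mut() {
            block.push_str(line);
            block.push('\n');
        }
        if line.contains("-END CERTIFICATE-") {
            // Finish the block; an END without a BEGIN is ignored.
            if let Some(block) = current.take() {
                certs.push(block);
            }
        }
    }
    certs
}
```

Each returned string can then be written to its own .crt file before being passed to security add-trusted-cert, as the script does with $DIR/$HOST.crt.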

@davidB
Contributor

davidB commented Feb 1, 2020

iirc, I tried the mitigation "trust the root certificate into the keychain store" and it didn't work (like the screenshot about safari at #89 (comment))

@clux
Member

clux commented Oct 22, 2021

It's been close to two years since we put in the hacky fix. The original hack was to allow long certs (of lifetime > 825 days) on mac because of kubernetes clusters in the wild likely still using those certs.

GKE comment suggested that we wait at least a year. Possibly it's worth waiting longer, but it's also possible for users to set Config::accept_invalid_certs themselves on mac to avoid this (although that would push the difficulty onto the CLI ecosystem).

Thinking that we potentially draw a line in the sand here and suggest that we remove this hack at the end of 2022, unless there are compelling reasons why we might still keep it (I'll post a reminder 3 months before if there are no reasons against).

@clux clux changed the title My cluster isn't trusted Mac: My cluster isn't trusted Oct 22, 2021
@kazk kazk added the macos macos specific issues label Nov 5, 2021
@aviramha
Contributor

Hey, we seem to have encountered this issue on our end. Would you accept a PR adding support for respecting the certificate-authority attribute of the config file?

@clux
Member

clux commented May 10, 2022

Hey @aviramha, we'd definitely be interested to see what you are trying. We already parse this part of the kubeconfig; if you are trying to fix mac cert handling on rustls, go for it.

I'd be somewhat surprised if this is the same issue though, because this was connected to the lifetime of the certs being deemed invalid by macos, and we only have a hack for it for openssl due to long lived certs. If you're in a new system it might be another issue. Feel free to raise a more detailed issue if it's not.

@aviramha
Contributor

Oh! You are correct. I thought that property was just being ignored (as that's the behavior we see when using kube-rs + rustls + Ubuntu).
We will debug it further, try to fix it, and send a PR (if it is on the kube-rs end).
Thanks!

@clux clux added this to the 0.77.0 milestone Sep 29, 2022
@clux
Member

clux commented Sep 29, 2022

> It's been close to two years since we put in the hacky fix. The original hack was to allow long certs (of lifetime > 825 days) on mac because of kubernetes clusters in the wild likely still using those certs.
>
> GKE comment suggested that we wait at least a year. Possibly it's worth waiting longer, but it's also possible for users to set Config::accept_invalid_certs themselves on mac to avoid this (although that would push the difficulty onto the CLI ecosystem).
>
> Thinking that we potentially draw a line in the sand here and suggest that we remove this hack at the end of 2022, unless there are compelling reasons why we might still keep it (i'll post a reminder in 3 months before if there are no reasons against).

Think it's been in long enough at this point, adding this to milestone for removal in 0.77.

@kazk
Member

kazk commented Sep 29, 2022

// temporary catalina hack for openssl only
#[cfg(all(target_os = "macos", feature = "native-tls"))]
fn hacky_cert_lifetime_for_macos(ca: &[u8]) -> bool {
    use openssl::x509::X509;
    let ca = X509::from_der(ca).expect("valid der is a der");
    ca.not_before()
        .diff(ca.not_after())
        .map(|d| d.days.abs() > 824)
        .unwrap_or(false)
}

macOS with native-tls has #691 as well.

We should probably just remove native-tls as we discussed in #863 (comment)

@clux clux modified the milestones: 0.77.0, 0.76.0 Oct 16, 2022
6 participants