Istio mTLS fails after Prometheus 2.20.1 #9068
Comments
As you've found, the v2.21.0 changelog only contains two changes that could affect this, the move to Go 1.15 (and as a result the deprecation of I wonder what would happen if you removed Line 378 in 6a055f1
Line 271 in 6a055f1
|
It's a great question, but I don't have a Go environment set up at the moment. I'll have to set something up and give it a try, though I'm not sure how quickly that'll happen. I'll do my best. |
Ah no worries - if you asked me about something .NET I would be in the same boat. I would offer to build the image for you but security. |
HTTP/2 was removed because of multiple bugs causing harm to our users. I see that at least one of them is still open: golang/go#32388. |
Yes, but possibly in this case not having it is also causing inconvenience. Maybe we should consider adding an optional flag as an experimental feature. |
Yeah I will revive prometheus/common#286 |
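For readers following the "optional flag" idea, here is a minimal sketch of what such a switch could look like on a plain Go `net/http` transport. It is not the Prometheus or prometheus/common implementation: `newScrapeTransport` and its `enableHTTP2` parameter are hypothetical names made up for illustration. The one fact it relies on is documented `net/http` behavior: when a custom `TLSClientConfig` is set, the transport does not attempt HTTP/2 unless `ForceAttemptHTTP2` is true.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newScrapeTransport is a hypothetical helper (not part of Prometheus).
// enableHTTP2 stands in for the experimental flag proposed in the thread.
func newScrapeTransport(enableHTTP2 bool) *http.Transport {
	return &http.Transport{
		// A custom TLS config is what a scrape job with client
		// certificates would need; here it is left mostly empty.
		TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
		// With a custom TLSClientConfig, net/http conservatively disables
		// HTTP/2 unless this field is set to true.
		ForceAttemptHTTP2: enableHTTP2,
	}
}

func main() {
	for _, on := range []bool{false, true} {
		tr := newScrapeTransport(on)
		fmt.Printf("enableHTTP2=%v ForceAttemptHTTP2=%v\n", on, tr.ForceAttemptHTTP2)
	}
}
```

Whether the real fix goes through this field or through the `golang.org/x/net/http2` package directly is not something this sketch assumes.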
Working on building tag The
I'll see if I can continue troubleshooting the build, but that's where I am as far as trying to get a custom version up and running and tested. |
Seems to be something about building
I guess I'll see if I can get a Linux VM on Azure or something to try building with. |
So sorry about this, I've hit similar build issues before on Mac. I can get a Linux binary built through the project's CI system, and then hopefully it will be a simpler process to just add it to a docker image. |
OK, it took a bit but I was able to build a custom Prometheus container using a Linux VM. I built based on the current
I then deployed my custom container and I still get I backed it out and pushed official container v2.20.1 in just in case and, sure enough, it started working again. Maybe those two lines aren't enough to turn HTTP/2 back on... or maybe it's something else? |
As long as the error message is still coming from the same line ( Besides that, I don't really see anything else that could have affected it on the changelog, except maybe the move to Go 1.15. If it's not too much trouble, maybe try building with Go 1.14? In |
Actually maybe not, building with |
Yup, the error is exactly the same line - If necessary, I can provide some scripts to help deploy Istio and Prometheus in a configuration for testing. |
That would be great, thanks!
FROM golang:1.14
WORKDIR /go/src/prometheus
COPY . .
RUN go get -v ./...
RUN go install -v ./...
ENTRYPOINT ["./prometheus"]
CMD [ "--config.file=/etc/prometheus/prometheus.yml", \
      "--storage.tsdb.path=/prometheus", \
      "--web.console.libraries=/usr/share/prometheus/console_libraries", \
      "--web.console.templates=/usr/share/prometheus/consoles" ]
I did just write up this Dockerfile for building on Go 1.14 that does work, so that might be worth a shot. |
I created a repo over here with some scripts and config to set up a barebones cluster with just Istio and Prometheus to demonstrate the issue. The only thing I didn't provide was a test app that you can configure to scrape; if I need to make something I can, but I figured you likely have something. Let me know if something doesn't work; I set up a whole fresh cluster and verified it, but I'm on Mac so there may be little bash-isms or something that I got wrong. |
Thank you so much, that looks great! |
I'm changing my mind and now I think HTTP/2 enabling does fix it. In making that repro, I created a whole new cluster with just Istio and Prometheus. Deploy v2.20.1 - scrapes fine. v2.28.1 - fails. Deploy my main-with-http2 container that I'd built in my VM... it works. Current hypothesis is that the
|
That's a relief, glad it's working for you now. For the time being, I guess you could use that custom container, it should be stable enough... maybe? You may want to make the same change on the @roidelapluie and I have been talking about ways to re-enable HTTP/2 and hopefully it will be out in the next release. |
Your app is in the |
Ahhhhhhhhh 🤦🏻♂️ I had changed the namespace in every single config except the one that worked.
|
I can confirm that enabling HTTP/2 does resolve the problem. Thanks for the easy setup! |
Anytime! I'm glad I could help figure it out... and that it wasn't just me! 😆 Thanks for taking the time to look into it, I really do appreciate it. |
It was fun! I didn't know anything about Istio until now. |
To get to the bottom of this, @LeviHarrison could you provide some tcpdumps with and without HTTP/2 in Prometheus? Thanks! |
Here are two with and without HTTP/2. Hopefully, they have all the information needed. If not please let me know. |
Thanks for investigating this issue. I'm currently also on the topic of enabling mTLS for our monitoring stack (kube-prometheus). I was wondering if you focus only on achieving mTLS while scraping with Prometheus, or did you also manage to get mTLS between Prometheus and AlertManager as well? |
I haven't got mTLS with anything working at the moment. This would be a question for the Prometheus team. Also, I'm not using Alertmanager so when I do get things working I still won't have the answer. |
@tillig any updates on this? |
@sakajunquality Not sure why I'd have any updates on it - I researched it but I'm not doing the coding. |
This will work in Prometheus 2.31 without magic. |
We are re-enabling HTTP 2 again. There has been a few bugfixes upstream in go, and we have also enabled ReadIdleTimeout. Fix prometheus#7588 Fix prometheus#9068 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@tillig I face the same issue, when in a namespace with label |
I have not revisited scraping metrics using mTLS yet. However, Istio introduced metrics merging to solve some of this, I'd recommend checking it out. https://istio.io/latest/docs/ops/integrations/prometheus/#option-1-metrics-merging |
We are releasing Prometheus 2.31.0-rc.0 today that will fix the issues with istio |
sure will check out. |
Prometheus 2.31 is released and it should work directly here. |
Thanks will install this |
What did you do?
Using Istio 1.6.14 I am mounting the Istio sidecar manually without proxying any traffic so I can access the Istio mTLS certificates. I have a scrape configuration set up to use those certificates to scrape endpoints that have Istio sidecars.
Under Prometheus v2.20.1 this works perfectly. Under Prometheus v2.21.0 and above it fails with "connection reset by peer."
You can follow along on my troubleshooting attempt in the newsgroup but I've reached a point where I can't figure it out and I think there's a bug in here somewhere.
What did you expect to see?
I expected v2.28.0 to continue scraping Istio pods just like v2.20.1 did, using the same scrape configuration and the same certificates.
What did you see instead? Under which circumstances?
In versions 2.21.0 through 2.28.0 any endpoint using Istio mTLS fails to be scraped with the message "connection reset by peer." Here's the debug log message under v2.28.0:
level=debug ts=2021-07-06T20:58:32.984Z caller=scrape.go:1236 component="scrape manager" scrape_pool=kubernetes-pods-istio-secure target=https://10.244.3.10:9102/metrics msg="Scrape failed" err="Get \"https://10.244.3.10:9102/metrics\": read tcp 10.244.4.89:36666->10.244.3.10:9102: read: connection reset by peer"
Environment
Linux 5.4.0-1047-azure x86_64
The relevant scrape job is here. The certificates are mounted at /etc/istio-certs. I have validated that the certificate files are there and properly mounted.
Additional context / things I've tried:
I noticed in v2.21.0 that several things changed, and I'm not sure if any of them affect this issue.
I have tried setting GODEBUG=x509ignoreCN=0 on the pod to see if it's the Go certificate common name handling that was causing the issue. It didn't help.
I've verified that v2.20.1 is definitely working and none of the versions above that work. I've tried them all.
I've created a different container with both curl and openssl in them and mounted the certificates there just to make sure it wasn't a weird mounting problem. Both curl and openssl work.
curl https://10.244.3.10:9102/metrics --cacert /etc/istio-certs/root-cert.pem --cert /etc/istio-certs/cert-chain.pem --key /etc/istio-certs/key.pem --insecure
openssl s_client -connect 10.244.3.10:9102 -cert /etc/istio-certs/cert-chain.pem -key /etc/istio-certs/key.pem -CAfile /etc/istio-certs/root-cert.pem -alpn "istio"
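For comparison with the curl and openssl invocations, here is a rough Go sketch of the same client-side TLS setup: a client certificate, a root CA, and an offered ALPN protocol. `loadIstioClientTLS` is a hypothetical helper, not something Prometheus provides. The file names are taken from the commands in this report, and `NextProtos` is the standard `crypto/tls` field that plays the role of openssl's `-alpn` flag. Unlike the curl command, the sketch does not disable verification (`--insecure`).

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
	"path/filepath"
)

// loadIstioClientTLS is a hypothetical helper that builds a client TLS
// config from a directory laid out like /etc/istio-certs in this report.
func loadIstioClientTLS(certDir string, alpn ...string) (*tls.Config, error) {
	// Same role as curl's --cert and --key.
	cert, err := tls.LoadX509KeyPair(
		filepath.Join(certDir, "cert-chain.pem"),
		filepath.Join(certDir, "key.pem"),
	)
	if err != nil {
		return nil, fmt.Errorf("loading client certificate: %w", err)
	}

	// Same role as curl's --cacert.
	caPEM, err := os.ReadFile(filepath.Join(certDir, "root-cert.pem"))
	if err != nil {
		return nil, fmt.Errorf("reading root certificate: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no certificates found in root-cert.pem")
	}

	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      roots,
		NextProtos:   alpn, // same role as openssl's -alpn flag
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	cfg, err := loadIstioClientTLS("/etc/istio-certs", "istio")
	if err != nil {
		fmt.Println("could not build TLS config:", err)
		return
	}
	fmt.Println("offering ALPN protocols:", cfg.NextProtos)
}
```

This only illustrates where an ALPN value would go in a Go client. It does not claim that Prometheus sends or should send "istio" as an ALPN protocol.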
I noticed openssl doesn't work unless you set that alpn flag. I saw #6910 and thought this may be related, but I'm unsure. The fix for that one says it'll be out in 2.19.0 but that hasn't been released yet.
Relevant curl output: