New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in Caddy HTTP/2 support #4017
Comments
How can this be reproduced minimally? What's the smallest possible config you can make it happen with, and I suppose along with an HTML file that triggers the requests that exhibit the behaviour? |
I really am not sure how. I only see it in our test suite where it does happen quite consistently. The test is loading a page (via selenium and headless chrome) - that page is loading some JS from a different site (that in our test set up has the same ip as the page). That's when it is happening. However, if I load the page in Chrome (same version) directly - it works fine. I really can not explain that... (Though as I'm writing this - I'm realizing that the test suite runs on the same server, where when I do it manually I'm coming over VPN...) The site the page is coming from and the site the JS widget come from are using the same IP address. And both using the same wildcard cert. However, the domain names are different. Lastly, as I understand it - this can only happen under Chrome or Firefox. (Safari and Edge do not do connection coalescing.) |
I think the fix is that this line needs to change to return status 421 instead of 403 caddy/modules/caddyhttp/server.go Line 292 in 792fca4
Or at least it should if the connection is doing HTTP/2. However, in reading about the code it seems fine to apply it to http/1.1. Basically, it is saying you are passing mismatched data and we can not process the request. Which is really what the strictHostChecking is doing! On the other hand, it "could" have backward compatibility issues? Maybe? IDK |
The only way that line could be hit is if you have We need a lot more evidence here before we can make any change. If we can't replicate it, we can't fix it. Feel free to make a custom build of Caddy with that change to see if it has any effect, to align with your hypothesis. But we'll still need to have proof that this change even makes sense, because without your config and a way to reproduce, we'd only be making a blind change, and that's not good. |
I DO NOT have that setting set anywhere!!! However, looking through
Caddy’s code it will turn that setting on for you in certain situations.
…On Sat, Feb 13, 2021 at 2:54 PM Francis Lavoie ***@***.***> wrote:
I think the fix is that this line needs to change to return status 421
instead of 403
The only way that line could be hit is if you have strict_sni_host
enabled. You didn't show that you did in your config, so I don't think
that's right.
We need a lot more evidence here before we can make any change. If we
can't replicate it, we can't fix it.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4017 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKE5LWVBO27HDUVDDU243A3S637J5ANCNFSM4XR2QQOQ>
.
|
It only gets enabled implicitly if you also have TLS client auth enabled. Is that the case? Again - please provide us with your config! caddy/modules/caddyhttp/app.go Line 179 in 792fca4
|
@rayjlinden Please cooperate with us :) We want to fix something if it is broken. But we can't help you unless you help us help you. We will need your config. |
I do have client auth - but not on the domains having this problem. Here is the full config. It is quite large:
|
Okay so you do have We'll still need to understand how to replicate the issue though. Please try to pare down your config to the bare minimum (one host, as little functionality as possible) while being able to exhibit the behaviour you're describing. Edit: oh - you set your default loggers level to WARN. Yeah don't do that, INFO level messages are very important to see at startup. |
I do not have an easy way to reproduce the issue. The issue is certainly not with the config. It can only happen when using chrome (and firefox) in certain situations. Here is an article that explains how this happens: Sorry - I have been trying to figure this out for about 3 weeks now. It has me horrible behind on stuff we need to deliver. I really do not have the time to figure out how to get chrome to do this out side of our set up. (I have already tried a BUNCH of ideas...) Now that I know what is happening - I have a way to work around it. Which turns out to be simple - don't use http/2. I just disabled http2 on the headless chrome browser in our test suite and everything started working. |
BTW, thank you for your help on this. The bell really went off when you mentioned the spot where caddy could throw a 403.. I also did not know client auth set strict_host_matching for the whole server. So now it makes complete sense to me exactly how it happens. I really am extremely grateful for all of your help! |
We still want to find a fix though. A workaround is only useful for you. I think you're probably correct that we should respond with Can you try building Caddy with that change? All you need to do is download |
Ok - I did that. It works as expected. Here is a snippet of the access log (well, filtered to be viewable...):
As you can see after the 421 is returned a new connection is made and it gets a 200. |
We asked around on Twitter to see what some smart people would say: https://twitter.com/mholt6/status/1361460752675078144 (multiple threads to click through) |
Interesting. Would 50+ domains using a wildcard cert pointing to the same IP address be a "badly configured setup"? For production - probably! (Though I have seen something close for "partner sites" that give a sub-domain automatically when you join up.) For a test environment trying to work around the limitations of LE throttling - it does not seem so "badly configured" to me. Would love to hear any suggestions to alternatives that are reasonable. BTW, to be clear - this NEVER happens on an initial connection. It happens when existing connection is reused. Chrome determines that this IP already hosted a connection for the next request (though different domain) and reuses an exiting open connection to it. It is making an assumption that the SNI (which is already established and can not be changed) would be valid since you are severing both domains from the same IP. Now one could ask if the coalescing functionality of Chrome and Firefox is a bit too aggressive. Sure seems like it. But are you going to not support those browsers? Or in the light of browser coalescing basically putting another request on a reused connection with the wrong "cert". Perhaps the strict SNI checking needs to be less strict. If it instead allowed SNIs that also are configured to use the same IP address - that might be fine. Of course, this is only happening to me because strict SNI checking is implicitly turned on if you use client auth. But that is a server wide thing. The domains in question for me are not even using client auth. I can understand a argument to not issue a 421. However, I'd say two things about that:
|
Or another way to put it. A 421 is the appropriate response to a "badly configured" site. Test environments are sometimes exactly that - badly configured because they are shoving many sites on to fewer machines. A 421 is the appropriate response. Chrome and FireFox are doing aggressive coalescence of connections to counter sharding. There is a thin line between them being too aggressive against sharding and them breaking the intent of the configured web sites. A 421 response tells them when they have been too aggressive. It allows them to instead not reuse that open connection and instead open a new one. Returning a 421 tells the end user (chrome or anyone else) that the request you made is not cool. It is for resources that do not match (sni and domain). Either that is a config errors, and badly defined site., or telling Firefox and Chrome they were too aggressive in their anti-sharing and need to instead open a new connection. |
Maybe "badly configured" was a bit strong since I'm guessing the repeated quotes around that terms seems to have caused offence. But I'd argue by basically offering the ability to connection coalesce (by sharing IPs and Certs and supporting HTTP/2) when you specifically don't want it to happen, is inviting trouble. Maybe not specifically asking for trouble, but certainly leaving the door open for it to happen. It would be better to host this domain that requires client certificates on a separate IP or a separate certificate to avoid this even being an issue. Saying that, I do agree with you that 421 is the appropriate response here. The client has asked for a request under the assumption an existing connection can be used, and 421 is the response to say "you made an invalid assumption about that, please try again on a different connection". That's entirely what 421 was created for. Just be aware that support might not be fully there as it's not the most well-used feature and there are other, perhaps more robust, work-arounds as mentioned above. Also, since client certs are not supported over HTTP/2 the ALPN for that new connection should not offer |
Heh. I'm certainly not offended. Sorry for my overuse of the quotes. :) Was only trying to point out that what might be a perfectly reasonable set up in a test environment certainly might not be wise in a production environment. I think Caddy wants/needs to support both. The root problem in Caddy is how the strict_host_matching feature gets configured.
OMG - I was about to write a third bullet saying there is no way to turn it off. And I was like - wait is that true? Turns out it is not true - Caddy only turns on strict_sni_host implicitly if you have not set it in your config. So I set strict_sni_host = false - and everything works! Jeez I wish I had thought of this sooner... BTW, I do think caddy should still throw a 421 in this case. As Barry points out it can not be counted on that a browser will do anything with a 421. However, what I know for sure - is the browser will NOT retry the connection if you throw a 403 - but the major browsers that do connection coalescing right now do support handling a 421. |
I think the main issue is that
Problem is, Caddy has no way to know if a connection reuse is happening because the H2 implementation is in the Go stdlib and doesn't expose that. So "technically" Caddy only knows about the first case. That's why making this change is a bit 🤔 |
I'm taking the day off today, but just so you know I have been following this thread and doing research for the past few days. My conclusion so far is that our 403 is not wrong because it is specifically an authorization error, and that the underlying http2 library needs to use 421 since it deals with connections. It seems like the proper fix would be in the Go standard library's h2 implementation. That it's encountering our code is a coincodence, and connection coalescing needs to be handled by the h2 implementation code. |
I was looking at a thread on the Golang site and some folks were arguing
the 421 should be handled by the "application code". Not sure how that
would work if there is no way to know you are in an HTTP/2 connection...
If the enforcementHandler knew which ConnectionPolicy was used to initiate
the connection it could return 403 if it had a client cert, otherwise
return 421.
But I don't see an easy way for it to know that...
…On Tue, Feb 16, 2021 at 11:12 AM Francis Lavoie ***@***.***> wrote:
I think the main issue is that strict_sni_host code is being hit for 2
different reasons:
- When the SNI doesn't match the Host header, it should return 403
because it's forbidden to make a request with a mismatched host
- When H2 connection reuse happens and a subsequent request's Host
header doesn't match the SNI in the original TLS handshake, it should
return 421
Problem is, Caddy has no way to know if a connection reuse is happening
because the H2 implementation is in the Go stdlib and doesn't expose that.
So "technically" Caddy only knows about the first case. That's why making
this change is a bit 🤔
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4017 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKE5LWXCF6OPFS4WHCHMMHDS7K7RPANCNFSM4XR2QQOQ>
.
|
We do know when we're in HTTP/2, but we don't know when we're in a reused connection, i.e. a 2nd request in the same connection... I think. |
I think all we know is the overall state of the connection. https://golang.org/pkg/net/http/#ConnState |
Well a client cert can't ever be used in an HTTP/2 connection. So if you
just return 403 if not HTTP/2 and 421 if you are - wouldn't that work?
…On Tue, Feb 16, 2021 at 12:20 PM Francis Lavoie ***@***.***> wrote:
Not sure how that would work if there is no way to know you are in an
HTTP/2 connection
We *do* know when we're in HTTP/2, but we *don't* know when we're in a *reused
connection*, i.e. a 2nd request in the same connection.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4017 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKE5LWUZXUHYMWLFLHI4YV3S7LHPJANCNFSM4XR2QQOQ>
.
|
Hi there, just wanted to hop in and say that I just ran into the same issue. My setup uses client_auth for a few domains, which causes SNI host enforcement to be enabled.
With this, I was able to open Would be cool if someone could look into this and pick this issue back up, since it still seems to be relevant. |
Ok, good to know. The consensus from experts (i.e. not-myself) seems to be that officially, HTTP/2 doesn't support client certs at all (so for example, Apache disables HTTP/2 for servers with client auth enabled), but that there also isn't a good, official answer to the question as of now. The document was dropped out of lack of interest especially on the client-side: https://lists.w3.org/Archives/Public/ietf-http-wg/2020JulSep/0145.html (ref. https://datatracker.ietf.org/doc/draft-ietf-httpbis-http2-secondary-certs/). I could be wrong, but I don't think disabling HTTP/2 entirely is necessary, especially with our enforcement of Host names. I closed the PR due to lack of consensus but I don't think there will ever be consensus, honestly. At least not in the foreseeable future. There's no movement on it right now. So I'll just go ahead and merge it and see if it breaks anything, I guess. |
Potential fix for #4017 although the consensus is unclear. Made change to return status code 421 instead of 403 when StrictSNIHost matching is on.
Wow, we were hitting this very infrequently and scratching our heads because this was the only place that could have returned Thank you very much @rayjlinden for the investigation/fix and @mholt for merging this. We'll followup once we can get this version deployed. |
Please see this wiki thread: https://caddy.community/t/caddy-with-reverse-proxy-returns-unexpected-403/11477
In HTTP/2 some browsers like Firefox and Chrome implement connection sharing with http/2. So the browser rather than opening multiple connections for multiple requests may open one connection and send multiple requests through to it.
However, if you are proving multiple sites on the same proxy that a connection may come to the proxy server that shouldn't.
In my case, I had multiple sites that all shared a wildcard cert. A request to site A is opened and a page is read. It is then requesting a JS blog from a different site B. Both are served from the same Proxy server but have different virtual names with the the same wildcard cert.
The problem is Caddy in the request the tls.server_name does not match the virtual host name. (The tls connection is reused.). Caddy in such a case returns a 403. However, it should instead respond with a 421. This is part of the HTTP/2 spec and when the browser gets the 421 it simply opens a new connection and everything would fine.
However, with Caddy returning a 403 in such cases - the Browser thinks it just a normal 403 and shit breaks. In very weird ways...
The text was updated successfully, but these errors were encountered: