Puma 5.0.0 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read> #2371
Comments
I see this too, on my app I just updated earlier today. It too runs on Heroku. I see many ConnectionError lines, not only directly after server start. Yeah, the app seems to work (it does not have a lot of traffic).

This might be that more logging was added in 5.0.0 and that Puma has always operated this way.

pinging @schneems just because he may have more visibility here, possibly also related to the H13 issue

I deployed Puma 5 into a production service yesterday and have been noticing increased timeouts. This matches the issue exactly, so it seems to be more than just logging.
Getting this error as well, on a simple Puma 5 setup:

If I can assist with troubleshooting this, let me know :)

edit: the root error is from
Sounds like the root cause is adb6170 then?
@nateberkopec I did a manual revert of this commit (replaced IO::WaitReadable with Errno::EAGAIN in my locally installed version of puma) and am still getting those timeouts (and the same logs as above):

edit: no timeouts, but in console:

edit2:
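For context on why that revert catches the same condition (a minimal stdlib sketch, not puma's code): in Ruby, the exception raised by a non-blocking read on an empty socket is `IO::EAGAINWaitReadable`, which is an `Errno::EAGAIN` extended with `IO::WaitReadable`, so rescuing either class handles the "no data yet" case:

```ruby
require 'socket'

# An empty socket pair: a non-blocking read raises because no data is queued.
r, _w = Socket.pair(:UNIX, :STREAM)

begin
  r.read_nonblock(10)
  puts "unexpected: data was available"
rescue IO::WaitReadable => e
  # The raised class is IO::EAGAINWaitReadable, which is also an Errno::EAGAIN,
  # so the revert described above (rescuing Errno::EAGAIN instead) catches the
  # same "no data yet" condition.
  puts e.is_a?(Errno::EAGAIN)   # => true
end
```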
If you can give me any guidance on how to get better visibility into what's going on, I'll be happy to help.

@schneems I can see it as well after deploying to Heroku, but I haven't reproduced it locally yet.

@schneems I can see it in my app on Heroku (https://dashboard.heroku.com/apps/florsan), but can't reproduce it locally.
I'm trying to narrow down the scope of the problem. The behavior seems correlated with something, since some people see it only on Heroku and some only on a certain type of app.

If we cannot reproduce this reliably it will be hard to debug. The other possibility would be to bisect the git commits until we find one that fails where the one before it works. If you can reproduce it safely via a staging app, that would be for the best.
@schneems I created a ticket for my app, https://help.heroku.com/tickets/920245, if it can help in any way. I can also reproduce(-ish) it locally on macOS. It happens more often on Heroku than I see locally, but I haven't tried replicating the same traffic locally (and the production app does not have a heavy load). The code for my app is public: https://github.com/Starkast/wikimum. These are the steps I did on macOS to start the app (using Starkast/wikimum@0bc5933).

After Puma has booted, and when the There are no timeouts from
I see the same thing as @dentarg. I also just used the same repo with Ruby 2.5.7 and ran the same test and see the same behavior.
I used the above minimal repo to manually bisect which puma commit between v4.3.6 and v5.0.0 first results in the error being output in a local environment with

I've updated the

Since the error is probably still happening before this commit, and just isn't logged, it would probably be trickier to bisect that.
I confirm that I'm seeing a failure with the repro app:

```ruby
# works
gem 'puma', git: 'https://github.com/puma/puma.git', ref: 'c606471b5ae436a165ee2f083a35635c90a29b93'
# fails
gem 'puma', git: 'https://github.com/puma/puma.git', ref: '265f638d2beb9ee60d1af5e5c03ae876b0c3e4d8'
```

For convenience, here are the commits in a historical view that contains both of those: https://github.com/puma/puma/commits/master?after=7d00e1dc3c71b77ea171f503e6c9be24836314eb+174&branch=master

cc @alexeevit any idea what might be at play here?
Seems the error was there all along, just swallowed by the server, and the logging refactoring revealed it: alexeevit@2d7ff5d
This is stream of consciousness, so bear with me a bit as I semi-live debug this. I'm not able to reproduce this with

FWIW the error that's coming from:

```ruby
data = @io.read_nonblock(CHUNK_SIZE)
```

Okay, actually as I'm typing this it looks like I can repro with
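The EOF case being chased here can be shown in isolation with plain stdlib sockets (a sketch; `16_384` stands in for puma's `CHUNK_SIZE` and is an assumption): when the peer closes the connection before sending any data, `read_nonblock` raises `EOFError`.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)                # 0 = pick a free port
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

client.close                                          # peer closes without sending anything
IO.select([conn], nil, nil, 1)                        # wait until the FIN is readable

begin
  conn.read_nonblock(16_384)                          # analogous to @io.read_nonblock(CHUNK_SIZE)
rescue EOFError
  puts "EOFError: connection closed before any data arrived"
end
```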
Then when I run netcat without a header or body I get the same error:
Which made me think that maybe
This didn't change anything, I still saw the errors in Rails. I added some logging and saw that
So I tried that with netcat:
And I'm still not able to see the same problem with netcat, so I'm back at square one. I did realize though that when you run

When I bump it up to "3" threads I still only get 2 errors, so maybe it's not related. But it is interesting that it's always 2 errors from wrk for me, not more, not fewer. So my current theory is still something along the lines of: wrk is opening a socket and puma is trying to read it but there's no data yet. I'm not sure how to test this theory. I could try to manually make socket objects in Ruby, maybe? The other alternative is to try to figure out why the same behavior doesn't exist in puma 4.x.
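The "socket open but no data yet" theory is distinguishable from the EOF case (again a stdlib sketch, not puma code): while the peer keeps the connection open without writing, a non-blocking read raises `IO::WaitReadable` rather than `EOFError`.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

begin
  conn.read_nonblock(16_384)       # peer is connected but has sent nothing
rescue IO::WaitReadable
  puts "WaitReadable: open connection, no data yet (not an EOF)"
rescue EOFError
  puts "EOFError"                  # would only happen after the peer closes
end

client.close
```

So if wrk were merely slow to write, puma would see `WaitReadable` and retry; an `EOFError` means the peer actually closed the connection.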
If I switch to puma 4.3.6 and add the same logging I see an empty request being read and an EOFError:
So it's the same error (and two of them), but it's just not being logged. AFAIK this behavior is reasonable, or at least it's not causing problems. Several of y'all have said stuff like

If that's the case, then either this

TL;DR: From my investigation it looks like the behavior of puma hasn't changed here between 4.x and 5.x, but the logging did.
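The 4.x-vs-5.x difference described here can be condensed into a sketch (hypothetical method and flag names, not puma's actual code): both versions drop the client on EOF, and only whether the rescue logs differs.

```ruby
require 'socket'

# Hypothetical helper: both "versions" treat EOF the same; only logging differs.
def try_read(io, log_errors:)
  io.read_nonblock(16_384)
rescue EOFError => e
  warn "HTTP connection error: #{e.inspect}" if log_errors   # what puma 5.0.0 surfaced
  nil                                                        # puma 4.x stayed silent
end

r, w = Socket.pair(:UNIX, :STREAM)
w.close                                  # simulate a client that connects and hangs up
p try_read(r, log_errors: false)         # => nil, silently (4.x-like behavior)
```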
I'm going to deploy puma 5.0.0 and see if I can get a reproduction; it's quite possible, based on the description above, that it was environmental in my case and was resolved by a rollback. I should know more tomorrow, and can report back. I know other services that have upgraded to puma 5.0.0 without issue.
Yup, it seems like the logging behavior was changed. The commit you've mentioned does not really reflect the changes I've made; it's better to take a look at the whole PR: https://github.com/puma/puma/pull/2250/files I'm also looking for the root of the problem.
I'm okay with that. But I would feel better if we knew why.

If people are seeing this in prod it makes me wonder what the conditions are that cause it. Are browsers or people sending empty requests, or is something else going on? There might be a case here that we should be handling differently. If someone wants to go through wrk to try to figure out why the request is getting an EOF, it's open source (though written in C).
@schneems It is not only
I noted this behavior change introduced by #2250 as a potential regression in #2279 (comment). There are three code blocks that catch this; see lines 229 to 234 in 3cb00a6.
If you have a different issue than the connection problem in the OP, please open a new issue. Let's not turn this into an "all 5.0 issues" megathread.
I'm sorry about that. I'm having the "Puma::ConnectionError: Connection error detected during read" error and I thought the sudden increase in memory was related to it. Sorry again.
No problem - it is definitely not related though. Open a new issue and we can get to it. |
A co-worker recommends checking out a tool called
Looks like #959 contains the original reason this was ignored.
The EOF can be triggered by opening a connection, not writing to it, then closing the connection. When this happens we can detect if an error was logged or not by inspecting the stderr string. This test fails
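A self-contained approximation of that failing-test setup (the reader thread and the `captured` string here stand in for puma's server and its stderr capture; both are assumptions for illustration):

```ruby
require 'socket'

captured = +""                            # stands in for the captured stderr
server   = TCPServer.new('127.0.0.1', 0)

reader = Thread.new do
  conn = server.accept
  IO.select([conn], nil, nil, 1)
  begin
    conn.read_nonblock(16_384)
  rescue EOFError => e
    captured << "HTTP connection error: #{e.inspect}\n"   # the line 5.0.0 logged
  end
end

# The trigger: open a connection, write nothing, close it.
TCPSocket.new('127.0.0.1', server.addr[1]).close
reader.join

puts captured.empty? ? "no error logged" : captured
```

With the 5.0.0-style rescue above, `captured` is non-empty; the fix makes the equivalent string stay empty.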
This is a continuation from #2382. One of the ways that an EOFError can be triggered is by opening a connection, not writing to it, then closing it (there may be others). This should be an expected and non-exceptional activity. When it happens we should not log it. This commit gets the test in the prior commit to pass. The change to client.rb makes sense to me as this is the initial place where we're reading from the socket and then finding out the stream has been closed. I'm not quite sure why the code in server.rb is needed. I think this comes from when the server is shutting down and trying to finish out connections. I don't think this is the 100% right way to do things. I'm guessing that if we get an EOFError on a connection we should somehow consider it "dead" and never try to read from it again. I don't know if there's a better way to signify this in the `try_to_finish` method of client.rb.
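The shape of that fix can be sketched like so (a hypothetical condensation; the method name `try_to_finish` comes from the commit message, but this is not puma's exact code):

```ruby
require 'socket'

# Treat EOF as a normal "client went away" outcome in the read path
# instead of letting it bubble up to the error logger.
def try_to_finish(io)
  io.read_nonblock(16_384)
rescue IO::WaitReadable
  :try_again          # no data yet; the reactor will call back later
rescue EOFError
  :closed             # peer hung up without a request: expected, not logged
end

r, w = Socket.pair(:UNIX, :STREAM)
w.close               # simulate the client closing without writing
p try_to_finish(r)    # => :closed
```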
I was able to write a failing test that I believe reproduces the logging of the initial failure, and then I was able to fix it in #2384. Please take a look. I mention that one of my rescues is sub-optimal and there might be a better way to do it. If someone else is more familiar with client.rb/server.rb and wants to take a look, I would appreciate it. Otherwise I think this will work for the short term.
I started a puma server. When
Ok, this looks like it might be the relevant code in https://github.com/wg/wrk/blob/0896020a2a28b84b1150e9b60933b746fe1dc761/src/wrk.c#L96-L100

That, I think, calls the Lua function: https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/wrk.lua#L12-L20 https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/script.c#L466-L475

I think wrk just creates an initial TCP connection to make sure it can connect to the server before it begins its main testing phase. If that first connection fails, it'll print something out to the console and exit. I imagine Heroku is doing something similar to make sure your app is up and ready to accept connections, but I haven't come across documentation that makes that explicit.

Edit: Heroku tries to ensure that your application binds to the expected port.

In short, #2384 seems like the appropriate solution. Puma already correctly detects when a connection is established and immediately closed. Puma also already closes down the connection gracefully from a TCP perspective. This behavior is not really exceptional and users don't need to be notified.
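The kind of "is the app accepting connections yet?" probe described above can be sketched as follows (an assumption about what wrk and Heroku do, not their actual code). From the server's side, this connect-then-close-without-writing is exactly the "EOF before any data" connection this thread is about:

```ruby
require 'socket'

# Connect and immediately close without writing anything.
def accepting?(host, port)
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, SocketError
  false
end

server = TCPServer.new('127.0.0.1', 0)
p accepting?('127.0.0.1', server.addr[1])   # => true
```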
5.0.2 is out with the fix. Update your deps!
@cjlarose great investigation! Thanks! I was able to reproduce the behavior in a test by opening a socket, then closing it without writing. I spent about 30 minutes to an hour playing with mitmproxy but couldn't get it to work how I wanted and moved on. That's interesting Wireshark output.
After upgrading to puma 5.0.0 I am seeing

HTTP connection error: #<Puma::ConnectionError: Connection error detected during read>

in the server logs directly after the server starts. This happens just on Heroku. Regardless, the app is working without any issues.