
Puma listen thread can fail with infinite loop. #2699

Closed
ioquatix opened this issue Sep 15, 2021 · 6 comments · Fixed by #2700
Comments

@ioquatix
Contributor

The following middleware can cause Puma to enter an infinite loop:

require 'objspace'

run lambda { |env|
  ios = ObjectSpace.each_object(IO).to_a.select { |io| io.is_a?(TCPServer) }

  ios.each(&:close)

  [200, [], [ios.inspect]]
}

This probably seems odd, but we are actually seeing some variant of this in a production system. We don't yet know what is calling close (we suspect a threading bug), but it causes an Errno::EBADF in the listen loop, followed by an endless stream of IOErrors:

> puma -w4 -t4
[95960] Puma starting in cluster mode...
[95960] * Puma version: 5.4.0 (ruby 3.0.2-p107) ("Super Flight")
[95960] *  Min threads: 4
[95960] *  Max threads: 4
[95960] *  Environment: development
[95960] *   Master PID: 95960
[95960] *      Workers: 4
[95960] *     Restarts: (✔) hot (✔) phased
[95960] * Listening on http://0.0.0.0:9292
[95960] Use Ctrl-C to stop
[95960] - Worker 1 (PID: 95962) booted in 0.02s, phase: 0
[95960] - Worker 0 (PID: 95961) booted in 0.02s, phase: 0
[95960] - Worker 2 (PID: 95963) booted in 0.02s, phase: 0
[95960] - Worker 3 (PID: 95964) booted in 0.02s, phase: 0
2021-09-15 20:56:09 +1200 Listen loop: #<Errno::EBADF: Bad file descriptor>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>
2021-09-15 20:56:09 +1200 Listen loop: #<IOError: closed stream>

At the very least, we should probably handle this more gracefully, i.e. exit the loop as soon as the listening socket becomes invalid for any reason, rather than trying to select/accept on it again.
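To make that concrete, here is a minimal sketch (hypothetical, not Puma's actual accept loop) of a listen loop that bails out once the listener becomes invalid instead of retrying forever:

```ruby
require 'socket'

# Hypothetical sketch, not Puma's code: accept connections until the
# listening socket becomes invalid, then exit the loop rather than
# spinning on IOError/EBADF forever.
def run_listen_loop(server)
  accepted = 0
  loop do
    break if server.closed?                     # someone closed it under us
    ready = IO.select([server], nil, nil, 0.1)  # wait briefly for a client
    next unless ready
    client = server.accept_nonblock(exception: false)
    next if client == :wait_readable
    accepted += 1
    client.close
  rescue IOError, Errno::EBADF
    break # the listener is unusable; retrying select/accept would loop forever
  end
  accepted
end
```

The key difference from a catch-all rescue is that IOError and Errno::EBADF terminate the loop instead of being logged and retried.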

@ioquatix
Contributor Author

I just whipped up a quick PR to try to fix this, but we should probably write some tests too.

@MSP-Greg
Member

Need more coffee; I'll see about a test... Thanks.

@MSP-Greg
Member

@ioquatix

I'm only seeing the infinite loop when Puma is shut down. Do you see the same? Using a rackup of:

require 'objspace'

run lambda { |env|
  ios = ObjectSpace.each_object(::TCPServer).to_a.tap { |a| a.each(&:close) }
  [200, [], [ios.inspect]]
}

curl connects the first time and then freezes. In other words, the first connection closes the listener sockets, but Puma keeps running and only detects the closure when it's shut down. Trying to see if there's a simple way to detect the closure while it's running...

@ioquatix
Contributor Author

I'm not sure of the exact sequence of events, but it might depend on how Puma is started.

@MSP-Greg
Member

I started the server, I think in both single (no workers) and cluster (workers) mode, and made requests with curl. The first request closed the listener sockets; the second hung, as there was nothing to accept the connection. When I stopped the server with Ctrl-C, I then got the loop.

I'd like to know how/why the production system closed the listener socket...

Regardless, I think we may need another thread to check whether any listener sockets have closed, plus a means of re-opening them, which would need to work a bit differently for single and clustered modes.
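As a rough illustration of that idea (entirely hypothetical, not Puma code): keep each listener next to its bind address so a monitor pass can re-open sockets that were closed externally.

```ruby
require 'socket'

# Hypothetical sketch: pair each listener with its bind address so a
# periodic monitor pass can re-open any socket closed out from under us.
Listener = Struct.new(:host, :port, :sock)

def reopen_closed_listeners(listeners)
  reopened = 0
  listeners.each do |l|
    next unless l.sock.closed?
    # Re-bind on the recorded address. Real code must remember the
    # concrete port it was assigned (port 0 would pick a fresh ephemeral
    # port on every re-bind).
    l.sock = TCPServer.new(l.host, l.port)
    reopened += 1
  end
  reopened
end
```

A background thread could call this every few hundred milliseconds; in clustered mode the re-opened descriptors would also have to be handed off to the workers, which is the part that differs between single and clustered.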

Or, since it's unlikely that Puma itself closed the listeners, we could consider it an external issue. I don't know. Then again, I have said something like 'nothing should stop Puma' in the past...

@ioquatix
Contributor Author

I'd like to know how/why the production system closed the listener socket...

We suspect thread corruption within a C extension, but we don't know for sure yet; it's just one theory. Another is Ruby code doing something wonky with IOs.

I don't mind if "nothing stops Puma", but my gut feeling is that we should be very careful about having a catch-all exception handler in a non-terminating loop. We can try to solve both problems, but my main concern is the latter.
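That concern can be sketched as a hypothetical classification helper: retry only errors known to be transient, and stop (or re-raise) on anything that leaves the listener unusable, rather than rescuing everything inside the loop:

```ruby
# Hypothetical sketch of narrow exception handling for a server loop:
# transient errors are retried, errors that mean the listener is
# unusable stop the loop, and everything else propagates so a genuine
# bug cannot hide behind a catch-all rescue.
TRANSIENT = [Errno::ECONNABORTED, Errno::EINTR].freeze
FATAL     = [IOError, Errno::EBADF].freeze

def classify(error)
  return :retry if TRANSIENT.any? { |k| error.is_a?(k) }
  return :stop  if FATAL.any?     { |k| error.is_a?(k) }
  :raise
end
```

The point is not the exact lists, which would need tuning, but that the loop makes an explicit decision per error class instead of logging and continuing unconditionally.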

ioquatix added a commit to ioquatix/puma that referenced this issue Sep 16, 2021
MSP-Greg pushed a commit to ioquatix/puma that referenced this issue Sep 16, 2021
nateberkopec pushed a commit that referenced this issue Sep 16, 2021
* More elaborate exception handling. Fixes #2699.

* Add TestIntegrationSingle#test_closed_listener

Co-authored-by: MSP-Greg <Greg.mpls@gmail.com>
JuanitoFatas pushed a commit to JuanitoFatas/puma that referenced this issue Sep 9, 2022