
Possible issue with Passenger holding on to a connection indefinitely with no outside HTTP connection holding it #2460

Open
davidhatten opened this issue Dec 13, 2022 · 1 comment


Please follow the steps below if you have found an issue with Passenger, or would like to request a feature.

Post to Stack Overflow instead if you are not sure, or if you have a question about your configuration.


Issue report

Fill in as much as possible so that we can understand, find and fix the problem.

Are you sure this is a bug in Passenger?
Please read our troubleshooting and debugging guides first:
https://www.phusionpassenger.com/library/admin/troubleshooting/
https://www.phusionpassenger.com/library/admin/

Please try with the newest version of Passenger to avoid issues that have already been fixed

Question 1: What is the problem?

  • What is the expected behavior?
    -- active_client connections should not persist indefinitely
  • What is the actual behavior?
    -- With low memory: Linux kills Passenger processes, sometimes pointing to utils.rb#110.
    -- With high memory: active_client connections persist for over a day. Processes hit the maximum memory they can. Pools hit max saturation. Passenger begins queueing requests internally and the app host becomes unresponsive to HTTP traffic.
  • How can we reproduce it? Please try to provide a sample application (or Virtual Machine) demonstrating the issue. Otherwise, if we can't reproduce it, we might have to ask you a number of followup questions or run certain commands to try and figure out the problem.
    -- I can't consistently reproduce it yet; it only happens in prod, and with low memory I haven't been able to catch any of the processes that end up being killed by the OS. I do have show=requests output from a time when the error was in full swing. I believe the entry below is the offending connection, though again, I don't have anything beyond that to support a better story:
"active_client_count" : 393,
		"active_clients" :
		{
			"1-114832" :
			{
				"connected_at" :
				{
					"local" : "Sun Dec  4 06:49:32 2022",
					"relative" : "41m 19s ago",
					"relative_timestamp" : -2479.3378040790558,
					"timestamp" : 1670154572.1215811
				},
				"connection_state" : "ACTIVE",
				"current_request" :
				{
					"app_response_http_state" : "PARSING_HEADERS",
					"app_sink_state" :
					{
						"callback_in_progress" : false,
						"initialized" : true,
						"io_watcher_active" : false
					},
					"app_source_state" :
					{
						"callback_in_progress" : false,
						"initialized" : true,
						"io_watcher_active" : true
					},
					"content_length" : 3191,
					"flags" :
					{
						"dechunk_response" : true,
						"https" : false,
						"request_body_buffering" : false
					},
					"host" : "127.0.0.1:8000",
					"http_major" : 1,
					"http_minor" : 1,
					"http_state" : "PARSING_BODY",
					"last_data_receive_time" :
					{
						"local" : "Sun Dec  4 06:51:32 2022",
						"relative" : "39m 19s ago",
						"relative_timestamp" : -2359.2372481822968,
						"timestamp" : 1670154692.22212
					},
					"last_data_send_time" : null,
					"method" : "POST",
					"next_request_early_read_error" : "The client connection is closed before the request is done processing (errno=-1020)",
					"path" : [REDACTED],
					"refcount" : 1,
					"request_body_already_read" : 3191,
					"request_body_fully_read" : true,
					"request_body_type" : "CONTENT_LENGTH",
					"response_begun" : false,
					"session" :
					{
						"gupid" : "1a8b925-HItyJLRJxd",
						"pid" : 3516
					},
					"session_checkout_try" : 1,
					"started_at" :
					{
						"local" : "Sun Dec  4 06:49:32 2022",
						"relative" : "41m 19s ago",
						"relative_timestamp" : -2479.3378040790558,
						"timestamp" : 1670154572.121572
					},
					"state" : "WAITING_FOR_APP_OUTPUT",
					"sticky_session" : false,
					"want_keep_alive" : false
				},
				"lingering_request_count" : 0,
				"name" : "1-114832",
				"number" : 114832,
				"output_channel_state" :
				{
					"bytes_buffered" :
					{
						"bytes" : 0,
						"human_readable" : "0 bytes"
					},
					"callback_in_progress" : false,
					"mode" : "IN_MEMORY_MODE",
					"nbuffers" : 0,
					"reader_state" : "RS_INACTIVE"
				},
				"refcount" : 2,
				"requests_begun" : 1
			},

Question 2: Passenger version and integration mode:
Your answer:

  • 6.0.15 standalone open source

Question 3: OS or Linux distro, platform (including version):

  • For example: Debian 8, x86_64 or OS X 10.10 Yosemite, x86_64

Your answer:

  • Ubuntu 20 LTS

Question 4: Passenger installation method:

Your answer:
[x] RubyGems + Gemfile
[ ] RubyGems, no Gemfile
[ ] Phusion APT repo
[ ] Phusion YUM repo
[ ] OS X Homebrew
[ ] source tarball
[ ] Other, please specify:

Question 5: Your app's programming language (including any version managers) and framework (including versions):

  • For example: Ruby 2.3.0, RVM, Rails 5.0.0; Node.js 4.0.0 with Express 4.13.4

Your answer:

  • Rails 6.1.6.1
  • Ruby 3.0.4p208
  • No UI

Question 6: Are you using a PaaS and/or containerization? If so which one?

  • For example: Heroku, Amazon Container Services, Docker 1.9 with an image based on passenger-docker

Your answer:
The app is hosted in a Docker container.

Question 7: Anything else about your setup that we should know?

Your answer:
Currently the application is in the "low" memory state, causing OOM issues.

The reason I think Passenger has a lot to do with it is the utils.rb#110 reference: it points to a line in Passenger that does some thread management for exception handling.

Boosting the app to have more memory induced the holding behavior. Here are graphs of the behavior:
Queue depth and pool usage: [graph attached in the original issue]

Memory: [graph attached in the original issue]

I will continue to make changes on my end, and if I discover what the app is doing that causes Passenger to do this, I'll be sure to let you know.

Thanks!

We strive for quality and appreciate you taking the time to submit a report! Please note that if you want guaranteed response times and priority issue support we encourage you to join our enterprise customer base. They also provide us with the means to continue our high level of open source support!


davidhatten commented Dec 20, 2022

OK, this can be closed or merged or whatever works on your side to manage it. I have discovered the overall shape of the issue, and while Passenger played a role, I'm not sure there is actually a bug in Passenger.

Situation: the controller was spinning on Redis key creation for a lock (sleep 0.05 until Redis.setnx succeeded). One issue was that the key creation and the TTL setting were on separate lines, which could cause a deadlock. However, this deadlock occurred even when there was no problem setting the TTL. Eventually the sleeping Passenger thread would run out of memory and the OS in the container would kill it. The TTL on the key was 30 seconds, so the web app thread could be waiting for up to 30s in 0.05s increments.
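For context, here is a minimal sketch of that kind of pattern, assuming the redis-rb gem; the key name, TTL, and retry limit are illustrative, not the actual app code:

require "redis"

redis = Redis.new
lock_key = "some_lock"  # illustrative name
lock_ttl = 30           # seconds, per the description above

# Problematic shape: the key creation and the TTL are two separate commands,
# and the request thread spins until the key can be created.
sleep 0.05 until redis.setnx(lock_key, "1")
redis.expire(lock_key, lock_ttl)  # if this never runs, the lock never expires

# Safer shape: set the key and its TTL atomically and bound the retries,
# so a request can't spin on the lock indefinitely.
acquired = false
100.times do
  acquired = redis.set(lock_key, "1", nx: true, ex: lock_ttl)
  break if acquired
  sleep 0.05
end
raise "could not acquire lock" unless acquired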

It is worth noting that this issue became visible with Rails 6 + Passenger 6. If it existed before, it wasn't attracting OS attention and the processes weren't getting killed for OOM issues.

The underlying design we were using has been rectified and this issue cannot happen anymore.

Add this to the pile of "Passenger doesn't like sleep-ing web app processes", which is fair, because sleep-ing in a web app process is pretty galaxy-brain anyway.

Thank you!
