You need more load. Puma can clearly handle more load in your setup. If you're seeing 10% CPU utilization but request queue times are above 1 second, then you must have a problem somewhere else in the setup.
The solution is to not block. You should return a 200 to the client immediately and have it long-poll for the completed response, or use WebSockets. The current experience is suboptimal for the client, because they're simply on hold with no response while the server waits.
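A minimal sketch of that non-blocking shape, assuming a hypothetical `ReportJob` Sidekiq worker, a `REDIS` client, and a status endpoint the client polls (all names are illustrative, not from the setup described below):

```ruby
# Illustrative only: enqueue the work and return immediately instead of
# holding the Puma thread open for the duration of the job.
class ApiRequestsController < ApplicationController
  def create
    job_id = SecureRandom.uuid
    ReportJob.perform_async(job_id, api_params.to_h)   # hypothetical Sidekiq worker
    render json: { job_id: job_id, status: "queued" }, status: :accepted
  end

  # The client long-polls this endpoint (or you push the result over WebSockets instead).
  def show
    if (payload = REDIS.get("result:#{params[:id]}"))  # worker writes the finished JSON here
      render json: payload
    else
      render json: { status: "pending" }
    end
  end
end
```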
I'll break this down into two parts, because one is configuration-specific and the other is about handling slow requests.
Configuration Question
From everything I've read, I'm baffled by how Puma is supposed to be configured outside of Heroku or the like. All the documentation and advice I've found recommends 1-1.5x workers per CPU and 5-6 threads. I can't figure out how this is optimal or how it scales once you start using EC2 instances.
For example, we have EC2 m6i.xlarge instances with 4 vCPUs and 16 GB of memory, and if I run them with 6 workers and 6 threads the servers idle with 90% of their resources unused. Currently we're running them with 24 workers and 25 threads, and CPU utilization sits around 6-10% while memory is around 60%.
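For reference, a sketch of what that sizing looks like in `config/puma.rb`; the conventional numbers are roughly one worker per vCPU with ~5 threads each, and counts far above that generally only make sense when the workload is IO-bound (threads spend most of their time waiting rather than burning CPU), which seems consistent with the low utilization above:

```ruby
# config/puma.rb -- sketch; the numbers are examples, not recommendations
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))          # conventional: ~1x vCPU count
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))  # conventional: ~5 per worker
threads max_threads, max_threads

# preload_app! lets workers share memory via copy-on-write, which matters more
# the further the worker count is pushed past the vCPU count.
preload_app!
```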
What am I missing about how to configure Puma?
Slow Request Handling
We have an API endpoint with a default timeout of 5s and a maximum of 30s (user configurable). When a request comes in, we queue a Sidekiq worker, which will run for minutes if necessary to complete the process, but a whole request usually completes in 400-500ms on average (fastest around 125ms). Our controller for this action is very simple: it validates the API request, queues a worker, and then uses a mix of Redis blocking (first 5s) and polling (every second after) until the worker puts the full JSON response into Redis.
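A rough sketch of that controller flow, assuming the worker pushes the finished JSON onto a per-request Redis list; the `REDIS` client, key names, and `ProcessJob` worker are illustrative:

```ruby
# Illustrative sketch of the blocking-then-polling wait described above.
def create
  validate_api_request!

  request_id = SecureRandom.uuid
  result_key = "result:#{request_id}"
  ProcessJob.perform_async(request_id, api_params.to_h)  # hypothetical Sidekiq worker

  deadline = Time.now + timeout_seconds                   # 5s default, user-configurable up to 30s

  # Block on Redis for up to 5 seconds waiting for the worker to RPUSH the result...
  _key, payload = REDIS.brpop(result_key, timeout: 5)

  # ...then fall back to polling once a second until the deadline.
  while payload.nil? && Time.now < deadline
    sleep 1
    payload = REDIS.lpop(result_key)
  end

  if payload
    render json: payload
  else
    render json: { error: "timed out" }, status: :gateway_timeout
  end
end
```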
Currently, we've implemented this Redis blocking and polling in a Lua script running in NGINX, which offloads all of the slow-request handling from Puma onto NGINX. Instead of the controller action polling, it just responds to our Lua script, which handles the wait. While this scales well, it's virtually a black box for monitoring and requires duplicated logic inside Lua (we want Lua failures to safely fall back and still work in Rails). I'd like to eliminate this complexity from our application and have Puma handle it, but I'm not sure Puma is capable of handling this task.
I'd appreciate any insight or feedback on how this should be optimally handled.