You need more load. Puma can clearly handle more load in your setup. If you're seeing 10% CPU utilization but request queue times are above 1 second, then you must have a problem somewhere else in the setup.
The solution is to not block. You should return a 200 to the client immediately and have it long-poll for the completed response, or use WebSockets. The current experience is suboptimal for the client, because they're simply on hold with no response while the server waits.
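A minimal sketch of that non-blocking shape, assuming a hypothetical `ReportJob` Sidekiq worker, a `REDIS` client, and a status endpoint the client polls (all names are illustrative, not from the setup described below):

```ruby
# Illustrative only: enqueue the work and return immediately instead of
# holding the Puma thread open for the duration of the job.
class ApiRequestsController < ApplicationController
  def create
    job_id = SecureRandom.uuid
    ReportJob.perform_async(job_id, api_params.to_h)   # hypothetical Sidekiq worker
    render json: { job_id: job_id, status: "queued" }, status: :accepted
  end

  # The client long-polls this endpoint (or you push the result over WebSockets instead).
  def show
    if (payload = REDIS.get("result:#{params[:id]}"))  # worker writes the finished JSON here
      render json: payload
    else
      render json: { status: "pending" }
    end
  end
end
```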
I'll break this down into two parts, because one is configuration-specific and the other is about handling slow requests.
Configuration Question
From everything I've read, I'm baffled by how Puma is supposed to be configured outside of Heroku or the like. All the documentation and advice I've found recommends 1-1.5x workers per CPU and 5-6 threads. I can't figure out how this is optimal or how it scales once you start using EC2 instances.
For example, we have EC2 m6i.xlarge instances with 4 vCPUs and 16 GB of memory, and if I run them with 6 workers and 6 threads the servers idle with 90% of their resources unused. Currently we're running them with 24 workers and 25 threads, and CPU utilization sits around 6-10% while memory is around 60%.
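For reference, a sketch of what that sizing looks like in `config/puma.rb`; the conventional numbers are roughly one worker per vCPU with ~5 threads each, and counts far above that generally only make sense when the workload is IO-bound (threads spend most of their time waiting rather than burning CPU), which seems consistent with the low utilization above:

```ruby
# config/puma.rb -- sketch; the numbers are examples, not recommendations
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))          # conventional: ~1x vCPU count
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))  # conventional: ~5 per worker
threads max_threads, max_threads

# preload_app! lets workers share memory via copy-on-write, which matters more
# the further the worker count is pushed past the vCPU count.
preload_app!
```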
What am I missing about how to configure Puma?
Slow Request Handling
We have an API endpoint with a default timeout of 5s and a maximum of 30s (user configurable). When a request comes in, we queue a Sidekiq worker, which will run for minutes if necessary to complete the process, but a whole request usually completes in 400-500ms on average (fastest around 125ms). Our controller for this action is very simple: it validates the API request, queues a worker, and then uses a mix of Redis blocking (first 5s) and polling (every second after) until the worker puts the full JSON response into Redis.
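A rough sketch of that controller flow, assuming the worker pushes the finished JSON onto a per-request Redis list; the `REDIS` client, key names, and `ProcessJob` worker are illustrative:

```ruby
# Illustrative sketch of the blocking-then-polling wait described above.
def create
  validate_api_request!

  request_id = SecureRandom.uuid
  result_key = "result:#{request_id}"
  ProcessJob.perform_async(request_id, api_params.to_h)  # hypothetical Sidekiq worker

  deadline = Time.now + timeout_seconds                   # 5s default, user-configurable up to 30s

  # Block on Redis for up to 5 seconds waiting for the worker to RPUSH the result...
  _key, payload = REDIS.brpop(result_key, timeout: 5)

  # ...then fall back to polling once a second until the deadline.
  while payload.nil? && Time.now < deadline
    sleep 1
    payload = REDIS.lpop(result_key)
  end

  if payload
    render json: payload
  else
    render json: { error: "timed out" }, status: :gateway_timeout
  end
end
```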
Currently, we've implemented this Redis blocking and polling in a Lua script running in NGINX, which offloads all of the slow-request handling from Puma onto NGINX. Instead of the controller action polling, it just responds to our Lua script, which handles the wait. While this scales well, it's virtually a black box for monitoring and requires duplicated logic inside Lua (we want Lua failures to safely fall back and still work in Rails). I'd like to eliminate this complexity from our application and have Puma handle it, but I'm not sure Puma is capable of handling this task.
I'd appreciate any insight or feedback on how this should be optimally handled.