Any recommendations on how to evaluate Puma’s performance as we increase the number of threads per worker? #3241

d2army · 2023-09-30T15:51:41Z

Hi Puma community,

I consider myself pretty new to Puma and have a question for you veterans and experts.

In our heavy production environment, we have a Rails service on Puma that currently runs 1 worker per CPU core and 1 thread per worker. There are 128 cores per instance.

Based on the Puma metrics, there were times when we came close to running out of available threads per instance, i.e. close to 128 busy threads.

So we recently increased the number of threads per worker from 1 to 2, and so far we have not seen an instance run out of threads (because we now have 256).

My question is how I can also evaluate if the amount of time a request takes before getting processed by an available thread has improved since increasing the number of threads. I am not sure this can be easily measured in Puma?

Also, the "backlog" metric for Puma didn't really change much when the number of threads was doubled.

Thanks, interested to hear all your thoughts

nateberkopec · 2023-10-02T02:21:20Z

Maintainer

> There are 128 cores per instance

IME there are pretty heavy diminishing returns on multithreaded performance once you go past 64 CPUs on a single board. Do you mean 128 cores on a single motherboard?

> My question is how I can also evaluate if the amount of time a request takes before getting processed by an available thread has improved since increasing the number of threads. I am not sure this can be easily measured in Puma?

It is quite easily measured, in fact! What you're looking for is request queue time. You measure it in a way that isn't Puma-specific: timestamp each request at your load balancer with an HTTP header (traditionally X-Request-Start) that holds the current time in milliseconds. Then write a Rack middleware that reads that timestamp and subtracts it from the current time; the difference is the time the request spent waiting for a Puma thread.
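A minimal sketch of such a middleware, under some assumptions: the class name `QueueTimeMiddleware` and the `queue_time.ms` env key are made up for illustration, the header is assumed to carry epoch milliseconds (optionally prefixed with `t=`, as some proxies send it), and the load balancer's clock is assumed to be reasonably in sync with the app server's. Reporting the value to a metrics system is left out.

```ruby
# Hypothetical Rack middleware; the name and env key are illustrative only.
class QueueTimeMiddleware
  # Rack exposes the X-Request-Start request header under this env key.
  HEADER = "HTTP_X_REQUEST_START"

  # clock: returns the current wall-clock time in epoch milliseconds.
  # It is injectable so the arithmetic can be tested without real time.
  def initialize(app, clock: -> { Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond) })
    @app = app
    @clock = clock
  end

  def call(env)
    queue_ms = queue_time_ms(env[HEADER])
    # Stash the value for later use (logging, metrics); assumption: nothing
    # else in the stack uses this key.
    env["queue_time.ms"] = queue_ms if queue_ms
    @app.call(env)
  end

  private

  # Returns milliseconds spent between the load balancer and this middleware,
  # or nil when the header is missing or unparseable.
  def queue_time_ms(raw)
    return nil if raw.nil?

    start_ms = raw.to_s.delete_prefix("t=").to_f
    return nil unless start_ms.positive?

    elapsed = @clock.call - start_ms
    # Clamp to zero so small clock skew does not yield negative queue times.
    elapsed.negative? ? 0.0 : elapsed
  end
end
```

In a Rails app this would typically be registered with something like `config.middleware.insert_before 0, QueueTimeMiddleware`, so that it runs as early as possible; comparing the distribution of the recorded values before and after the thread change is what answers the original question.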

Puma backlog is for something else entirely.

2 replies

d2army Oct 5, 2023
Author

Hi Nate,

Thanks for the response.

I have tried the X-Request-Start idea and will collect some data.

By 128 cores per instance, I meant we have 128 vCPUs per EC2 instance in AWS.

When you mentioned that "Puma backlog is for something else entirely", I assume you are referring to this backlog value from Puma.stats?

> requests that are waiting for an available thread to be available

nateberkopec Oct 6, 2023
Maintainer

> By 128 cores per instance, I meant we have 128 vCPUs per EC2 instance in AWS.

Wowza. You'd probably get better cost-performance out of four 32-core machines, but maybe you prefer scaling single-node.

> When you mentioned that "Puma backlog is for something else entirely", I assume you are referring to this backlog value from Puma.stats?

Right. Requests waiting for a thread to be available: this means that they have already been buffered by the Reactor and are waiting on a thread from the threadpool. What I'm talking about is before that. Most requests spend much more time queuing before the Reactor even accepts the request, since the Reactor is designed not to accept more requests than the threadpool can handle. This is why your backlog didn't change.
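For reference, here is a rough sketch of how the backlog figure discussed above can be totalled across workers from a cluster-mode stats payload. The helper name `summarize_puma_stats` and the sample numbers are made up for illustration; the field names (`worker_status`, `last_status`, `backlog`, `pool_capacity`, `max_threads`) are assumed to match the JSON that Puma's stats output uses in cluster mode, so check them against your Puma version.

```ruby
require "json"

# Hypothetical helper: sums per-worker thread-pool figures from a
# cluster-mode Puma stats JSON string.
def summarize_puma_stats(stats_json)
  stats = JSON.parse(stats_json)
  statuses = stats.fetch("worker_status", []).map { |w| w.fetch("last_status", {}) }

  {
    backlog: statuses.sum { |s| s.fetch("backlog", 0) },             # accepted, waiting for a thread
    pool_capacity: statuses.sum { |s| s.fetch("pool_capacity", 0) }, # threads free to take work now
    max_threads: statuses.sum { |s| s.fetch("max_threads", 0) }      # configured thread ceiling
  }
end

# Made-up sample with two workers of two threads each (not real data).
SAMPLE_STATS = <<~JSON
  {"workers": 2, "worker_status": [
    {"index": 0, "last_status": {"backlog": 0, "running": 2, "pool_capacity": 1, "max_threads": 2}},
    {"index": 1, "last_status": {"backlog": 1, "running": 2, "pool_capacity": 0, "max_threads": 2}}
  ]}
JSON

p summarize_puma_stats(SAMPLE_STATS)
```

Because this only covers requests that Puma has already accepted, a low total here says little about time spent queuing before acceptance, which is what the X-Request-Start measurement captures.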
