
Measuring performance bottleneck #78

Open
atitan opened this issue Oct 16, 2019 · 3 comments

Comments

@atitan
Contributor

atitan commented Oct 16, 2019

We use Scout APM to monitor performance.

It seems Falcon and Puma take different approaches to handling requests.

Falcon shows much higher queue time (the yellow part of the chart, i.e. the time before a request starts being processed) and low processing time. It's as if requests are blocked outside the server, waiting to get in.

Puma shows much higher ActiveRecord time (the green part of the chart) and low queue time.

Both become slow during the benchmark test and end up with similar response times.

[Scout APM chart comparing Falcon and Puma response time breakdowns]

Currently we're able to increase Falcon's throughput by running 8 processes per 4-CPU machine, up from the original 5 processes.

Is there any way to probe the situation/bottleneck in Falcon?

@ioquatix
Member

If you are doing high-latency blocking operations in the event loop, you will see this kind of behaviour.

Because the core of the event loop for the server is:

connection = accept_connection

connection.each_request do |request|
  response = process(request)
  connection.send_response(response)
end

It's not quite that simple, but that's generally how it fits together.

If you are blocking in process(request), we cannot receive new requests (e.g. multiplexed requests à la HTTP/2), nor can we accept more connections.

You need to identify which operation is blocking, probably a database query, and then decide whether async-postgres or async-mysql is mature enough to work in your application.
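
To see why a single blocking call hurts, here's a minimal sketch (added for illustration, using the 1.x-era async gem API directly, not Falcon itself) where a blocking Kernel#sleep stalls the whole reactor while the non-blocking task.sleep only suspends its own task; a blocking database query behaves like the former:

require 'async'

Async do |task|
  task.async do
    sleep 1           # blocking: the whole reactor stalls for one second
    puts "blocking task done"
  end

  task.async do |subtask|
    subtask.sleep 1   # non-blocking: only this task is suspended
    puts "non-blocking task done"
  end
end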

If you have blocking operations that you simply can't avoid, you can spin up a thread and use Async::IO::Notification for handling reactor <-> thread synchronisation. I can give you some example code.
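
For reference, a rough sketch of that pattern, assuming the async-io gem's Async::IO::Notification (#wait/#signal); blocking_call and expensive_query below are placeholders for your real blocking operation:

require 'async'
require 'async/io/notification'

def run_in_thread(&blocking_call)
  notification = Async::IO::Notification.new
  result = nil

  Thread.new do
    result = blocking_call.call   # runs off the reactor thread
    notification.signal           # wake the waiting reactor task
  end

  notification.wait               # suspends only the current task
  result
ensure
  notification&.close
end

Async do
  puts run_in_thread { expensive_query }   # expensive_query is a placeholder
end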

@atitan
Contributor Author

atitan commented Oct 17, 2019

Would starting Falcon in hybrid mode, so that threads handle requests, achieve the same thing?

Also, I'd like to know what role the connection pool plays here.
Does it really help in hybrid or forked mode if requests are considered to block the event loop reactor?

@ioquatix
Member

That is a good question.

Yes, hybrid mode should give you mostly the same performance characteristics as Puma's cluster mode.

However, ideally you would use non-blocking adapters; otherwise there are still cases where you can experience high latency, e.g. when two connections share the same reactor on the same thread.

Process Model

One parent process spawns N child processes, one reactor per child process.

Thread Model

One parent process spawns N threads, one reactor per thread. Subject to GVL contention.

Hybrid Model

One parent process spawns N child processes, and each child process spawns M threads, one reactor per thread. Still subject to GVL contention within each process, but more threads means better handling of blocking operations.
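
As a purely conceptual sketch (not Falcon's actual implementation), the hybrid model is roughly: N forked workers, each spawning M threads, with one reactor per thread:

require 'async'

N_PROCESSES = 2
M_THREADS = 4

pids = N_PROCESSES.times.map do
  fork do
    threads = M_THREADS.times.map do
      Thread.new do
        Async do |task|
          # each reactor would accept connections and process requests here
        end
      end
    end
    threads.each(&:join)
  end
end

pids.each { |pid| Process.wait(pid) }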

Let me know if you need further clarifications - happy to discuss.
