Fix performance of server-side SSL connection close. #2675
When the server wants to close a persistent SSL connection because it was idle for `persistent_timeout`, the call stack is:

* `Reactor.wakeup!`
* `Server#reactor_wakeup`
* `Client#close`
* `MiniSSL::Socket#close`

The close method is called from within the reactor thread. In this case, `should_drop_bytes?` is true, because `@engine.shutdown` is false the first time it is called. Then `read_and_drop(1)` is called, which starts by selecting on the socket. Because this is a server-initiated close of an idle connection, in almost all cases there will be nothing to select, so the thread just waits for 1 second. Since this is called by the reactor, the reactor halts for 1 second and cannot buffer any data during that time, which has a huge effect on overall worker throughput.

I'm not sure what the point of reading from the socket is:

* From the docs: "It is acceptable for an application to only send its shutdown alert and then close the underlying connection without waiting for the peer's response." https://www.openssl.org/docs/man1.1.1/man3/SSL_shutdown.html
* The existing code did not seem to send any shutdown alert, because the shutdown method was called only on the engine, but the engine is not connected to the actual TCP socket. The resulting data still needed to be copied.
* If the server wants to wait for the client's close_notify shutdown alert, then this waiting needs to happen with a non-blocking select in the reactor, so other work can be done in the meantime. This is not trivial to implement, though.

Note that when the client initiated the close and the data was already read into the engine, `@engine.shutdown` will return true immediately, hence this is only a problem when the server wants to close a connection.
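To make the stall concrete, here is a minimal sketch of what a select with a timeout does on an idle connection, compared with a zero-timeout poll. It is not Puma code: the `IO.pipe` stands in for the idle socket, the `elapsed` helper is hypothetical, and the timeout is shortened to 0.2 seconds (instead of the 1 second discussed above) so the sketch runs quickly.

```ruby
# Stand-in for an idle keep-alive connection: a pipe whose write end never
# sends anything (assumption for illustration; Puma selects on a TCP socket).
idle_io, peer = IO.pipe

# Hypothetical helper: runs the block and returns [result, seconds_elapsed].
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Select with a timeout: nothing is readable, so the calling thread is
# parked for the whole timeout before IO.select returns nil.
blocking_result, blocking_time = elapsed { IO.select([idle_io], nil, nil, 0.2) }

# Select with a timeout of 0: a pure poll that returns nil right away.
polling_result, polling_time = elapsed { IO.select([idle_io], nil, nil, 0) }

puts format("select with timeout: %p after %.2fs", blocking_result, blocking_time)
puts format("select with 0:       %p after %.2fs", polling_result, polling_time)
```

If the thread doing the first call is the reactor, every other connection it manages waits for that same interval.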
👍 The code from this merge request solves the issue of occasional 502 responses with SSL and Amazon load balancer, as described in #1735, without increasing the It also completely solves the occasional 502 on worker restarts with the same SSL+Amazon load balancer configuration.
Looks like the original code was introduced in #1334 to fix a Puma shutdown issue with SSL. I confirm that with the code from this PR, Puma shuts down without any issue when there are established SSL keep-alive connections.
🤔 Restarting the truffleruby check or rebasing should fix the failing check. I can't reproduce the failure in my fork or locally.
This definitely looks like a bit of old whale legs. If I'm reading this correctly, the I think this makes sense, needs a test.
Marking as waiting-on-changes, let us know if you need help writing the test @devwout
I've got a test for this, passing here, failing on master. I'll try to get a PR soon showing the master failure... Thanks for the PR.
Completely lost track of this, thank you for adding the tests!
Thanks for the fix! I'm the original author of #1334. As far as I can see, the fix by devwout mainly performs a unidirectional shutdown.
That said, a unidirectional shutdown is probably enough for most use cases of Puma. To avoid the select timeout (slow client), we could also introduce an array of sockets that are pending close, which makes it possible to use a single IO#select call for all of them.
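The pending-close idea could look roughly like the sketch below. This is only an illustration under assumptions: `PendingCloses`, `add` and `sweep` are hypothetical names, not part of Puma, and pipes stand in for client sockets. A real implementation would also have to feed whatever was read into the SSL engine rather than just closing on readability.

```ruby
# Hypothetical registry of sockets waiting for the peer's close_notify.
class PendingCloses
  def initialize
    @pending = {} # io => deadline in monotonic seconds
  end

  # Register an io that we already sent our shutdown alert on.
  def add(io, timeout)
    @pending[io] = now + timeout
  end

  def size
    @pending.size
  end

  # One non-blocking IO.select over every pending io. Closes and returns
  # the ios whose peer responded (readable) or whose deadline has passed.
  def sweep
    return [] if @pending.empty?

    readable, = IO.select(@pending.keys, nil, nil, 0)
    readable ||= []
    expired = @pending.select { |_io, deadline| deadline <= now }.keys

    done = (readable + expired).uniq
    done.each do |io|
      @pending.delete(io)
      io.close unless io.closed?
    end
    done
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Usage: two pending connections, only the first peer responds.
r1, w1 = IO.pipe
r2, w2 = IO.pipe
pending = PendingCloses.new
pending.add(r1, 5)
pending.add(r2, 5)
w1.syswrite("x") # stands in for the peer's close_notify
closed = pending.sweep
```

The reactor would call `sweep` on each loop iteration, so a slow client only delays its own close and never blocks the other connections.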