Fix performance of server-side SSL connection close. #2675

Merged
merged 1 commit into puma:master on Sep 8, 2021

Conversation

devwout (Contributor) commented Aug 9, 2021

When the server wants to close a persistent SSL connection
because it was idle for persistent_timeout, the call stack is:

Reactor.wakeup!
  Server#reactor_wakeup
    Client#close
      MiniSSL::Socket#close

The close method is called from within the reactor thread.

In this case, should_drop_bytes? is true, because @engine.shutdown
is false the first time it is called.

Then, read_and_drop(1) is called, which starts by selecting on the
socket. Now because this is a server-initiated close of an idle
connection, in almost all cases there will be nothing to select,
and hence the thread will just wait for 1 second.

Since this is called by the reactor, the reactor will halt for 1 second
and not be able to buffer any data during that time, which has huge
effects on overall worker throughput.
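
For reference, a rough sketch of that old close path, reconstructed from this description (simplified and approximated, not the exact code in lib/puma/minissl.rb):

    # Approximate shape of the old MiniSSL::Socket#close path.
    def close
      # On a server-initiated close of an idle connection the engine has not
      # seen a close_notify yet, so this loop runs at least once.
      while should_drop_bytes?
        return if read_and_drop(1) == :timeout
      end
    ensure
      @socket.close
    end

    def should_drop_bytes?
      !@engine.shutdown    # simplified: stays true until a shutdown completes
    end

    def read_and_drop(timeout = 1)
      # The idle client has sent nothing, so this select usually just sleeps
      # for the full timeout -- inside the reactor thread.
      return :timeout unless IO.select([@socket], nil, nil, timeout)
      @socket.read_nonblock(1024, exception: false)
      :drop
    end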

Now, I'm not sure what the use of reading from the socket is:

  • From the docs:

    It is acceptable for an application to only send its shutdown alert and
    then close the underlying connection without waiting for the peer's
    response.
    https://www.openssl.org/docs/man1.1.1/man3/SSL_shutdown.html

  • The existing code did not seem to send any shutdown alert,
    because the shutdown method was called only on the engine, but the
    engine is not connected to the actual TCP socket. The resulting data
    still needed to be copied to the socket.

  • If the server wants to wait for the client's close_notify shutdown alert,
    then this waiting needs to happen with a non-blocking select in the reactor,
    so other work can be done in the meantime. This is not trivial to
    implement, though.

Note that when the client initiated the close and the data was already
read into the engine, @engine.shutdown will return true immediately,
hence this is only a problem when the server wants to close a
connection.
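
In code, the change described above amounts to something like the following sketch (simplified, not the exact merged diff): ask the engine for the close_notify alert, write whatever it produced to the TCP socket, and close without waiting for the peer's reply.

    # Sketch of the new close: a unidirectional SSL shutdown.
    def close
      @engine.shutdown                      # generate our close_notify alert in the engine
      while alert_data = @engine.extract    # drain the engine's outgoing buffer
        @socket.write alert_data            # actually put the alert on the wire
      end
    rescue IOError, SystemCallError
      # the peer is already gone; nothing left to do
    ensure
      @socket.close                         # close immediately, as the OpenSSL docs above allow
    end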

misdoro (Contributor) commented Aug 26, 2021

👍 The code from this merge request solves the issue of occasional 502 responses with SSL and an Amazon load balancer, as described in #1735, without increasing the persistent_timeout setting.

It also completely solves the occasional 502s on worker restarts with the same SSL + Amazon load balancer configuration.

misdoro (Contributor) commented Aug 27, 2021

Looks like the original code was introduced in #1334 to fix a puma shutdown issue with SSL.

I confirm that with the code from this PR, Puma shuts down without any issue when there are established SSL keep-alive connections.

misdoro (Contributor) commented Aug 27, 2021

🤔 Restarting the truffleruby check or rebasing should solve the failing check; I can't reproduce the failure in my fork or locally.

nateberkopec (Member) commented

This definitely looks like a bit of old whale legs.

If I'm reading this correctly, the close method before did absolutely nothing except, potentially, cause a 1-second wait before always closing the socket in the ensure block.

I think this makes sense; it needs a test.

nateberkopec added the waiting-for-changes label and removed the waiting-for-review label on Sep 6, 2021

nateberkopec (Member) commented

Marking as waiting-on-changes; let us know if you need help writing the test, @devwout.

MSP-Greg (Member) commented Sep 7, 2021

@devwout

I've got a test for this, passing here, failing on master. I'll try to get a PR soon showing the master failure... Thanks for the PR.

MSP-Greg merged commit 319f84d into puma:master on Sep 8, 2021
MSP-Greg removed the waiting-for-changes label on Sep 8, 2021

devwout (Contributor, Author) commented Sep 28, 2021

Completely lost track of this, thank you for adding the tests!

respire (Contributor) commented Oct 6, 2021

Thanks for the fix!

I'm the original author of #1334.
I noticed a change to read_and_drop that replaced the MiniSSL::Socket#read_nonblock call with the Ruby stdlib ::Socket#read_nonblock, without injecting the read bytes back into the SSL engine.
This might be the major reason the IO#select keeps timing out.

The fix by devwout mostly performs a unidirectional shutdown.
SSL_shutdown only closes the write side, not the read side.
To perform a bidirectional one instead, we could call
SSL_read (at MiniSSL#read) to wait for the peer to return close_notify.
Of course, we should guard it with wait_readable to make it non-blocking.
This is what MiniSSL::Socket#read_nonblock does.
So, you could call read_nonblock instead of

while alert_data = @engine.extract
  @socket.write alert_data
end

Though unidirectional shutdown might be enough in most use cases of Puma, I guess.

To prevent a timeout on select (slow client), we could also introduce an array of pending-close sockets, which would make it possible to use a single IO#select call for all of them.
But considering most people probably connect to Puma over a high-speed VPC, blocking IO might not be a big problem.
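
To illustrate the bidirectional variant sketched above: the method name close_bidirectional and the 0.1 s bound below are hypothetical, and read_nonblock stands for MiniSSL::Socket#read_nonblock, which feeds received bytes back into the engine; this is an illustration, not code from this PR.

    # Hypothetical bidirectional close: send our close_notify, then give the
    # peer a short, bounded chance to answer with its own close_notify.
    def close_bidirectional(timeout = 0.1)
      @engine.shutdown                      # queue our close_notify alert
      while alert_data = @engine.extract
        @socket.write alert_data            # flush it to the TCP socket
      end
      # Bounded wait for the peer's reply; a real implementation would hand
      # this wait back to the reactor instead of blocking here.
      read_nonblock(1024) if IO.select([@socket], nil, nil, timeout)
    rescue IOError, SystemCallError
      # the peer closed abruptly; fine, we are closing anyway
    ensure
      @socket.close
    end

A fuller version would keep waiting until the engine reports the shutdown as complete, but even this bounded form avoids a fixed 1-second stall inside the reactor.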

Successfully merging this pull request may close these issues: Lack of SSL max_version can cause excessive slow negotiation, 502 on AWS w/ELB