Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid blocking on SSL's #read_nonblock #1857

Merged
merged 1 commit into from Aug 3, 2019
Merged

Avoid blocking on SSL's #read_nonblock #1857

merged 1 commit into from Aug 3, 2019

Conversation

Jesus
Copy link
Contributor

@Jesus Jesus commented Jul 19, 2019

Closes #1835.

@MSP-Greg
Copy link
Member

Maybe a similar idea, add the line between lines 59 & 60:

raise Errno::EAGAIN unless IS_WINDOWS

puma/lib/puma/minissl.rb

Lines 57 to 72 in 099a447

begin
data = @socket.read_nonblock(size, exception: false)
if data == :wait_readable || data == :wait_writable
if @socket.to_io.respond_to?(data)
@socket.to_io.__send__(data)
elsif data == :wait_readable
IO.select([@socket.to_io])
else
IO.select(nil, [@socket.to_io])
end
elsif !data
return nil
else
break
end
end while true

Tests haven't completed yet, but both Travis and Appveyor seem on their way to passing in my fork.

JFYI, not using Puma, but with Ruby trunk & OpenSSL 1.1.1, I make several verified SSL connections to GitHub and Appveyor every day...

@Jesus
Copy link
Contributor Author

Jesus commented Jul 19, 2019

Hi Greg, thanks for chiming in. We can get tests passing in all platforms with the following:

data = @socket.read_nonblock(size, exception: false)
if data == :wait_readable || data == :wait_writable
  raise Errno::EAGAIN
elsif data.nil?
  return nil
end

But I was hoping to find a better explanation of why the following fails on Windows (I would expect to have the same results, as it does in Linux & OSX):

data = @socket.read_nonblock(size)
return nil if data.nil?

I have some guesses but I don't have a Windows machine to debug, so perhaps I should just move on.

@MSP-Greg
Copy link
Member

Jesús,

That's look good, and avoids the all the IO.select's. I was in the middle of several things when I checked it, it had an error on a test that involved a refused connection. Give me a little while to re-check it...

@MSP-Greg
Copy link
Member

Jesús,,

Just tested locally with the gem built by the PR, on Ruby 2.2 (1.0.2) and trunk (1.1.1), and running test_puma_server_ssl.rb several times, it always passed on both. Thanks, Greg

@Jesus
Copy link
Contributor Author

Jesus commented Jul 19, 2019

Thats great, thanks for taking the time!

Gonna keep this as a draft until I write tests confirming that this fixes #1835.

@evanphx evanphx marked this pull request as ready for review July 19, 2019 22:48
@evanphx
Copy link
Member

evanphx commented Jul 19, 2019

Ooops, didn't realize clicking that button would change the status for everyone! This looks fine, it's certainly a bug for read_nonblock to, well, block like that.

@MSP-Greg
Copy link
Member

MSP-Greg commented Jul 24, 2019

Maybe replace the added test with something like the following, and move test_request_wont_block_thread to test_puma_server.rb? JFYI, I just tried locally with Ruby trunk & windows...

EDIT: changed to a much better test. It actually freezes testing on Puma w/o this PR...

  def test_request_wont_block_thread
    # Open a connection and give enough data to trigger a read, then wait
    ctx = OpenSSL::SSL::SSLContext.new
    ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
    socket = OpenSSL::SSL::SSLSocket.new TCPSocket.new(@host, @port), ctx
    socket.write "x"
    sleep 1

    # Capture the amount of threads being used after connecting and being idle
    thread_pool = @server.instance_variable_get(:@thread_pool)
    busy_threads = thread_pool.spawned - thread_pool.waiting

    socket.close

    # The thread pool should be empty since the request would block on read
    # and our request should have been moved to the reactor.
    assert busy_threads.zero?, "Our connection is monopolizing a thread"
  end

@Jesus
Copy link
Contributor Author

Jesus commented Jul 24, 2019

Would you mind explaining how that test is better? I've tried running it against the code without this PR and it's stuck forever. I'm not sure if that's intended 🤔.

Re your other suggestion, I don't think we can move it to test_puma_server.rb because we need the server be started with SSL.

@MSP-Greg
Copy link
Member

MSP-Greg commented Jul 24, 2019

If one tries to connect to an ssl/https server with http, what should happen? It should refuse it. And, http is a tcp connection. So, should the test use ssl or tcp?

I should mention that my quick look at this last night seemed to show issues with a Puma SSL server gracefully rejecting an http connection. I tested it both with code and a browser.

I'm not sure if that was due to TLSv1.3 or something else. I haven't had time to force TLSv1.2. After the release of OpenSSL 1.1.1, which was the first release with TLSv1.3, I recall that there were differences between how TLSv1.3 and older versions refused/dropped invalid connections. I don't recall the specifics.

Also, there isn't a test for it.

@Jesus
Copy link
Contributor Author

Jesus commented Jul 24, 2019

I've added a comment on the test to clarify what's the intention of using a raw TCP socket. The good part about it is that it's cypher-suite agnostic. I'll see if I can convince you why.

If one tries to connect to an ssl/https server with http, what should happen? It should refuse it. And, http is a tcp connection. So, should the test use ssl or tcp?

That's right, but before the server can reject a connection it needs to receive a full SSL client hello. Until the handshake is exchanged it's just a regular TCP connection.

So what I tried to do is cause the server to block on read right after the connection is created and even before the SSL handshake is exchanged.

The test as it currently stands makes Puma call Puma::MiniSSL::Socket#read_nonblock and what we want to check is that this read call won't stay in the thread pool.

Does that make sense to you? You're the SSL expert here so please correct me if I have any wrong assumptions.

@Jesus Jesus changed the title [WIP] Avoid blocking on #read_nonblock Avoid blocking on SSL's #read_nonblock Jul 24, 2019
@MSP-Greg
Copy link
Member

MSP-Greg commented Jul 24, 2019

@Jesus,

Just to clarify, removing all IO.select statements is good, as that's the whole point of nio4r, so thanks for your work on this.

is that it's cypher-suite agnostic

I think we want it to check against all the OpenSSL variations, and we want to know if any issues arise.

I ran both tests, adding some debug output, as shown below. They're basically the same. Your test has a few reads after the sleep, the https/ssl test takes a little time to close the 'client' connection.

TestPumaServerSSL#test_request_wont_block_thread =
      0.00359  Server#accept_nonblock         18
      0.00618  MiniSSL::Socket#read_nonblock  18
      0.00657  MiniSSL::Socket#read_nonblock  18
      0.00783  Reactor#add c.io.to_io         18
      1.00891  after sleep 1
      1.02421  MiniSSL::Socket#read_nonblock  18
      1.02620  MiniSSL::Socket#read_nonblock  18
1.03 s = *

TestPumaServerSSL#test_request_wont_block_thread_https =
      0.00322  Server#accept_nonblock         23
      0.00628  MiniSSL::Socket#read_nonblock  23
      0.00668  MiniSSL::Socket#read_nonblock  23
      0.00798  Reactor#add c.io.to_io         23
      1.01281  after sleep 1
2.05 s = *

I'm still looking into how Puma SSL refuses http/tcp connections....

EDIT: I meant to move it earlier, but was distracted. The above output for MiniSSL::Socket#read_nonblock is with the output coming right before the @socket.read_nonblock statement. Below out is for the output right before the while true statement.

TestPumaServerSSL#test_request_wont_block_thread =
      0.00248  Server#accept_nonblock         21
      0.01372  MiniSSL::Socket#read_nonblock  21
      0.01465  Reactor#add c.io.to_io         21
      1.00293  after sleep 1
      1.01877  MiniSSL::Socket#read_nonblock  21
      1.02225  MiniSSL::Socket#read_nonblock  21
1.04 s = *

TestPumaServerSSL#test_request_wont_block_thread_https =
      0.00351  Server#accept_nonblock         25
      0.00910  MiniSSL::Socket#read_nonblock  25
      0.01023  Reactor#add c.io.to_io         25
      1.01056  after sleep 1
2.05 s = *

@Jesus
Copy link
Contributor Author

Jesus commented Jul 25, 2019

Just to clarify, removing all IO.select statements is good, as that's the whole point of nio4r, so thanks for your work on this.

Thanks for your feedback too!

I'm still looking into how Puma SSL refuses http/tcp connections....

Looks like it won't refuse it. I've checked this by doing the following experiment:

  1. Launch Puma with SSL enabled.
  2. Open a raw TCP connection with telnet for example.
  3. Write any amount of nonsense characters, they're all received by the server and it won't drop the connection.

puma-ssl

At the very end of the demo, you can see some errors from the SSL engine. Obviously what I sent wasn't a valid "client hello" so an error happens.

Perhaps we should reject the connection after reading enough bytes to build a "client hello". But that'd be a different issue and outside the scope of this PR.

So what do you think? Should we change the test so it opens an SSL connection anyway?

@MSP-Greg
Copy link
Member

But that'd be a different issue and outside the scope of this PR.

Agreed.

Should we change the test so it opens an SSL connection anyway?

I bet you can guess my opinion. I wonder about moving the TCP test to test_puma_server.rb. I haven't looked at it ot tested it...

@Jesus Jesus force-pushed the master branch 3 times, most recently from faf381d to a033bf4 Compare July 25, 2019 23:00
@Jesus
Copy link
Contributor Author

Jesus commented Jul 26, 2019

Test updated and branch rebased.

Added Greg as co-writer for his help on the test.

@MSP-Greg
Copy link
Member

Added Greg as co-writer for his help on the test.

Thank you.

Test updated and branch rebased.

I hate it when that's needed. I think some of my open PR's may need the same...

I think I've got a possible fix for the 'tcp connection to ssl server' issue; it requires code in minissl.c, it needs to be tested (at a minimum) with both OpenSSL 1.1.1 and 1.0.2, etc. Definitely a weekend thing.

@MSP-Greg
Copy link
Member

@Jesus

Looks like it won't refuse it. I've checked this by doing the following experiment:

Looked again at this over lunch. When you have a minute, which version of OpenSSL was being used in your telnet test?

ruby -rpuma/puma_http11 -e "t = Puma::MiniSSL ; puts t::OPENSSL_LIBRARY_VERSION, t::OPENSSL_
VERSION"

@Jesus
Copy link
Contributor Author

Jesus commented Jul 26, 2019

$ ruby -rpuma/puma_http11 -e "t = Puma::MiniSSL ; puts t::OPENSSL_LIBRARY_VERSION, t::OPENSSL_VERSION"
OpenSSL 1.1.1b  26 Feb 2019
OpenSSL 1.1.1b  26 Feb 2019

@MSP-Greg
Copy link
Member

@Jesus

Thanks. I don't know if you can easily check another OpenSSL version, but I think I've confirmed this is not an issue with 1.0.2.

I think it's TLSv1.3 related, and hence, 1.1.0 would be ok, but I should check it.

I don't believe any info would be transmitted, but the connection should be closed/removed so it doesn't take resources that it shouldn't.

Thanks, and have a good weekend, Greg

@nateberkopec nateberkopec added this to the 4.1.0 milestone Jul 28, 2019
Co-authored-by: MSP-Greg <MSP-Greg@users.noreply.github.com>
@nateberkopec
Copy link
Member

Most of the PRs marked for 4.1.0 are currently gating on a maintainer meeting-of-minds to happen soon.

# EAGAIN if necessary but it seems like it'll misbehave on Windows.
# I don't have a Windows machine to debug this so I can't explain
# exactly whats happening in that OS. Please let me know if you
# find out!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSP-Greg can you verify/clarify this comment from @Jesus, so we can remove it based on your feedback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed:

@socket.read_nonblock(size, exception: false)

to

@socket.read_nonblock(size)

and made the other adjustments.

On Windows every job failed, but Travis passed. See:
https://ci.appveyor.com/project/MSP-Greg/puma/build/job/ykqnya8f3y1ehorf
https://travis-ci.org/MSP-Greg/puma/builds/567079580

I'll investigate further when I have more time, it may have something to do with the method returning nil, but I'm not sure.

Regardless, this seems to be a Windows quirk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI, I quickly checked Ruby's OpenSSL tests, and also WEBrick. Anywhere that read_nonblock could raise an error, exception: false was used. Interesting thing is that many of the commits were done before testing was done on Windows...

@nateberkopec
Copy link
Member

@twalpole You can always fork and apply the patch yourself.

@nateberkopec
Copy link
Member

And I'd rather not do work on someone else's schedule for free. It will be done when it's done, thanks.

@Jesus
Copy link
Contributor Author

Jesus commented Aug 2, 2019

Hi @twalpole,

I don't speak for this project as I'm not part of the maintainers team. But I want to say please understand that people leading this project do it in their spare time so it's not cool to put pressure on them. That doesn't mean your comments aren't welcome.

In fact, I must thank you for reporting that bug a few days ago. It has helped uncovering a problem in Puma and that's great. So thanks!

@evanphx
Copy link
Member

evanphx commented Aug 3, 2019

@MSP-Greg can you confirm that this is ok to merge in current state then? I'm fine with how it is now.

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 3, 2019

@evanphx

Yes, please merge.

@evanphx evanphx merged commit 0a94347 into puma:master Aug 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persistent Connections monopolize a thread
4 participants