
Update test_integration files per PR #1956 #1965

Merged
merged 6 commits into puma:master on Sep 19, 2019

Conversation

MSP-Greg
Member

@MSP-Greg MSP-Greg commented Sep 13, 2019

test_integration_cluster.rb

Request handling during server TERM - two tests

#test_term_closes_listeners_tcp
#test_term_closes_listeners_unix

using #term_closes_listeners

Send requests 10 per second. Send 10, then :TERM server, then send another 30.
No more than 10 should throw Errno::ECONNRESET.
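As a standalone illustration of the reset/refused distinction these tests rely on (a minimal sketch using only Ruby's stdlib, not Puma's actual helper): while a listener is open, connects succeed; once it closes, the OS refuses them.

```ruby
require "socket"

# Open a listener on an ephemeral port.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# With the listener open, a connect completes via the backlog.
open_result = begin
  TCPSocket.new("127.0.0.1", port).close
  :success
rescue SystemCallError => e
  e.class
end

server.close

# With the listener closed, new connections are refused.
closed_result = begin
  TCPSocket.new("127.0.0.1", port).close
  :success
rescue Errno::ECONNREFUSED
  :refused
rescue Errno::ECONNRESET
  :reset
end
```

ECONNRESET, by contrast, shows up for connections that were already accepted into the backlog when the listener went away, which is why a small number of resets during shutdown is hard to avoid.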

Request handling during phased restart - two tests

#test_usr1_all_respond_tcp
#test_usr1_all_respond_unix

using #usr1_all_respond

Send requests 1 per second. Send 1, then :USR1 server, then send another 24.
All should be responded to, and at least three workers should be used.
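The worker-count assertion can be sketched like this, assuming (hypothetically) that each response body embeds the worker pid — the body format here is made up, not Puma's:

```ruby
# Responses collected during the phased restart (stubbed data).
responses = [
  "Hello from worker 101",
  "Hello from worker 102",
  "Hello from worker 101",
  "Hello from worker 103",
]

# Extract the pid from each body and count distinct workers.
worker_pids = responses.map { |body| body[/worker (\d+)/, 1] }.compact.uniq
at_least_three = worker_pids.length >= 3
```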

Stuck worker tests - two tests

#test_stuck_external_term_spawn
Tests whether externally TERM'd 'stuck' workers are properly re-spawned.

#test_stuck_phased_restart
Tests whether 'stuck' workers are properly shut down during phased restart.

helper files/methods changes

  1. Changes to allow binding to TCP or UNIX, see kwarg unix:
  2. Skip on Windows for signal TERM

@MSP-Greg
Member Author

Per #1956 (comment)

This is the first PR mentioned, as it contains updated tests. This will fail on #test_stuck_external_term_spawn if applied before PR #1952

rescue
assert_operator 10, :>=, resets, msg

assert_operator 20, :<=, refused, msg
Member

This feels like a pretty big change. Wasn't the original test that all responses must be refused?

Member Author

@MSP-Greg MSP-Greg Sep 13, 2019

I may be incorrect, but I think the updated test is probably more accurate. Given that the system may be loaded down, resource starved, running parallel tests, etc., it will take some finite amount of time to shut down.

I figured this would define a baseline for that time. It may be possible to improve, but it will certainly show if the code is changed and the shutdown time is affected for the worse.

EDIT: This was something I hope can be looked at more in the future. I don't recall if the 'resets' assert was OS dependent, etc...

Member

I still can't quite get past this. This test wasn't failing intermittently on Linux/Windows before, so I don't like that it is (?) now and we need this "<" stuff.

# wait for boot from `events.on_booted`
wait.sysread 1
# used with thread_run to define correct 'refused' errors
def thread_run_refused(unix: false)
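A hedged sketch of the idea behind such a helper: the error classes that count as "refused" differ by transport. The method name mirrors the diff above, but the exact error sets below are assumptions, not Puma's actual lists.

```ruby
# Return the error classes that legitimately count as "connection refused"
# for the given transport. (Hypothetical sketch, not Puma's code.)
def refused_errors(unix: false)
  if unix
    # A removed UNIX socket path raises ENOENT; a closed one, ECONNREFUSED.
    [Errno::ENOENT, Errno::ECONNREFUSED]
  else
    # A closed TCP listener refuses connects; IOError can surface if the
    # socket is torn down mid-read.
    [Errno::ECONNREFUSED, IOError]
  end
end
```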
Member

This feels like the normalization of deviance to me.

Member Author

It bothers me also. Given that it seems to only apply to macOS and it's intermittent...

Member

Then let's at least only allow the deviance there. I'd prefer we just skip the test entirely on Darwin with a note, rather than allow all platforms to slide backward.

@nateberkopec
Member

The changes to #term_closes_listeners feel weird to me. It feels like we're just normalizing deviance of the test, which was originally that all connections were refused.

Thanks for this, I realize this PR probably took a lot of effort! The new integration test suite is a joy, it's so much nicer than it used to be.

@MSP-Greg
Member Author

The changes to #term_closes_listeners feel weird to me. It feels like we're just normalizing deviance of the test, which was originally that all connections were refused.

  1. Since you're on macOS (?), have you checked any of these tests locally? I wonder if you see the intermittent failures? Anyway, see below...

  2. I wanted to get the integration tests stable. At some point, you gotta 'ship'.

  3. It's not the best comparison, but when building ruby-loco (Windows Ruby trunk), the total times on AppVeyor can run anywhere from about 30 minutes to 50 minutes. Both the build time and the test time vary. I haven't seen that much deviation on Travis or GH Actions. Also, there have been plenty of times where I could successfully run a test (for days) locally, but running it on AppVeyor either always failed or failed intermittently.

  4. I think CI is probably better than local. Maybe some 'TODO' notes need to be added to the tests.

@nateberkopec
Member

I wonder if you see the intermittent failures?

No, never 😞

@MSP-Greg
Member Author

MSP-Greg commented Sep 13, 2019

No, never 😞

Which kind of gets back to what's a better indicator of Puma use, testing locally or testing on CI? And if CI testing has intermittent issues, how to move forward...

Anyway, any interest in merging #1886 and/or #1952 (or something similar) so this is passing, then start making the changes discussed?

@nateberkopec
Member

Pull in #1952 for sure.

For #1886, you can try it, but I would want to see this assertion uncommented and passing.

@MSP-Greg
Member Author

MSP-Greg commented Sep 13, 2019

For #1886, you can try it, but I would want to see this assertion uncommented and passing.

I'll try it in a branch based on this.

@MSP-Greg
Member Author

MSP-Greg commented Sep 14, 2019

@nateberkopec

I think I've gotten the tests to a 'reasonable' state. Several builds passed on both Actions and Travis.
See:

https://github.com/MSP-Greg/puma/commits/update-test-integration-bind-path

Three commits:

  1. Update test_integration files per PR puma#1956 - updates only the split-up test files, with changes similar to PR #1935 (CI: Use Ruby 2.4.7, 2.5.6, 2.6.4)

  2. Add Binder#close_unix_paths, test_integration_* parallelize_me! - both lib files and test files.

    2.1 Adds a Binder#close_unix_paths method, and a matching Launcher#close_binder_unix_paths method. I tried to add this to the existing methods in Launcher, but decided that it was easiest to add it to the end of the run method in Single and Cluster, since that is certainly where a clear 'finish/done/quit' is processed.

    2.2 lib files - changed most/all of the signal references from strings to symbols.

    2.3 test files - test/test_cli.rb - test_control_clustered - this was affecting Minitest's debug output; some additions may not be needed, but it's clear what they do.

    2.4 test files - misc cleanup

  3. Add full_name for logging - some of this is in the previous commit. Since we may want to look at some test data over several builds/OS's, etc, I added an output section for the tests. Travis example is here.

I'm sure I've forgotten some, but the main questions are:

A. If you have time for a quick review, is this ok?

B. If A, how would you like it divided up?

C. I think I've adjusted for most of the items in your review comments.

last = phase0_worker_pids.last
phase0_worker_pids.each do |pid|
Process.kill :TERM, pid
sleep 4 unless pid == last
Member

Why 4? Please leave a comment.

Member

I say this because occasionally in integration tests we have to sleep specific times for specific reasons, so if there are reasons, they should be documented.

rescue
assert_operator 10, :>=, resets, msg

assert_operator 20, :<=, refused, msg
Member

I still can't quite get past this. This test wasn't failing intermittently on Linux/Windows before, so I don't like that it is (?) now and we need this "<" stuff.

@MSP-Greg
Member Author

MSP-Greg commented Sep 17, 2019

I think I fixed everything, except a previously duplicated 'term?' method in cluster.rb.

See:
https://github.com/MSP-Greg/puma/runs/225512262
and
https://travis-ci.org/MSP-Greg/puma/builds/586123815

Commits on the branch that I used above are at:
https://github.com/MSP-Greg/puma/commits/update-test-integration-lib

I updated the tests, created a new branch from it, then added PR's #1952, #1970, & #1973, along with an Actions yml file. All passed, importantly, I believe the tests are stable...

@MSP-Greg
Member Author

Question - should the :INFO signal be tested on JRuby?

@MSP-Greg
Member Author

I still can't quite get past this.

Yeah, I know.

This test wasn't failing intermittently on Linux/Windows before, so I don't like that it is (?) now and we need this "<" stuff.

I created the new test to clearly spread out the workers a bit more. I was debugging the output of the old test, and felt something different was needed. The new tests clearly show that there is a small amount of time where request handling is indeterminate, which is reasonable. The original PR mentions shutting down in relation to load balancers, which is odd, as I would think that one would shut down a server after it was removed from the load balancer, not before.

I kept the tests because it is a metric that shouldn't change. If code is introduced that increases the number of resets, something may be wrong.

@MSP-Greg
Member Author

Just pushed with added comments. Sorry, should have added [skip CI]...

@nateberkopec
Member

The new tests clearly show that there is a small amount of time where request handling is indeterminate, which is reasonable.

How about we do this instead: block until the connection closes (received 1 conn closed), then send 30 requests, expect all to be rejected (I think I might have gotten the states wrong here, but basically keep sending up to X_LARGE_NUMBER requests until the conn gets into the state we want, then send like 30 more and expect it to stay there).
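The proposed shape could be sketched like this (the `probe` lambda is a stub standing in for a real connection attempt; nothing here talks to a server):

```ruby
# Cap on phase-1 probing so the test can't loop forever.
MAX_PROBES = 1_000

# Stubbed connection states: a few successes, a few resets, then refused.
states = [:success] * 5 + [:reset] * 3 + [:refused] * 50
probe  = -> { states.shift || :refused }

# Phase 1: keep sending until the first :refused is observed.
attempts = 0
attempts += 1 until probe.call == :refused || attempts >= MAX_PROBES

# Phase 2: once refused, the next 30 probes must all refuse -- no flapping
# back to an earlier state.
tail   = Array.new(30) { probe.call }
stable = tail.all? { |s| s == :refused }
```

The point of the two phases is that phase 1 absorbs the indeterminate shutdown window, while phase 2 asserts the terminal state is sticky.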

@nateberkopec nateberkopec added this to the 4.2.0 milestone Sep 18, 2019
@MSP-Greg MSP-Greg mentioned this pull request Sep 18, 2019
@MSP-Greg
Member Author

How about we do this instead

Maybe we could try this for a while and see how well it works? I'm really leery of tests that wait for state changes.

Obviously, we do have to wait for a booting server to become stable, but past that, nothing in the real world is waiting for the server, so our tests should not either. Doing so often leads to false positives, and we've had enough problems with those...

Off topic:

I've contributed testing to several gems/repos, but I did a lot of work getting the two main Ruby test suites stable and passing on Windows. There were also stability issues with parallel testing on all platforms. It was common for particular tests to fail on cloud CI but pass locally or when run non-parallel. The total test count also jumped around in parallel testing.

Re Puma, it's often running with an app that is using far more resources than Puma itself (disk IO, memory, OS calls, etc). Hence, running tests locally is really about as far from a good test environment as one could get. We can't duplicate the 'resource hungry app' in CI (or locally), but running parallel is a good start.

In some respects, this is kind of a 'point of view' thing, but one particular issue is creating a passing test locally, then finding out it fails or isn't stable in CI. Modifying it for CI may create a false positive test. In some respects, I'm lucky that I can't run the tests locally...

@nateberkopec
Member

nothing in the real world is waiting for the server, so our tests should not either. Doing so often leads to false positives, and we've had enough problems with those...

I see this as being similar to booting. You're moving from state A to state B. It's important that we don't go to state A again or even to state C; we're just in state B forever.

I've seen the Mac test fail in such crazy ways (like reset to refused, back to reset, and then to success) and we need to capture those fails, not say that going from A -> B -> A || C is ok if you only do it a few times. My proposal captures this.

@MSP-Greg
Member Author

MSP-Greg commented Sep 18, 2019

I've seen the Mac test fail in such crazy ways (like reset to refused back to reset and then to success)

I've seen that with the current test, and that's the reason I changed how the request threading works.

Rather than try and determine what state the server is in, should we see if the array divides into three arrays, the first containing only 'responds', the second containing only :resets, and the third containing only :refused? IOW, the array has no 'interleaving' of the three?

I'll make a branch from this and add the actual array to the debug output to check if it's interleaved...
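A minimal sketch of that "no interleaving" check (hypothetical helper name, not code from the PR): the replies array should split into contiguous runs of :success, then :reset, then :refused, with any of the three allowed to be absent.

```ruby
# True if `replies` contains at most one contiguous run per state, and the
# runs appear in the expected order.
def non_interleaved?(replies, order = [:success, :reset, :refused])
  # chunk_while groups consecutive equal entries; map(&:first) yields the
  # sequence of distinct runs.
  runs = replies.chunk_while { |a, b| a == b }.map(&:first)
  runs == runs.uniq && runs == order.select { |s| runs.include?(s) }
end
```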

@nateberkopec
Member

should we see if the array divides into three arrays, the first containing only 'responds', the second containing only :resets, and the third containing only :refused? IOW, the array has no 'interleaving' of the three?

That works for me!

@MSP-Greg
Member Author

Don't know if you saw it, it was an edit, I added:
I'll make a branch from this and add the actual array to the debug output to check if it's interleaved...

For now, I'll rebase since all the 'lib' PR's needed for this to work are merged. I'll start on the interleaved issue on my fork after that.

@@ -71,28 +73,28 @@ def test_term_not_accepts_new_connections
def test_int_signal_with_background_thread_in_jruby

This method name could be updated to test_int_shutdown or test_int_exit while this file is being modified.
Also, I think skip_unless :jruby can be removed

Member Author

Done, but not yet pushed.

@MSP-Greg
Member Author

@nateberkopec

Took a few more breaks than normal and added the interleave check. But the interleave can only be tested between :reset & :refused. Since the requests have a delay of one second in the 'app', they continue into the :refused sections. Removing the delay will remove all :reset entries, as I believe they're raised when the accepted request is read?

I added the array in debug output, see:

https://github.com/MSP-Greg/puma/commit/44fc57b5e3b21de6846dfded01b02817abbb7462/checks

It's at the bottom of the test step in all jobs but windows. Also, #1952 is needed. I believe this will pass and be stable when that is added. Lastly, :INT signal doesn't work on Windows...

@MSP-Greg
Member Author

The main issue with the above is that I kept with the 'original' concept for loading the 'replies' array with request information. That concept loaded the array in the order that requests were handled, whether that be an actual response or raising an error. Given that the 'app' was sleeping, the array order did not match the 'sent' order. That array is helpful, but not optimal for our needs.

Light bulb went off, and I wrote an ru file that allows loading the array in the order that the requests are sent.

Much better. Using that concept for the array, the array (tcp only) shows three non-interleaved sets, the first being responses, second is resets, and third is refused. Unix sockets never seem to generate any :reset members...

Haven't moved the code back into this branch yet.
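One way to sketch that send-order idea (a toy with threads, not the PR's actual ru file): preallocate a slot per request so each result lands at its sent index, and slow responses can't shuffle the array.

```ruby
REQUESTS = 8
replies  = Array.new(REQUESTS)   # one fixed slot per request, in send order

threads = REQUESTS.times.map do |i|
  Thread.new do
    sleep 0.05 if i.even?        # simulate slow requests finishing last
    replies[i] = :success        # index i pins the position regardless of timing
  end
end
threads.each(&:join)
```

Loading by completion order instead (e.g. pushing onto a shared array as responses arrive) is what made the original array appear interleaved even when the server's behavior wasn't.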

@nateberkopec nateberkopec added the waiting-for-changes Waiting on changes from the requestor label Sep 19, 2019
@nateberkopec
Member

Needs a rebase.

@MSP-Greg
Member Author

Needs a rebase.

Done. Passed except for JRuby 9.2.8.0...

@nateberkopec nateberkopec merged commit 4c8d4d6 into puma:master Sep 19, 2019
@nateberkopec
Member

Congrats, that was a lot of work @MSP-Greg 👏

@MSP-Greg
Member Author

Thanks. The intermittent tests were driving me crazy, probably the same for you. IDK about JRuby 9.2.8.0, seems to need more work.

Now we'll move on to the next phase. I saw #1976, and wanted to write some tests for how Puma started when a UNIX control and/or bind socket existed already, or if the file existed, but no sockets.

Couldn't help but think how much easier it would be if #1971 was in place...

BTW, the gist is totally wrong, I've already got an update that I'll start working with. Rather than setting config via a call to cli_server, we need a method like setup_puma which would be the first thing done in a test. That allows some skip statements to move into it, and also allows for tests that require something created before starting Puma, like in #1976...

@MSP-Greg MSP-Greg deleted the update-test-integration branch September 20, 2020 14:04