Update test_integration files per PR #1956 #1965
Conversation
Per #1956 (comment) This is the first PR mentioned, as it contains updated tests. This will fail on
rescue
  assert_operator 10, :>=, resets, msg

  assert_operator 20, :<=, refused, msg
This feels like a pretty big change. Wasn't the original test that all responses must be refused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be incorrect, but I think the updated test is probably more accurate. Given that the system may be loaded down, resource starved, running parallel tests, etc., it will take some finite amount of time to shut down.
I figured this would define a baseline for that time. It may be possible to improve it, but it will certainly show if the code is changed and the time is affected for the worse.
EDIT: This is something I hope can be looked at more in the future. I don't recall if the 'resets' assert was OS dependent, etc...
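For reference, minitest's `assert_operator` applies the operator with the first argument as the receiver, which makes the quoted assertions read as bounds. A small sketch (the `resets`/`refused` values here are hypothetical stand-ins for the counts the integration helper collects):

```ruby
# Minitest's assert_operator(o1, op, o2) passes when o1.send(op, o2) is
# truthy, so the quoted assertions read left-to-right as bounds:
#   assert_operator 10, :>=, resets   # 10 >= resets  -> at most 10 resets
#   assert_operator 20, :<=, refused  # 20 <= refused -> at least 20 refused
def operator_holds?(o1, op, o2)
  o1.send(op, o2)
end

# Hypothetical counts standing in for what the helper collects:
resets, refused = 7, 25
puts operator_holds?(10, :>=, resets)   # → true (bounds the reset count)
puts operator_holds?(20, :<=, refused)  # → true (floors the refused count)
```

So the test tolerates up to 10 resets and requires at least 20 refusals, rather than requiring every connection to be refused.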
I still can't quite get past this. This test wasn't failing intermittently on Linux/Windows before, so I don't like that it is (?) now and we need this "<" stuff.
# wait for boot from `events.on_booted`
wait.sysread 1
# used with thread_run to define correct 'refused' errors
def thread_run_refused(unix: false)
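The `wait.sysread 1` line above is a pipe-based boot barrier. A self-contained sketch of that synchronization pattern, with a plain thread standing in for the Puma server (in the real helper the write happens from Puma's `events.on_booted` hook):

```ruby
# Boot-synchronization sketch: the test blocks on one end of a pipe until
# the "server" writes a byte. A thread simulates the booting server here.
r, w = IO.pipe

server = Thread.new do
  sleep 0.1        # pretend to bind listeners, spawn workers, etc.
  w.syswrite "!"   # signal "booted" exactly once
end

booted = r.sysread 1   # blocks until the byte arrives
server.join
puts "server booted (read #{booted.inspect})"
```

Blocking on `sysread` avoids polling loops and races between "server started" and "first request sent".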
This feels like the normalization of deviance to me.
It bothers me also. Given that it seems to only apply to macOS and it's intermittent...
Then let's at least only allow the deviance there. I'd prefer we just skip the test entirely on Darwin with a note, rather than allow all platforms to slide backward.
The changes to #term_closes_listeners feel weird to me. It feels like we're just normalizing deviance of the test, which was originally that all connections were refused. Thanks for this, I realize this PR probably took a lot of effort! The new integration test suite is a joy, it's so much nicer than it used to be.
No, never 😞
Which kind of gets back to what's a better indicator of Puma use, testing locally or testing on CI? And if CI testing has intermittent issues, how to move forward... Anyway, any interest in merging #1886 and/or #1952 (or something similar) so this is passing, then start making the changes discussed?
Pull in #1952 for sure. For #1886, you can try it, but I would want to see this assertion uncommented and passing.
I'll try it in a branch based on this.
I think I've gotten the tests to a 'reasonable' state. Several builds passed on both Actions and Travis. https://github.com/MSP-Greg/puma/commits/update-test-integration-bind-path Three commits:
I'm sure I've forgotten some, but the main questions are:
A. If you have time for a quick review, is this ok?
B. If A, how would you like it divided up?
C. I think I've adjusted for most of the items in your review comments.
last = phase0_worker_pids.last
phase0_worker_pids.each do |pid|
  Process.kill :TERM, pid
  sleep 4 unless pid == last
Why 4? Please leave a comment.
I say this because occasionally in integration tests we have to sleep for specific times for specific reasons, so if there are reasons, they should be documented.
Force-pushed from 17c1d29 to fe6ddcd
I think I fixed everything, except a previously duplicated 'term?' method in cluster.rb. See: Commits on the branch that I used above are at: I updated the tests, created a new branch from it, then added PRs #1952, #1970, & #1973, along with an Actions yml file. All passed; importantly, I believe the tests are stable...
Question - should the :INFO signal be tested against on JRuby?
Yeah, I know.
I created the new test to clearly spread out the workers a bit more. I was debugging the output of the old test, and felt something different was needed. The new tests clearly show that there is a small amount of time where request handling is indeterminate, which is reasonable. The original PR mentions shutting down in relation to load balancers, which is odd, as I would think that one would shut down a server after it was removed from the load balancer, not before. I kept the tests because it is a metric that shouldn't change. If code is introduced that increases the number of resets, something may be wrong.
Just pushed with added comments. Sorry, should have added [skip CI]...
How about we do this instead: block until the connection closes (received 1 conn closed), then send 30 requests, expect all to be rejected (I think I might have gotten the states wrong here, but basically keep sending up to X_LARGE_NUMBER requests until the conn gets into the state we want, then send like 30 more and expect it to stay there).
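A sketch of that proposal, with a stubbed probe standing in for a real connection attempt (the helper name, the `MAX_PROBES` bound, and the simulated state sequence are all hypothetical):

```ruby
# Proposal sketch: probe until the listener first reports :refused, then
# send a fixed batch and require the state never to leave :refused.
MAX_PROBES = 1_000

def refused_and_stays_refused?(batch: 30, &connect)
  # Phase 1: bounded wait for the first :refused.
  MAX_PROBES.times { break if connect.call == :refused }
  # Phase 2: the next `batch` attempts must all stay :refused.
  batch.times.all? { connect.call == :refused }
end

# Simulated shutdown: a couple of responses, one reset, then refused forever.
states = [:ok, :ok, :reset]
probe  = -> { states.shift || :refused }
puts refused_and_stays_refused?(batch: 30, &probe)  # → true
```

This shape tolerates any amount of noise before the state change, but fails the moment the server flaps back out of the refused state.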
Maybe we could try this for a while and see how well it works? I'm really leery of tests that wait for state changes. Obviously, we do have to wait for a booting server to become stable, but past that, nothing in the real world is waiting for the server, so our tests also should not. Doing so often leads to false positives, and we've had enough problems with those...

Off topic: I've contributed tests to several gems/repos, but I did a lot of work getting the two main Ruby test suites stable and passing on Windows. There were also stability issues with parallel testing on all platforms. It was common for particular tests to fail on cloud CI but pass locally or non-parallel. Total test count also jumped around in parallel testing.

Re Puma, it's often running with an app that is using far more resources than Puma itself (disk IO, memory, OS calls, etc.). Hence, running tests locally is really about as far from a good test environment as one could get. We can't duplicate the 'resource hungry app' in CI (or locally), but running parallel is a good start.

In some respects, this is kind of a 'point of view' thing, but one particular issue is creating a passing test locally, then finding out it fails or isn't stable in CI. Modifying it for CI may create a false positive test. In some respects, I'm lucky that I can't run the tests locally...
I see this as being similar to booting. You're moving from state A to state B. It's important that we don't go to state A again or even to state C; we're just in state B forever. I've seen the Mac test fail in such crazy ways (like reset to refused back to reset and then to success) and we need to capture those fails, not say that going from A -> B -> A || C is ok if you only do it a few times. My proposal captures this.
I've seen that with the current test, and that's the reason I changed how the request threading works. Rather than try to determine what state the server is in, should we see if the array divides into three arrays, the first containing only 'responds', the second containing only resets, and the third containing only refused? I'll make a branch from this and add the actual array to the debug output to check if it's interleaved...
That works for me!
Don't know if you saw it, it was an edit, I added: For now, I'll rebase since all the 'lib' PRs needed for this to work are merged. I'll start on the interleaved issue on my fork after that.
Force-pushed from 7bfd490 to 1bd0711
test/test_integration_single.rb
@@ -71,28 +73,28 @@ def test_term_not_accepts_new_connections
def test_int_signal_with_background_thread_in_jruby |
This method name could be updated to `test_int_shutdown` or `test_int_exit` while this file is being modified. Also, I think `skip_unless :jruby` can be removed.
Done, but not yet pushed.
Took a few more breaks than normal and added interleave. But, the interleave can only be tested between :reset & :refused. Since the requests have a delay of one second in the 'app', they continue into the :refused sections. Removing the delay will remove all :reset entries, as I believe they're raised when the accepted request is read? I added the array in debug output, see: https://github.com/MSP-Greg/puma/commit/44fc57b5e3b21de6846dfded01b02817abbb7462/checks It's at the bottom of the test step in all jobs but windows. Also, #1952 is needed. I believe this will pass and be stable when that is added. Lastly, :INT signal doesn't work on Windows...
The main issue with the above is that I kept with the 'original' concept for loading the 'replies' array with request information. That concept loaded the array in the order that requests were handled, whether that be an actual response or raising an error. Given that the 'app' was sleeping, the array order did not match the 'sent' order. That array is helpful, but not optimal for our needs.

Light bulb went off, and I wrote an ru file that allows loading the array in the order that the requests are sent. Much better. Using that concept for the array, the array (tcp only) shows three non-interleaved sets, the first being responses, second is resets, and third is refused. Unix sockets never seem to generate any :reset members... Haven't moved the code back into this branch yet.
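The "three non-interleaved sets" property described above can be checked compactly. A sketch (the helper name is made up; the real test would run this against the collected replies array):

```ruby
# Collapse the replies array into runs of equal states; the sequence is
# non-interleaved exactly when no state starts more than one run.
def non_interleaved?(replies)
  runs = replies.slice_when { |a, b| a != b }.map(&:first)
  runs == runs.uniq
end

puts non_interleaved?(%i[ok ok reset refused refused])  # → true
puts non_interleaved?(%i[ok reset ok refused])          # → false
```

This catches exactly the flapping failures described earlier (A -> B -> A || C), regardless of how long each run is.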
Force-pushed from 1bd0711 to 6f080d6
test_integration_cluster.rb

Request handling during server TERM - two tests
`#test_term_closes_listeners_tcp` `#test_term_closes_listeners_unix` using `#term_closes_listeners`
Send requests 10 per second. Send 10, then :TERM server, then send another 30. No more than 10 should throw Errno::ECONNRESET.

Request handling during phased restart - two tests
`#test_usr1_all_respond_tcp` `#test_usr1_all_respond_unix` using `#usr1_all_respond`
Send requests 1 per second. Send 1, then :USR1 server, then send another 24. All should be responded to, and at least three workers should be used.

Stuck worker tests - two tests
`#test_stuck_external_term_spawn` tests whether externally TERM'd 'stuck' workers are properly re-spawned.
`#test_stuck_phased_restart` tests whether 'stuck' workers are properly shut down during phased restart.

helper files/methods changes
1. helper file changes to allow binding to TCP or UNIX, see kwarg `unix:`
2. Skip on Windows for signal TERM
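A condensed, self-contained sketch of the TERM test shape described above: pace the requests, "TERM" the server partway through, then bound how many requests may hit Errno::ECONNRESET. Here `fake_request` is a stand-in for the real socket round trip, and the 8-request indeterminate shutdown window is an assumed value, not Puma's actual behavior:

```ruby
replies = Queue.new
term_at = 10   # :TERM is sent after the first 10 requests
total   = 40

# Simulated server: requests in flight around the TERM may reset;
# later requests are refused outright once listeners are closed.
fake_request = lambda do |i|
  if    i < term_at     then :ok       # handled before :TERM
  elsif i < term_at + 8 then :reset    # in flight while listeners close
  else                       :refused  # listeners fully closed
  end
end

threads = total.times.map do |i|
  sleep 0.005                          # the real test paces ~10 req/s
  Thread.new { replies << fake_request.call(i) }
end
threads.each(&:join)

counts = Hash.new(0)
counts[replies.pop] += 1 until replies.empty?
raise "too many resets" if counts[:reset] > 10
puts counts
```

The final bound mirrors the `assert_operator 10, :>=, resets` assertion: resets during the shutdown window are tolerated, but only up to a fixed ceiling.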
Needs a rebase.
Force-pushed from 6f080d6 to ef49bbd
Done. Passed except for JRuby 9.2.8.0...
Congrats, that was a lot of work @MSP-Greg 👏
Thanks. The intermittent tests were driving me crazy, probably the same for you. IDK about JRuby 9.2.8.0, seems to need more work. Now we'll move on to the next phase. I saw #1976, and wanted to write some tests for how Puma started when a UNIX control and/or bind socket existed already, or if the file existed, but no sockets. Couldn't help but think how much easier it would be if #1971 was in place... BTW, the gist is totally wrong, I've already got an update that I'll start working with. Rather than setting config via a call to