
Fix a number of flaky tests (.ordered expectation in multiple threads, and unstable output expectations) #736

Merged (4 commits) on Dec 18, 2019

Conversation

ndbroadbent (Contributor) commented Dec 7, 2019

See the description and comments in #735

These tests are run in parallel threads, so there are no guarantees about the execution order.

I've also removed the Timeout.timeout block around each example, because that seemed to swallow the actual error and make it much harder to figure out why random tests were crashing. (And nothing seems to be hanging anymore.)
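The execution-order point can be illustrated with a minimal Ruby sketch (not code from this repo): two threads record events, and the scheduler picks the interleaving, so any expectation that pins down a fixed order across threads will be flaky.

```ruby
# Two threads push to a thread-safe queue; the scheduler decides which
# thread runs first, so the arrival order of "a" and "b" is unspecified.
events = Thread::Queue.new
[Thread.new { events << "a" }, Thread.new { events << "b" }].each(&:join)

recorded = []
recorded << events.pop until events.empty?

# A fixed-order assertion ("a" strictly before "b") would fail at random.
# An order-independent assertion is stable on every run:
puts recorded.sort.inspect # => ["a", "b"]
```

This is the same failure mode as an RSpec .ordered mock expectation shared between parallel threads: both events always happen, but not in a guaranteed order.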

EDIT: I'll also throw in the commit to cache the gems in spec/fixtures/rails51/vendor/bundle and spec/fixtures/rails52/vendor/bundle, because that speeds up the CI builds.

@@ -188,12 +188,6 @@ def setup_runtime_log

config.raise_errors_for_deprecations!

# sometimes stuff hangs -> do not hang everything
config.include(Module.new {def test_timeout;30;end })
grosser (Owner):

still need this though?
... having tests just stop and hang is hard to debug :(

ndbroadbent (Author):

Sure, I can re-add that! But I think the Timeout block was actually swallowing some errors, which made this test failure very hard to figure out. When I removed it, the test started failing consistently, so it was easy to fix. So my theory is that those regular timeout errors might actually have been caused by this ordered expectation.

But I'll run my delayed commit script again for a while and create 15 more builds, and will check to see if there are any other flaky tests.

A contributor:

I remember that when I last worked on stabilizing CI here, I also temporarily removed this, because it did indeed seem to be swallowing errors somehow. 👍

grosser (Owner):

Bringing them back in #741 with a dedicated class and a note on debugging.

ndbroadbent (Author) commented Dec 8, 2019

Heh, I found one more flaky test. The test checked that the output doesn't include 222, but the timing in the test statistics happened to include 0.09222 seconds:

Failures:
  1) CLI can run with given files
     Failure/Error: expect(result).not_to include('222')
       expected "2 processes for 2 specs, ~ 1 specs per process\n111\nNo examples found.\n\n\nFinished in 0.00042 sec...ok 0.09222 seconds to load)\n0 examples, 0 failures\n\n\n0 examples, 0 failures\n\nTook 1 seconds\n" not to include "222"
     # ./spec/integration_spec.rb:277:in `block (2 levels) in <top (required)>'

So I've fixed that one too. Otherwise, the tests seem to be very stable now, and nothing is hanging or timing out. So I think the Timeout block really was causing this issue by hiding the true source of the error (the ordered expectation failing when the threads run in a different order).
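One way to harden such an expectation (a sketch; the PR's actual fix may differ) is to anchor the match to a whole line, so a timing value like 0.09222 can never trigger a false positive:

```ruby
# Hypothetical output resembling the failure above: "222" appears only
# inside the load-time statistic "0.09222", never as its own line.
output = "111\nNo examples found.\n" \
         "Finished in 0.00042 seconds (files took 0.09222 seconds to load)\n" \
         "0 examples, 0 failures\n"

loose  = output.include?('222')  # substring match hits "0.09222" -- flaky
strict = output.match?(/^222$/)  # only a line that is exactly "222" matches

puts loose  # => true
puts strict # => false
```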

ndbroadbent (Author):
One more flaky test: https://ci.appveyor.com/project/grosser/parallel-tests/builds/29391856/job/8j2dqxd7duf10gra

Failures:
  1) CLI can show simulated output when serializing stdout
     Failure/Error: expect(result).to match(/\.{4}.*TEST1.*\.{4}.*TEST2/m)
       expected "2 processes for 2 specs, ~ 1 specs per process\n.............\nTEST1\n\n.\n\n\n\nFinished in 0.51565...0937 seconds to load)\n\n1 example, 0 failures\n\n\n\n.\n2 examples, 0 failures\n\nTook 2 seconds\n" to match /\.{4}.*TEST1.*\.{4}.*TEST2/m
       Diff:
       @@ -1,2 +1,32 @@
       -/\.{4}.*TEST1.*\.{4}.*TEST2/m
       +2 processes for 2 specs, ~ 1 specs per process
       +.............
       +TEST1
       +
       +.
       +
       +
       +
       +Finished in 0.51565 seconds (files took 0.09375 seconds to load)
       +
       +1 example, 0 failures
       +
       +
       +
       +...
       +TEST2
       +
       +.
       +
       +
       +
       +Finished in 1.02 seconds (files took 0.10937 seconds to load)
       +
       +1 example, 0 failures
       +
       +
       +
       +.
       +2 examples, 0 failures
       +
       +Took 2 seconds
     # ./spec/integration_spec.rb:165:in `block (2 levels) in <top (required)>'

On my machine, the result is usually:

2 processes for 2 specs, ~ 1 specs per process
..........
TEST1
.

Finished in 0.50502 seconds (files took 0.08053 seconds to load)
1 example, 0 failures

.....
TEST2
.

Finished in 1.01 seconds (files took 0.08033 seconds to load)
1 example, 0 failures

And then occasionally there is one fewer dot:

2 processes for 2 specs, ~ 1 specs per process
...........
TEST1
.

Finished in 0.50482 seconds (files took 0.08965 seconds to load)
1 example, 0 failures

....
TEST2
.

Finished in 1.01 seconds (files took 0.08978 seconds to load)
1 example, 0 failures

But for this CI build, the output only included 3 dots before TEST2:

...
TEST2

So it didn't match the regex: expect(result).to match(/\.{4}.*TEST1.*\.{4}.*TEST2/m)
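The failure is easy to reproduce in isolation with that same regex (simplified sample strings below, not the real runner output):

```ruby
regex = /\.{4}.*TEST1.*\.{4}.*TEST2/m

# With /m, "." also matches newlines, so the regex just requires a run of
# four literal dots before TEST1 and another run of four before TEST2.
passing = "....\nTEST1\n.\n....\nTEST2\n"
failing = "....\nTEST1\n.\n...\nTEST2\n" # only three heartbeat dots before TEST2

puts passing.match?(regex) # => true
puts failing.match?(regex) # => false
```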

It seems like the value of 'PARALLEL_TEST_HEARTBEAT_INTERVAL' => '0.1' is right on the edge of causing some sporadic test failures. So I decreased this by a factor of 10 to ensure that the test never fails again:

After decreasing the interval, 'PARALLEL_TEST_HEARTBEAT_INTERVAL' => '0.01' produces this result:

2 processes for 2 specs, ~ 1 specs per process
............................................................................................
TEST1
.

Finished in 0.50375 seconds (files took 0.087 seconds to load)
1 example, 0 failures

............................................
TEST2
.

Finished in 1 second (files took 0.08725 seconds to load)
1 example, 0 failures
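A back-of-the-envelope check, assuming the runner prints roughly one heartbeat dot per interval while a child process runs (my reading of the thread, not a quote from the code): a ~0.5 s wait at a 0.1 s interval yields only about 5 dots, right at the edge of the \.{4} expectation, while a 0.01 s interval yields about 50.

```ruby
# Rough model: one dot per heartbeat interval over the process runtime.
def approx_dots(runtime_seconds, heartbeat_interval)
  (runtime_seconds / heartbeat_interval).round
end

puts approx_dots(0.5, 0.1)  # => 5  (barely above the 4-dot minimum)
puts approx_dots(0.5, 0.01) # => 50 (a wide safety margin)
```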

@ndbroadbent ndbroadbent changed the title Fix flaky test by removing the .ordered expectation Fix a number of flaky tests (.ordered expectation in multiple threads, and unstable output expectations) Dec 8, 2019
.travis.yml Outdated
- ruby-head
- jruby-head
# - ruby-head
# - jruby-head
grosser (Owner):

need that back?

ndbroadbent (Author):

Oh yeah, I commented those out because they are consistently failing and adding 23 minutes to the build times, since Travis CI runs all of the builds in sequence (not in parallel).

See:

I had a quick look to see if I could fix those builds as well, but it was a bit difficult. The ruby-head one was particularly weird, because it's just getting stuck at the installation/gemset stage with RVM and timing out after 10m of inactivity. I found these related issues but couldn't find a solution:

But I will uncomment these!

@grosser grosser merged commit d6872cf into grosser:master Dec 18, 2019