
Fix a number of flaky tests (.ordered expectation in multiple threads, and unstable output expectations) #736

Merged (4 commits) on Dec 18, 2019

Conversation

ndbroadbent (Contributor) commented Dec 7, 2019

See the description and comments in #735

These tests are run in parallel threads, so there are no guarantees about the execution order.

I've also removed the Timeout.timeout block around each example, because that seemed to swallow the actual error and make it much harder to figure out why random tests were crashing. (And nothing seems to be hanging anymore.)
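The execution-order point can be illustrated with a minimal Ruby sketch (not code from this repo): two threads record events, and the scheduler picks the interleaving, so any expectation that pins down a fixed order across threads will be flaky.

```ruby
# Two threads push to a thread-safe queue; the scheduler decides which
# thread runs first, so the arrival order of "a" and "b" is unspecified.
events = Thread::Queue.new
[Thread.new { events << "a" }, Thread.new { events << "b" }].each(&:join)

recorded = []
recorded << events.pop until events.empty?

# A fixed-order assertion ("a" strictly before "b") would fail at random.
# An order-independent assertion is stable on every run:
puts recorded.sort.inspect # => ["a", "b"]
```

This is the same failure mode as an RSpec .ordered mock expectation shared between parallel threads: both events always happen, but not in a guaranteed order.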

EDIT: I'll also throw in the commit to cache the gems in spec/fixtures/rails51/vendor/bundle and spec/fixtures/rails52/vendor/bundle, because that speeds up the CI builds.

@@ -188,12 +188,6 @@ def setup_runtime_log

config.raise_errors_for_deprecations!

# sometimes stuff hangs -> do not hang everything
config.include(Module.new {def test_timeout;30;end })
grosser (Owner):

still need this though?
... having tests just stop and hang is hard to debug :(

ndbroadbent (Author):

Sure, I can re-add that! But I think the Timeout block was actually swallowing some errors, which made this test failure very hard to figure out. When I removed it, the test started failing consistently, so it was easy to fix. So my theory is that those regular timeout errors might actually have been caused by this ordered expectation.

But I'll run my delayed commit script again for a while and create 15 more builds, and will check to see if there are any other flaky tests.

A contributor:

I remember that when I last worked on stabilizing CI here, I also temporarily removed this, because it did indeed seem to be swallowing errors somehow. 👍

grosser (Owner):

Bringing them back in #741 with a dedicated class and a note on debugging.

ndbroadbent (Author) commented Dec 8, 2019

Heh, I found one more flaky test. The test checked that the output doesn't include 222, but the timing in the test statistics happened to include 0.09222 seconds:

Failures:
  1) CLI can run with given files
     Failure/Error: expect(result).not_to include('222')
       expected "2 processes for 2 specs, ~ 1 specs per process\n111\nNo examples found.\n\n\nFinished in 0.00042 sec...ok 0.09222 seconds to load)\n0 examples, 0 failures\n\n\n0 examples, 0 failures\n\nTook 1 seconds\n" not to include "222"
     # ./spec/integration_spec.rb:277:in `block (2 levels) in <top (required)>'

So I've fixed that one too. Otherwise, the tests seem to be very stable now, and nothing is hanging or timing out. So I think the Timeout block really was causing this issue by hiding the true source of the error (the ordered expectation failing when the threads run in a different order).
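One way to harden such an expectation (a sketch; the PR's actual fix may differ) is to anchor the match to a whole line, so a timing value like 0.09222 can never trigger a false positive:

```ruby
# Hypothetical output resembling the failure above: "222" appears only
# inside the load-time statistic "0.09222", never as its own line.
output = "111\nNo examples found.\n" \
         "Finished in 0.00042 seconds (files took 0.09222 seconds to load)\n" \
         "0 examples, 0 failures\n"

loose  = output.include?('222')  # substring match hits "0.09222" -- flaky
strict = output.match?(/^222$/)  # only a line that is exactly "222" matches

puts loose  # => true
puts strict # => false
```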

ndbroadbent (Author):
One more flaky test: https://ci.appveyor.com/project/grosser/parallel-tests/builds/29391856/job/8j2dqxd7duf10gra

Failures:
  1) CLI can show simulated output when serializing stdout
     Failure/Error: expect(result).to match(/\.{4}.*TEST1.*\.{4}.*TEST2/m)
       expected "2 processes for 2 specs, ~ 1 specs per process\n.............\nTEST1\n\n.\n\n\n\nFinished in 0.51565...0937 seconds to load)\n\n1 example, 0 failures\n\n\n\n.\n2 examples, 0 failures\n\nTook 2 seconds\n" to match /\.{4}.*TEST1.*\.{4}.*TEST2/m
       Diff:
       @@ -1,2 +1,32 @@
       -/\.{4}.*TEST1.*\.{4}.*TEST2/m
       +2 processes for 2 specs, ~ 1 specs per process
       +.............
       +TEST1
       +
       +.
       +
       +
       +
       +Finished in 0.51565 seconds (files took 0.09375 seconds to load)
       +
       +1 example, 0 failures
       +
       +
       +
       +...
       +TEST2
       +
       +.
       +
       +
       +
       +Finished in 1.02 seconds (files took 0.10937 seconds to load)
       +
       +1 example, 0 failures
       +
       +
       +
       +.
       +2 examples, 0 failures
       +
       +Took 2 seconds
     # ./spec/integration_spec.rb:165:in `block (2 levels) in <top (required)>'

On my machine, the result is usually:

2 processes for 2 specs, ~ 1 specs per process
..........
TEST1
.

Finished in 0.50502 seconds (files took 0.08053 seconds to load)
1 example, 0 failures

.....
TEST2
.

Finished in 1.01 seconds (files took 0.08033 seconds to load)
1 example, 0 failures

And then occasionally there is one fewer dot:

2 processes for 2 specs, ~ 1 specs per process
...........
TEST1
.

Finished in 0.50482 seconds (files took 0.08965 seconds to load)
1 example, 0 failures

....
TEST2
.

Finished in 1.01 seconds (files took 0.08978 seconds to load)
1 example, 0 failures

But for this CI build, the output only included 3 dots before TEST2:

...
TEST2

So it didn't match the regex: expect(result).to match(/\.{4}.*TEST1.*\.{4}.*TEST2/m)
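The failure is easy to reproduce in isolation with that same regex (simplified sample strings below, not the real runner output):

```ruby
regex = /\.{4}.*TEST1.*\.{4}.*TEST2/m

# With /m, "." also matches newlines, so the regex just requires a run of
# four literal dots before TEST1 and another run of four before TEST2.
passing = "....\nTEST1\n.\n....\nTEST2\n"
failing = "....\nTEST1\n.\n...\nTEST2\n" # only three heartbeat dots before TEST2

puts passing.match?(regex) # => true
puts failing.match?(regex) # => false
```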

It seems like the value of 'PARALLEL_TEST_HEARTBEAT_INTERVAL' => '0.1' is right on the edge of causing some sporadic test failures. So I decreased this by a factor of 10 to ensure that the test never fails again:

After decreasing the interval, 'PARALLEL_TEST_HEARTBEAT_INTERVAL' => '0.01' produces this result:

2 processes for 2 specs, ~ 1 specs per process
............................................................................................
TEST1
.

Finished in 0.50375 seconds (files took 0.087 seconds to load)
1 example, 0 failures

............................................
TEST2
.

Finished in 1 second (files took 0.08725 seconds to load)
1 example, 0 failures
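A back-of-the-envelope check, assuming the runner prints roughly one heartbeat dot per interval while a child process runs (my reading of the thread, not a quote from the code): a ~0.5 s wait at a 0.1 s interval yields only about 5 dots, right at the edge of the \.{4} expectation, while a 0.01 s interval yields about 50.

```ruby
# Rough model: one dot per heartbeat interval over the process runtime.
def approx_dots(runtime_seconds, heartbeat_interval)
  (runtime_seconds / heartbeat_interval).round
end

puts approx_dots(0.5, 0.1)  # => 5  (barely above the 4-dot minimum)
puts approx_dots(0.5, 0.01) # => 50 (a wide safety margin)
```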

@ndbroadbent ndbroadbent changed the title Fix flaky test by removing the .ordered expectation Fix a number of flaky tests (.ordered expectation in multiple threads, and unstable output expectations) Dec 8, 2019
.travis.yml Outdated
- ruby-head
- jruby-head
# - ruby-head
# - jruby-head
grosser (Owner):

need that back?

ndbroadbent (Author):

Oh yeah, I commented those out because they are consistently failing and adding 23 minutes to the build times, since Travis CI runs all of the builds in sequence (not in parallel).

See:

I had a quick look to see if I could fix those builds as well, but it was a bit difficult. The ruby-head one was particularly weird, because it's just getting stuck at the installation/gemset stage with RVM and timing out after 10m of inactivity. I found these related issues but couldn't find a solution:

But I will uncomment these!

@grosser grosser merged commit d6872cf into grosser:master Dec 18, 2019