Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception in one test can cause trailing failures #1432

Closed
smcclure15 opened this issue Feb 21, 2020 · 4 comments · Fixed by #1642
Closed

Exception in one test can cause trailing failures #1432

smcclure15 opened this issue Feb 21, 2020 · 4 comments · Fixed by #1642
Assignees
Labels
Component: Core For module, test, hooks, and runner. Type: Bug

Comments

@smcclure15
Copy link
Member

Tell us about your runtime:

  • QUnit version: 2.9.3
  • What environment are you running QUnit in? (e.g., browser, Node): Observed in both Browser & Node

What are you trying to do?

Code that reproduces the problem:

Simple repro:

QUnit.module('Trailing failures on exception', {
  afterEach: assert => {
    let done = assert.async();
    setTimeout(done, 50);
  }
}, function () {
  QUnit.test('Test A', assert => {
    assert.ok(true);
    let done = assert.async();
    setTimeout(done, 10);
    throw new Error('oops');
  });
  QUnit.test('Test B', assert => {
    assert.ok(true);
  });
});

What did you expect to happen?

Test A should fail, and Test B should pass.

What actually happened?

Both Test A and Test B fail.

This is very similar to #1375 (I was running previously on Qunit 2.9.1, and hopeful that it was already fixed), but that does not resolve this. I avoided returning promises in an attempt to simplify it.

Added debug statements

I interspersed some debug help to better understand the lifecycles; maybe this can shine some light on the root cause and/or solution:

QUnit.module('Trailing failures on exception', {
  afterEach: assert => {
    let name = assert.test.testName;
    assert.ok(true, `entering afterEach for ${name}`);
    let done = assert.async();
    setTimeout(() => {
      // "done" marker from Test 1 resumes QUnit during this timeout
      console.log(`>>> done afterEach for ${name}`);
      done();
    }, 50);
  }
}, function () {
  QUnit.test('Test A', assert => {
    assert.ok(true, 'in test A');
    let done = assert.async();
    setTimeout(() => {
      assert.ok(true, 'done test A');
      done();
    }, 10);
    throw new Error('oops');
  });
  QUnit.test('Test B', assert => {
    // this test fails because afterEach from Test 1 resumes here
    assert.ok(true, 'in test B');
  });
});

QUnit.log(({message}) => {
  // just to help with tracking lifecycles
  console.log(`>>> ${message}`);
});

This produces the following TAP output (snipping my path a bit):

% qunit repro.js
TAP version 13
>>> in test A
>>> Died on test #2     at Object.<anonymous> (/SNIP/repro.js:16:9): oops
>>> entering afterEach for Test A
>>> done test A
not ok 1 Trailing failures on exception > Test A
  ---
  message: "Died on test #2     at Object.<anonymous> (/SNIP/repro.js:16:9): oops"
  severity: failed
  actual: null
  expected: undefined
  stack: Error: oops
    at Object.QUnit.test.assert (/SNIP/repro.js:23:11)
  ...
>>> in test B
>>> entering afterEach for Test B
>>> done afterEach for Test A
>>> assert.async callback called after test finished.
>>> done afterEach for Test B
not ok 2 Trailing failures on exception > Test B
  ---
  message: "assert.async callback called after test finished."
  severity: failed
  actual: null
  expected: undefined
  stack: undefined:undefined
  ...
1..2
# pass 0
# skip 0
# todo 0
# fail 2

Notice that we enter Test B before being done the afterEach associated with Test A.

@trentmwillis trentmwillis added Component: Core For module, test, hooks, and runner. Type: Bug labels Mar 7, 2020
@trentmwillis
Copy link
Member

trentmwillis commented Mar 7, 2020

Thanks for the detailed bug report! Unsure when we'll get around to fixing it, but if you'd like to take a stab at a fix, that would be great 🙂

@smcclure15
Copy link
Member Author

The more I debug, the more I am convinced that this is correct functional behavior.

This comes down to the semaphore handling which makes these async/done's simply as start/stop's.
From the repro steps,

STOP is called in Test A
Test A throws an exception, the semaphore is reset (0), and we start to process afterEach
STOP is called in A's afterEach
START is called from the lingering setTimeout in Test A (semaphore = 0), which advances the framework to process Test B
Test B runs and completes
STOP is called in B's afterEach
START is called from A's afterEach, but we are in Test B still, which throws the error (A's done callback occurred after test A was finished)
START is again called from B's afterEach (no harm done, it seems)

So our semaphore gets offset, and the framework keeps rolling on after toggling these starts/stops.
This all sounds like solid logic, though with maybe some room for improvement? However, we wouldn't know the intent of the user's code, whether they made their done callback exception safe or not, and I worry we'd risk some false-negatives by doing something unexpected here.

Bottom line, the test failed, it is accurately recorded, but it causes later failures because it used setTimeout's to cause disruption down the line. I think this type of cascading effect is forgivable given we can't guarantee a "clean state" if we continue, as much as we can safely try. I see this as effectively the same ill-posed problem as the following:

    QUnit.test('Test A', assert => {
        assert.ok(1);
        setTimeout(() => {
            throw new Error('oops2');
        }, 10);
        throw new Error('oops1')
    });
    QUnit.test('Test B', assert => {
        assert.ok(1);
        return new Promise(resolve => setTimeout(resolve, 100))
    });

Test A fails because of "oop1" being thrown in the test, but Test B tests fail because that "oops2" error is finally thrown during its runtime. Which makes sense, and the framework shouldn't try to intercept this because Test A had previously thrown an exception.

I would be onboard with closing this issue if others agree with the assessment/philosophy.

@Krinkle
Copy link
Member

Krinkle commented Nov 11, 2020

I'm inclined to agree, but I do think we should make sure that if this happens to you, that you have the best information available.

When a test fails, we kinda have to move on, since the failure could mean that we will never see a callback or settled promise. This justifies the transition from test-A to afterEach-A, and likewise the transition from afterEach-A to test-B.

What we could check:

  • Never start a afterEach unless the associated test was just completed (success or fail). In other words, it must not be possible for afterEach-A to begin after test-B has started or after test-B has failed. Is this already true today?
  • When an async callback or promose-test is resolved after the test ended, the error message should be clearly related to the test that it was started from. Either by mentioning the test in question or, if possible (e.g. in HTML reporter) by modifying the output of the original test and adding the extra error.
  • When an exception happens from an asynchronous stack trace, we can't easily know what test it relates to as it is caught from the global window.onerror handler. But we should in that case make sure the message clearly says that it was a "global error". It can render chronologically in the test wherever it happened, that seems fine and has the benefit of being right where you want it for the more common case of only one error happening during and it being caught during the test instead of after.

See also: #1377, #1478

@Krinkle
Copy link
Member

Krinkle commented Jul 28, 2021

I'm working on a patch that replaces our semaphore counter with a map of unique locks, thus making it impossible for a release callback from assert.async() to release an unrelated lock. I believe this was planned years ago when we implemented the QUnit 2 interface in our usual "fake it until you make it" manner (e.g. we implemented a thin interface in a 2.x release that added module scopes and scoped assert objects, but at the time they were still just aliasees to their global/stateless equivalents). We made them true instances in 2.x, but it seems we never got around to tightening up the semaphore.

The outcome of such patch would be that, in @smcclure15 's example above, the done() from the test's async lock won't (unintentionally) release the lock created in the afterEach() callback. Thus preventing test B from starting too early, and then encountering unexpected issues there.

Krinkle added a commit that referenced this issue Jul 29, 2021
== Background ==

When creating two async pauses in a test, it was possible for a test
to pass by invoking one of them twice, and the other not at all.

Easy scenario (though perhaps not realistic):

> Use `assert.async()` twice, assigned as done1 and done2 in the same
> `QUnit.test()` case, and then simulate the failure scenario such that
> you wrongly call done1 two times, and forget to call done2.

Complex scenario across `QUnit.test()` and "afterEach" hooks, since
these previously shared a single semaphore:

> Use `assert.async()` once in a simple test, and schedule the resume
> call in the future, but then fail with an uncaught error. The uncaught
> error is found and `Test.run()` would internally kill the pause by
> resetting the semaphore to zero (this make sense since we shouldn't
> wait for the release once the test is known to have failed).
> After this reset, we proceed to the "afterEach" hook. Suppose this
> hook is also async, and during its execution, the originally scheduled
> resume call happens. This would effectively end up releasing the
> afterEach's async pause despite not being finished yet, and then we
> proceed to the next test. That test would then fail when the afterEach's
> own release call happens, failing as "release during a different test".

This is the scenario of #1432.

Fix this and numerous other edge cases by making the returned
callbacks from `assert.async()` strict about which locks they release.

Each lock now adds a unique token to a map, and invoking the release
function decrements/removes this token from the map.

== Notes ==

* es6-map.js assigns the fallback in all browsers.
  This is a bug, to be fixed later.

* The `isNaN(semaphore)` logic was originally added in 2015 by ea3e350.
  At the time, the internal resume function was public, and NaN could
  emerge through `QUnit.start("bla")` as result of `semaphore += "bla"`.
  This has not been possible for a while. During PR #1590, I did not
  trace the origin of this code, and thus did not realize that it was
  already obsolete (the semaphore itself is not publicly supported).

* The "during different test" error is now almost impossible to
  trigger since we now kill pending locks during test failures and
  tolerate all late calls equally. This meant the `drooling-done.js`
  test case now fails in a more limited way.

  I added a new test case for coverage, that reproduces it still, but
  it's a lot more obscure – it requires the original test to pass and
  then also have an unexpected call during a different test.

Fixes #1432.
Krinkle added a commit that referenced this issue Jul 29, 2021
== Background ==

When creating two async pauses in a test, it was possible for a test
to pass by invoking one of them twice, and the other not at all.

Easy scenario (though perhaps not realistic):

> Use `assert.async()` twice, assigned as done1 and done2 in the same
> `QUnit.test()` case, and then simulate the failure scenario such that
> you wrongly call done1 two times, and forget to call done2.

Complex scenario across `QUnit.test()` and "afterEach" hooks, since
these previously shared a single semaphore:

> Use `assert.async()` once in a simple test, and schedule the resume
> call in the future, but then fail with an uncaught error. The uncaught
> error is found and `Test.run()` would internally kill the pause by
> resetting the semaphore to zero (this make sense since we shouldn't
> wait for the release once the test is known to have failed).
> After this reset, we proceed to the "afterEach" hook. Suppose this
> hook is also async, and during its execution, the originally scheduled
> resume call happens. This would effectively end up releasing the
> afterEach's async pause despite not being finished yet, and then we
> proceed to the next test. That test would then fail when the afterEach's
> own release call happens, failing as "release during a different test".

This is the scenario of #1432.

Fix this and numerous other edge cases by making the returned
callbacks from `assert.async()` strict about which locks they release.

Each lock now adds a unique token to a map, and invoking the release
function decrements/removes this token from the map.

== Notes ==

* es6-map.js assigns the fallback in all browsers.
  This is a bug, to be fixed later.

* The `isNaN(semaphore)` logic was originally added in 2015 by ea3e350.
  At the time, the internal resume function was public, and NaN could
  emerge through `QUnit.start("bla")` as result of `semaphore += "bla"`.
  This has not been possible for a while. During PR #1590, I did not
  trace the origin of this code, and thus did not realize that it was
  already obsolete (the semaphore itself is not publicly supported).

* The "during different test" error is now almost impossible to
  trigger since we now kill pending locks during test failures and
  tolerate all late calls equally. This meant the `drooling-done.js`
  test case now fails in a more limited way.

  I added a new test case for coverage, that reproduces it still, but
  it's a lot more obscure – it requires the original test to pass and
  then also have an unexpected call during a different test.

* I considered using the phrase "async lock" in the public-facing
  error messages, but found this perhaps too internal/technical
  when coming from the perspective of `var done = assert.async();`.

  In order to keep the code shared between handling of async-await,
  Promise, and assert.async, but remain friendly and understandable,
  I went for the phrase "async pause".

Fixes #1432.
Krinkle added a commit that referenced this issue Jul 29, 2021
== Background ==

When creating two async pauses in a test, it was possible for a test
to pass by invoking one of them twice, and the other not at all.

Easy scenario (though perhaps not realistic):

> Use `assert.async()` twice, assigned as done1 and done2 in the same
> `QUnit.test()` case, and then simulate the failure scenario such that
> you wrongly call done1 two times, and forget to call done2.

Complex scenario across `QUnit.test()` and "afterEach" hooks, since
these previously shared a single semaphore:

> Use `assert.async()` once in a simple test, and schedule the resume
> call in the future, but then fail with an uncaught error. The uncaught
> error is found and `Test.run()` would internally kill the pause by
> resetting the semaphore to zero (this make sense since we shouldn't
> wait for the release once the test is known to have failed).
> After this reset, we proceed to the "afterEach" hook. Suppose this
> hook is also async, and during its execution, the originally scheduled
> resume call happens. This would effectively end up releasing the
> afterEach's async pause despite not being finished yet, and then we
> proceed to the next test. That test would then fail when the afterEach's
> own release call happens, failing as "release during a different test".

This is the scenario of #1432.

Fix this and numerous other edge cases by making the returned
callbacks from `assert.async()` strict about which locks they release.

Each lock now adds a unique token to a map, and invoking the release
function decrements/removes this token from the map.

== Notes ==

* es6-map.js assigns the fallback in all browsers.
  This is a bug, to be fixed later.

* The `isNaN(semaphore)` logic was originally added in 2015 by ea3e350.
  At the time, the internal resume function was public, and NaN could
  emerge through `QUnit.start("bla")` as result of `semaphore += "bla"`.
  This has not been possible for a while. During PR #1590, I did not
  trace the origin of this code, and thus did not realize that it was
  already obsolete (the semaphore itself is not publicly supported).

* The "during different test" error is now almost impossible to
  trigger since we now kill pending locks during test failures and
  tolerate all late calls equally. This meant the `drooling-done.js`
  test case now fails in a more limited way.

  I added a new test case for coverage, that reproduces it still, but
  it's a lot more obscure – it requires the original test to pass and
  then also have an unexpected call during a different test.

* I considered using the phrase "async lock" in the public-facing
  error messages, but found this perhaps too internal/technical
  when coming from the perspective of `var done = assert.async();`.

  In order to keep the code shared between handling of async-await,
  Promise, and assert.async, but remain friendly and understandable,
  I went for the phrase "async pause".

Fixes #1432.
Krinkle added a commit that referenced this issue Aug 2, 2021
== Background ==

When creating two async pauses in a test, it was possible for a test
to pass by invoking one of them twice, and the other not at all.

Easy scenario (though perhaps not realistic):

> Use `assert.async()` twice, assigned as done1 and done2 in the same
> `QUnit.test()` case, and then simulate the failure scenario such that
> you wrongly call done1 two times, and forget to call done2.

Complex scenario across `QUnit.test()` and "afterEach" hooks, since
these previously shared a single semaphore:

> Use `assert.async()` once in a simple test, and schedule the resume
> call in the future, but then fail with an uncaught error. The uncaught
> error is found and `Test.run()` would internally kill the pause by
> resetting the semaphore to zero (this make sense since we shouldn't
> wait for the release once the test is known to have failed).
> After this reset, we proceed to the "afterEach" hook. Suppose this
> hook is also async, and during its execution, the originally scheduled
> resume call happens. This would effectively end up releasing the
> afterEach's async pause despite not being finished yet, and then we
> proceed to the next test. That test would then fail when the afterEach's
> own release call happens, failing as "release during a different test".

This is the scenario of #1432.

Fix this and numerous other edge cases by making the returned
callbacks from `assert.async()` strict about which locks they release.

Each lock now adds a unique token to a map, and invoking the release
function decrements/removes this token from the map.

== Notes ==

* es6-map.js assigns the fallback in all browsers.
  This is a bug, to be fixed later.

* The `isNaN(semaphore)` logic was originally added in 2015 by ea3e350.
  At the time, the internal resume function was public, and NaN could
  emerge through `QUnit.start("bla")` as result of `semaphore += "bla"`.
  This has not been possible for a while. During PR #1590, I did not
  trace the origin of this code, and thus did not realize that it was
  already obsolete (the semaphore itself is not publicly supported).

* The "during different test" error is now almost impossible to
  trigger since we now kill pending locks during test failures and
  tolerate all late calls equally. This meant the `drooling-done.js`
  test case now fails in a more limited way.

  I added a new test case for coverage, that reproduces it still, but
  it's a lot more obscure – it requires the original test to pass and
  then also have an unexpected call during a different test.

* I considered using the phrase "async lock" in the public-facing
  error messages, but found this perhaps too internal/technical
  when coming from the perspective of `var done = assert.async();`.

  In order to keep the code shared between handling of async-await,
  Promise, and assert.async, but remain friendly and understandable,
  I went for the phrase "async pause".

Fixes #1432.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Core For module, test, hooks, and runner. Type: Bug
Development

Successfully merging a pull request may close this issue.

3 participants