fix: delay parallel pool termination to prevent false negatives #4959

Open
wants to merge 1 commit into
base: master

echuber2

I found that moving await pool.terminate() into the finally block here could prevent some false negatives, where tests could fail but node would mysteriously exit 0 anyway. (That is, by "false negative", I mean that the mocha test suite gave a passing exit code when it should have failed.)

Description of the Change

We saw sporadic cases where a mixture of tests run with --parallel would exit cleanly with code 0 despite some test failures. It's difficult to reproduce. (One specific case can be found in the PrairieLearn project issue linked on this PR.) After some experimentation, I found that the call to await pool.terminate() here sometimes causes the entire node process to exit immediately with code 0. I'm not entirely sure of the cause, but it may be due to the unusual implementation of cancellable promises in the workerpool library (its class is named Promise like the standard JS one, but it is a different implementation). When the false negative arises in pool.terminate(), the calls seem to descend into workerpool's version of Promise.all, and then the process suddenly exits. It may be that some handlers for uncaught exceptions are misconfigured somewhere.

At any rate, when I move the call to await pool.terminate() later in the code (into the finally block), the test results appear to accurately show whether at least one test has failed. Maybe someone else can test this if they've seen anything similar.
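
To illustrate the ordering described above, here is a minimal sketch. It is not Mocha's actual runner code and it does not reproduce the bug. `FakePool` and `runFiles` are hypothetical names made up for this example, and `FakePool` is a stand-in rather than the workerpool API. The only point is that the failure count is computed before the pool is torn down, and that teardown still happens if something throws.

```javascript
// Hypothetical stand-in for a worker pool. This is NOT the workerpool API.
class FakePool {
  constructor() {
    this.terminated = false;
  }

  // Pretend to run one test file; files with "bad" in the name report a failure.
  async exec(file) {
    if (this.terminated) {
      throw new Error('pool already terminated');
    }
    return { file, failures: file.includes('bad') ? 1 : 0 };
  }

  async terminate() {
    this.terminated = true;
  }
}

// Hypothetical runner: collect results first, terminate the pool last.
async function runFiles(pool, files) {
  let failures = 0;
  try {
    const results = await Promise.all(files.map(file => pool.exec(file)));
    // The failure count is known before any teardown starts.
    failures = results.reduce((sum, result) => sum + result.failures, 0);
  } finally {
    // Teardown runs whether or not the try block threw.
    await pool.terminate();
  }
  return failures;
}
```

A caller could then derive the exit code from the returned count, for example by setting `process.exitCode` to 1 when the count is greater than zero.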

Alternate Designs

I tried to figure out a way to fix the issue in the workerpool library's handling of Promise.all (which uses its own Promise type, not the standard JS one), since the call to terminate here ultimately ends up there. But I couldn't conclusively determine if the issue was arising there.

Even without this change, using --bail together with --parallel does still catch the first failure. The downside is that the CI test run won't include test results beyond the first failure.

Why should this be in core?

The --parallel feature sometimes has false negatives (failed tests that show as passing with exit 0 in CI).

Benefits

This change successfully catches some unusual failing cases in the particular situation we saw on the PrairieLearn project. I'm not entirely certain, but I think it doesn't hurt anything to terminate the pool later, in the finally block.

Possible Drawbacks

I don't know if this will cause new false negatives or false positives for other users in strange cases. The underlying issue(s) may be in the workerpool library, in which case it could theoretically be fixed without changing the mocha code at all.

Applicable issues

Maybe related somehow to #4559

Hopefully fixes PrairieLearn/PrairieLearn#6940

Possibility of breaking change

This may be a harmless bug fix or a breaking change. I won't know unless more people test it.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: echuber2 / name: Eric Huber (08245b5)

@github-actions

This PR hasn't had any recent activity, and I'm labeling it stale. Remove the label or comment or this PR will be closed in 14 days. Thanks for contributing to Mocha!

github-actions bot added the "stale" label (this has been inactive for a while...) on May 15, 2023
@nwalters512

This is still applicable.

github-actions bot removed the "stale" label (this has been inactive for a while...) on May 17, 2023
@Magoli1

Magoli1 commented Jun 28, 2023

This is highly relevant for me 👍🏾 Can we have it merged?

@github-actions

This PR hasn't had any recent activity, and I'm labeling it stale. Remove the label or comment or this PR will be closed in 14 days. Thanks for contributing to Mocha!

github-actions bot added the "stale" label (this has been inactive for a while...) on Oct 27, 2023
@nwalters512

This is still applicable. It's waiting for review by a maintainer.

github-actions bot removed the "stale" label (this has been inactive for a while...) on Oct 30, 2023
@JoshuaKGoldberg
Member

Note that per #5027 we're a new group of maintainers and not intimately familiar with Mocha's internals yet. Changes to async/pool logic are scary in general and especially to us.

Could someone please post an isolated reproduction? Either in a new 🐛 Bug issue, or as a casual comment here if filing an issue is inconvenient for you. We can't reasonably triage this PR without an isolated reproduction.

I'm also particularly interested in seeing whether the bug can be reproduced with the native Promise, rather than just with the workerpool library. If it's only a workerpool issue then I'd think the right report would be a bug on workerpool.

JoshuaKGoldberg added the "status: waiting for author" label (waiting on response from OP - more information needed) on Mar 4, 2024
JoshuaKGoldberg changed the title from "Delay parallel pool termination to prevent false negatives" to "fix: delay parallel pool termination to prevent false negatives" on Mar 4, 2024
Labels
status: waiting for author waiting on response from OP - more information needed
Development

Successfully merging this pull request may close these issues.

mocha --parallel sometimes fails silently in CI
4 participants