Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing test on v16 and x86 platform #1290

Closed
legendecas opened this issue Feb 22, 2023 · 3 comments · Fixed by #1297
Closed

Failing test on v16 and x86 platform #1290

legendecas opened this issue Feb 22, 2023 · 3 comments · Fixed by #1297

Comments

@legendecas
Copy link
Member

legendecas commented Feb 22, 2023

I think that I found the test that does not work:
https://github.com/nodejs/node-addon-api/blob/main/test/async_progress_worker.cc#L185
https://github.com/nodejs/node-addon-api/blob/main/test/async_progress_worker.js#L12

Anyone have an idea because this test hangs on Node.js v16 and x86 platform?

Originally posted by @NickNaso in #1278 (comment)

@JckXia
Copy link
Member

JckXia commented Mar 16, 2023

I did some digging into this issue. So far, it appears that a normal CI run will reliably reproduce this error. Occasionally, they will succeed but that seems to be more of an exception than the norm.

However, when I tried to run just this particular test using node16 on X86 (https://github.com/JckXia/node-addon-api/actions/runs/4440582410/jobs/7794559762), it will pass with flying colors each time. I thought async_progress_queue_worker was the confounding factor but that doesn't seem to make much of a difference.

Does anyone have any other ideas that I could explore?

Issues to persist when we are running on X86 only, but against all node versions: https://github.com/JckXia/node-addon-api/actions/runs/4440781888/jobs/7795016181

@KevinEady
Copy link
Contributor

Hi @JckXia ,

Thanks for looking into this!

Looking at test/index.js, the log of "Running test async_worker" should come after the the Promise for the async_progress_worker test resolves. Since it's not, I think it is still async_progress_worker hanging. There could be some random deadlocking / mutex stuff going on when running the full test suite vs a subset of tests that makes it seem like everything is right when in actuality things are... not.

I remember awhile back we had a similar flakey test. It was due to the missing predicate condition for condition_variable.wait() calls: #1168 to fix this issue #1150 that has a very similar test timeout failure.

I noticed we have some places in async_progress_worker that use condition variables with no predicate, eg:

for (int32_t idx = 0; idx < _times; idx++) {
data.progress = idx;
progress.Send(&data, 1);
_cv.wait(lock);
}

Maybe this is the cause? If you can start with addressing this one and see if it fixes the hang. If so, we should look into changing the other places that use condition variables with no predicate:

while (!info->closeCalledFromJs) {
info->signal.wait(lk);
}

while (!info->closeCalledFromJs) {
info->signal.wait(lk);
}

@JckXia
Copy link
Member

JckXia commented Mar 16, 2023

Hey @KevinEady! Thank you, I added a wait condition and it indeed seems to have resolved the issue.
#1297

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants