New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix synchronization between main thread and worker #14541
Conversation
@@ -60,7 +60,7 @@ exports.WorkerClient = class WorkerClient extends Client { | |||
{ env: WorkerClient.#worker_threads.SHARE_ENV }, | |||
); | |||
|
|||
#signal = new Int32Array(new SharedArrayBuffer(4)); | |||
#signal = new Int32Array(new SharedArrayBuffer(8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need doubling the byte length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (resp) break;
Atomics.wait(this.#signal, 1, 0, 30);
For convenience, I increased the byte length for sleep.
I will re-run CI test for a few more times. Hopefully the last flaky test is fixed. |
Ha ha! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is it needed to run the loop 100 times? Isn't 2 enough?
- When the worker main thread is woken up but we receive undefined, does the worker wake up us again?
- Has this bug ever been reproduced outside of our CI? If not, we could consider making the fix a test-only fix rather than adding it to the published code.
I do not know.
I don't quite understand what you mean here.
There are currently no reports on this issue. |
@liuxingbaoyu I pushed a commit with some logs to understand better what's causing the bug. |
ce563b8
to
7584140
Compare
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/51995/ |
f64aa11
to
d696a81
Compare
I may now understand a little what this means.
I've also tried The above explanation may not be completely accurate because nodejs is too complicated. |
^ @addaleax (if you have time to answer) Is it expected that using // main.js
const signal = new Int32Array();
const subChannel = new MessageChannel();
// Tell the worker to start doing its job
worker.postMessage({ signal, port: subChannel.port1 }, [subChannel.port1]);
// Wait until the worker is done
Atomics.wait(signal, 0, 0);
// Read the worker response
const response = receiveMessageOnPort(subChannel.port2); // worker.js
parentPort.addListener("message", async ({ signal, port }) => {
let response = await computeResponse();
// Send the response to the main thread
port.postMessage(response);
port.close();
// Wake up the main thread
Atomics.store(signal, 0, 1);
Atomics.notify(signal, 0);
}); And sometime |
@nicolo-ribaudo Are you sure this code is doing what you want it to do? I see that there are calls to multiple babel/eslint/babel-eslint-parser/src/parse.cjs Lines 39 to 40 in 885e1e0
These would result in multiple
#signal[0] is still 0 ), which means that postMessage() may not have run yet when it returns.
Am I missing something? |
Thanks for your reply!
0 before calling postMessage . The main thread is fully synchronous, so there shouldn't be any race condition between these client.* calls.
The confusing thing is that the code works ~90% of the times. |
d696a81
to
89bb6ab
Compare
Ah yeah, that one I was indeed missing. 🙂 Do you have a standalone reproduction? Can you compare the output of a failing/a successful run when running node as (I think technically you should be using |
I cannot even reproduce the error locally 😂 I'll continue tweaking this PR, feel free to unsubscribe from notifications and I'll ping you if I have more info. |
@addaleax @nicolo-ribaudo |
89bb6ab
to
d1d1281
Compare
The errors are just random at this point, now they are about some unrelated functions not being callable. |
Using a new SAB every times seems to have fixed it? Now the logs shows that it always works at the first |
Maybe we should enable all tests and try again? |
The exception for |
Yup, that's what I was seeing when I commented #14541 (comment) 😬 It looks like v8's internal memory is getting corrupted somehow, and variable values are randomly changed. Let's pretend we didn't see that failure. |
The new error is on the |
There is nothing worse than this.😰😰😰 |
I can reproduce the new error locally 🥳 |
|
6a1cd01
to
5794b3b
Compare
This now fixes the I'm going to merge this PR for now. |
I may have found the cause of that error. The reason is that E.g |
This seems to be a bug in nodejs.
But it's hard to reproduce and we want to be compatible with older versions of nodejs.
So fixed in babel.
This PR also fixes unstable CI.