await Promise with Timer behavior regress from previous version #21515

Closed
3 tasks done
JasonYeMSFT opened this issue Dec 13, 2019 · 28 comments · Fixed by #23153

Comments

@JasonYeMSFT

JasonYeMSFT commented Dec 13, 2019

Preflight Checklist

  • I have read the Contributing Guidelines for this project.
  • I agree to follow the Code of Conduct that this project adheres to.
  • I have searched the issue tracker for an issue that matches the one I want to file, without success.

Issue Details

  • Electron Version:
    • 5.0.12 - 7.1.10
  • Operating System:
    • Windows 10/macOS 10.15
  • Last Known Working Electron version:
    • 4.2.5

Expected Behavior

We have been using the following piece of code to perform asynchronous operations.

await new Promise((resolve) => { setTimeout(resolve, 1000); });

It pauses execution of the code path containing it for about 1 second and then resumes it. This has been working well for a long time.

Actual Behavior

Since we upgraded to Electron 7, in forked Node processes, only the first several timer callbacks are called on time, while the rest are called extremely late for an unknown reason.

Suppose we call the above piece of code 10 times in a loop. Only the first two promises are created and resolved correctly. In addition, the process exits for an unknown reason. In our production code, other (not computationally intensive) work keeps the process alive, and we see the 3rd and subsequent Timers being created and invoked unreasonably late (around 12s after the 2nd one resolves).
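
For illustration, here is a minimal sketch of the kind of child script we fork (the real code is in the repro repo linked below; the sendAcks name and the INIT_ACK message are approximations, not the exact identifiers from the repo):

async function sendAcks() {
    for (let i = 0; i < 10; i++) {
        await new Promise((resolve) => { setTimeout(resolve, 1000); });
        // In the real app, the forked child notifies the parent after each wait.
        if (process.send) process.send("INIT_ACK");
    }
}

sendAcks();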

The output in the renderer console is

Spawned child process <some_pid>
child process send a message INIT_ACK
child process send a message INIT_ACK
child process exits

To Reproduce

https://github.com/JasonYeMSFT/electron-quick-start/tree/bug/await-Promise

To run the repro steps

$ git clone https://github.com/JasonYeMSFT/electron-quick-start -b bug/await-Promise
$ npm install
$ npm start

Additional Information

Electron 4.2.5 behaves correctly for exactly the same code in this repo. The behavior doesn't reproduce in the renderer process. It also doesn't seem to be a Node.js problem, because I can run renderer.js directly with node at the same version and it behaves correctly.

I did some extra experimenting, and it turns out that setInterval produces much more stable behavior than setTimeout (update: I was wrong, setInterval also shows the weird behavior).

@MRayermannMSFT
Contributor

MRayermannMSFT commented Dec 17, 2019

If you change the file which is forked (awaitPromise.js) to not use await, e.g.:

function waitMs(ms) {
    return new Promise(c => setTimeout(c, ms))
}
async function sendAck() {
    let limit = 10;
    let latestPromise;
    for (let i = 0; i < limit; i++) {
        try {
            latestPromise = (latestPromise || waitMs(1000)).then(() => {
                console.log(`Sent ack ${i} ${Date.now()}`);
                return waitMs(1000);
            });
        } catch (error) {
            console.log(error);
        }
    }
}

sendAck();

The issue still repros. So I think this is more an issue with Promises than with await.

Output of Electron forking a NodeJS process:

Spawned child process 36432
Sent ack 0 1576603904998
child process exits

Output of a NodeJS process forking a NodeJS process:

Spawned child process 20168
Sent ack 0 1576603909254
Sent ack 1 1576603910256
Sent ack 2 1576603911256
Sent ack 3 1576603912257
Sent ack 4 1576603913257
Sent ack 5 1576603914258
Sent ack 6 1576603915259
Sent ack 7 1576603916260
Sent ack 8 1576603917260
Sent ack 9 1576603918261
child process exits

@MRayermannMSFT
Contributor

It may even be as specific as an issue with Promises in conjunction with timeouts. If I change awaitPromise.js to something like this, there is no issue:

function syncCodeThatTakesALongTime() {
    for (let i = 0; i < 10000000; i++) {
        Math.random() *  Math.random() *  Math.random() *  Math.random();
        Math.random() *  Math.random() *  Math.random() *  Math.random();
        Math.random() *  Math.random() *  Math.random() *  Math.random();
        Math.random() *  Math.random() *  Math.random() *  Math.random();
        Math.random() *  Math.random() *  Math.random() *  Math.random();
    }
}

function waitMs(ms) {
    return new Promise((resolve, reject) => {
        syncCodeThatTakesALongTime();
        resolve();
    })
}
async function sendAck() {
    let limit = 10;
    let latestPromise;
    for (let i = 0; i < limit; i++) {
        try {
            latestPromise = (latestPromise || waitMs(1000)).then(() => {
                console.log(`Sent ack ${i} ${Date.now()}`);
                return waitMs(1000);
            });
        } catch (error) {
            console.log(error);
        }
    }
}

sendAck();

@MRayermannMSFT
Contributor

MRayermannMSFT commented Dec 18, 2019

One more interesting thing: if I rewrite waitMs using setInterval, the bug may or may not reproduce depending on where I clear the interval. For example:

function waitMs(ms) {
    return new Promise((resolve, reject) => {
        let id = setInterval(() => {
            /**
             * Clearing interval here causes the bug to reproduce.
             */
            // clearInterval(id);
            resolve(id);
            /**
             * Clearing interval here causes the bug to reproduce.
             */
            // clearInterval(id);
        }, ms);
    });
}

async function sendAck() {
    let limit = 10;
    for (let i = 0; i < limit; i++) {
        try {
            let id = await waitMs(1000);
            /**
             * Clearing interval here does not cause the bug to reproduce.
             */
            clearInterval(id);
            console.log(`Sent ack ${i} ${Date.now()}`);
        } catch (error) {
            console.log(error);
        }
    }
}

sendAck();

@MRayermannMSFT
Contributor

Sorry, I know I keep adding additional info, but I'm just trying to give as many hints as I can. If you do a setInterval somewhere else in the script (say at the top-level scope), that setInterval prevents the process from dying AND, after its callback runs, the waitMs function "wakes up", or something like that. E.g.:

function waitMs(ms) {
    return new Promise(async (resolve, reject) => {
        setTimeout(resolve, ms);
    });
}

async function sendAck() {
    let limit = 5000;
    for (let i = 0; i < limit; i++) {
        try {
            await waitMs(500);
            console.log(`Sent ack ${i}`);
        } catch (error) {
            console.log(error);
        }
    }
}

setInterval(function() {
    console.log("timer that keeps nodejs processing running");
}, 10);

sendAck();

Weird!

@codebytere codebytere self-assigned this Dec 18, 2019
@codebytere
Member

codebytere commented Dec 18, 2019

@MRayermannMSFT no, i actually seriously appreciate all the info! i'll try to tackle this. For some context on our side: child_process.fork runs by design in ELECTRON_RUN_AS_NODE mode. The errors are likely the result of a schism between real Node.js and the mode Electron runs in to emulate the Node.js executable, in a way that breaks timers.

@MRayermannMSFT
Contributor

MRayermannMSFT commented Jan 10, 2020

Hey @codebytere, do you have any thoughts on when y'all might get to this? Also, I tried disabling/enabling ELECTRON_RUN_AS_NODE and neither mode fixed the issue (assuming I did it correctly). I can double-check again later today.

EDIT: after double-checking, it doesn't look like Electron is letting me change the value of ELECTRON_RUN_AS_NODE.
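
For reference, roughly what the attempt looked like (a sketch only, using child_process.fork's env option; this is my approximation rather than the exact code from our app):

const { fork } = require("child_process");
const path = require("path");

// Start from the parent's environment and strip (or force) the flag.
const env = { ...process.env };
delete env.ELECTRON_RUN_AS_NODE; // also tried: env.ELECTRON_RUN_AS_NODE = "1";

// Hypothetical module path; the forked file in our app differs.
const child = fork(path.join(__dirname, "awaitPromise.js"), [], { env });
child.on("exit", () => console.log("child process exits"));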

@craxal

craxal commented Jan 29, 2020

@codebytere Do you have any update to share? This issue is the only thing blocking us from upgrading to Electron 7, which would fix a lot of our own bugs, and we'd really like to do that.

@codebytere
Member

i can try to get to this soon - i see from above that there are some potential workarounds, though. Do none of those work well enough to unblock you?

@ckerr
Member

ckerr commented Jan 30, 2020

I don't have any breaking news on this, but to report in for completeness:

I did a little poking around on this yesterday to track down when this broke, in the hopes that it might be something simple. Unfortunately, the break point is exactly where one would've expected -- the repro exits prematurely in 3ddd917 (Dec 4 2018) but works fine in 83d951d (Nov 30 2018) -- that is, it broke when we upgraded to 70.0.3538.110 for Electron 5, most likely with the node bump in 0adbcfd.

@ckerr
Member

ckerr commented Jan 30, 2020

For some context on our side: child_process.fork runs by design in ELECTRON_RUN_AS_NODE mode.

Agreed! As long as we're using ELECTRON_RUN_AS_NODE, we don't even need child_process.fork().

$ ELECTRON_RUN_AS_NODE=1 electron awaitPromise.js

Is enough to trigger the bug, leaving main.js out of it entirely.

As a baseline,

$ node awaitPromise.js

works correctly.

@MRayermannMSFT
Contributor

MRayermannMSFT commented Jan 30, 2020

@ckerr Is there a way to disable ELECTRON_RUN_AS_NODE for a child process? I wasn't able to do it, but maybe I was just doing it wrong?

@codebytere
Member

@MRayermannMSFT no - not at present. It's hardcoded into our logic and not overridable.

@ckerr
Member

ckerr commented Jan 31, 2020

I tried locally reverting our 4-2-x --> 5-1-x changes to node_main.cc, which is where our custom node event loop lives, to no effect.

So this afternoon managed to cross node_main.cc and child_process off the suspect list. I'd be happier knowing what it is rather than what it isn't, but we can keep digging.

@ckerr
Member

ckerr commented Feb 14, 2020

Extra fun: I tried patching a stock version of Node with Electron's patchset, building that as a standalone Node, and testing the repro with it. That works just fine.

So it's not triggered by just our patches, nor just by our node_main.cc, by themselves...

@ckerr ckerr self-assigned this Feb 14, 2020
@MRayermannMSFT
Contributor

The mystery deepens! Please let us know if there's anything we can do to help @ckerr , and thank you for the continued updates. 🙂

@aabuhijleh

aabuhijleh commented Feb 17, 2020

I'm having quite a severe problem with async/await in Electron, even without timers. I've tried versions 4 to 8 and they all have the same issue.

I just added this code to main.js

globalShortcut.register("F6", async () => {
  console.log("F6 is pressed");
  console.time("anAsyncFunctionThatResolvesImmediately");
  await anAsyncFunctionThatResolvesImmediately();
  console.timeEnd("anAsyncFunctionThatResolvesImmediately");
});

async function anAsyncFunctionThatResolvesImmediately() {
  return;
}

This is the console output when I press F6 multiple times (it takes an extremely long time to finish)

F6 is pressed
anAsyncFunctionThatResolvesImmediately: 795.781ms
F6 is pressed
anAsyncFunctionThatResolvesImmediately: 696.556ms
F6 is pressed
anAsyncFunctionThatResolvesImmediately: 1056.945ms
F6 is pressed
anAsyncFunctionThatResolvesImmediately: 4201.750ms
F6 is pressed
anAsyncFunctionThatResolvesImmediately: 4144.318ms

my repo

Update: this problem seems to go away once you're outside the context of globalShortcut.register's callback.

Fix:

globalShortcut.register("F6", () => {
   setTimeout(handlePress, 0) // To get out of this function's context
});

async function handlePress() {
  console.log("F6 is pressed");
  console.time("anAsyncFunctionThatResolvesImmediately");
  await anAsyncFunctionThatResolvesImmediately();
  console.timeEnd("anAsyncFunctionThatResolvesImmediately");
}

async function anAsyncFunctionThatResolvesImmediately() {
  return;
}

However, I'd be lying if I said I understood what's happening here :)

Can someone please explain WTF is going on here?

@witcher112

I'm not sure, but it could be related to #19468 - it seems to freeze the whole Electron app.

@MRayermannMSFT
Contributor

This also repros in Electron 8.0.2.

@MRayermannMSFT
Contributor

MRayermannMSFT commented Apr 1, 2020

@ckerr any update on whether the team has made progress on this bug?

And for what it's worth, this repros on Electron 9.0.0-beta.12.

@arantes555

For anyone interested in a workaround: from my tests, it seems that wrapping the setTimeout in a process.nextTick or a setImmediate makes it work 100% of the time.
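
A minimal sketch of that workaround, reusing the waitMs helper shape from earlier comments (assumed, not tested against the repro repo):

// Wrap the setTimeout in a setImmediate (process.nextTick reportedly works too),
// so the timer is created from a fresh turn of the event loop rather than
// directly inside the Promise executor.
function waitMs(ms) {
    return new Promise((resolve) => {
        setImmediate(() => {
            setTimeout(resolve, ms);
        });
    });
}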

This issue is really weird...

@MRayermannMSFT
Contributor

@ckerr is there anything we can do to help y'all with this? Have you been thinking about this anymore since Feb?

@ckerr
Member

ckerr commented Apr 14, 2020

@MRayermannMSFT yes, I got blocked on this and have come back to it; sorry for the silence.

I may have found the source of the issue and will report back tomorrow or possibly later today after I've done a little more digging to test my theory.

@MRayermannMSFT
Contributor

No worries! I am excited to learn what it is. This is obviously one of those extra special bugs. 😋

@ckerr
Member

ckerr commented Apr 15, 2020

I'll write up the status of the bug hunt in the next comment -- and I still have some more digging to do there -- but @MRayermannMSFT first I want to get you unblocked in a way that doesn't rely on e.g. setImmediate. Here's a workaround that works for me in testing.

The issue arises iff control flow returns to node at any point when there's no pending timeout. This happens in your code when the first timeout returns a new Promise but its resolve() function (which actually creates the steady-state timer) hasn't yet been invoked.

If you create a sibling Promise-based timeout that runs parallel to the real one but fires out of step with it, there is always at least one pending timeout, and that's enough to keep everything alive.

I paraphrased your electron-quick-start fork as this file for testing:

const waitMs = (msec) => new Promise((resolve) => setTimeout(resolve, msec));

const intervalMsec = 1000;
const numIterations = 10; 
let iteration = 0;

function doWork() {
    ++iteration;
    console.log(`doing task #${iteration}`);
}

async function sendAck() {
    let workPromise;
    for (let i = 0; i < numIterations; i++) {
        try {
            workPromise = (workPromise || waitMs(intervalMsec)).then(() => {
                doWork();
                return waitMs(intervalMsec);
            });
        } catch (error) {
            console.log(error);
        }
    }
}

sendAck();

When I run this with ELECTRON_RUN_AS_NODE=1 in Electron >= 5, it exits after

doing task #1

Here's that code with a sibling Promise added:

const waitMs = (msec) => new Promise((resolve) => setTimeout(resolve, msec));

const intervalMsec = 1000;
const numIterations = 10; 
let iteration = 0;

function doWork() {
    ++iteration;
    console.log(`doing task #${iteration}`);
}

async function sendAck() {
    // tracingPromise works around a bug introduced in Electron 5.
    // It's here solely to keep workPromise alive. Can be removed when
    // https://github.com/electron/electron/issues/21515 is fixed upstream
    let tracingPromise;

    let workPromise;
    for (let i = 0; i < numIterations; i++) {
        try {
            // NB: tracingPromise must not fire at the same time as workPromise,
            // which is why the initial interval is halved
            tracingPromise = (tracingPromise || waitMs(intervalMsec/2)).then(() => {
                return waitMs(intervalMsec);
            });
            workPromise = (workPromise || waitMs(intervalMsec)).then(() => {
                doWork();
                return waitMs(intervalMsec);
            });
        } catch (error) {
            console.log(error);
        }
    }
}

sendAck();

Under the same test conditions, this makes it all the way to

doing task #10

@ckerr
Member

ckerr commented Apr 15, 2020

The issue is arising in timer code that was introduced in Node.js 12. The Node.js function Environment::RunTimers() has this block:

  Local<Function> cb = env->timers_callback_function();
  MaybeLocal<Value> ret;
  Local<Value> arg = env->GetNow();
  // This code will loop until all currently due timers will process. It is
  // impossible for us to end up in an infinite loop due to how the JS-side
  // is structured.
  do {
    TryCatchScope try_catch(env);
    try_catch.SetVerbose(true);
    ret = cb->Call(env->context(), process, 1, &arg);
  } while (ret.IsEmpty() && env->can_call_into_js());

  ... // code omitted for clarity

  // To allow for less JS-C++ boundary crossing, the value returned from JS
  // serves a few purposes:
  // 1. If it's 0, no more timers exist and the handle should be unrefed
  // 2. If it's > 0, the value represents the next timer's expiry and there
  //    is at least one timer remaining that is refed.
  // 3. If it's < 0, the absolute value represents the next timer's expiry
  //    and there are no timers that are refed.
  int64_t expiry_ms =
      ret.ToLocalChecked()->IntegerValue(env->context()).FromJust();

Where cb is processTimers() from lib/internal/timers.js:

  function processTimers(now) {
    debug('process timer lists %d', now);
    nextExpiry = Infinity;

    let list;
    let ranAtLeastOneList = false;
    while (list = timerListQueue.peek()) {
      if (list.expiry > now) {
        nextExpiry = list.expiry;
        return refCount > 0 ? nextExpiry : -nextExpiry;
      }
      if (ranAtLeastOneList)
        runNextTicks();
      else
        ranAtLeastOneList = true;
      listOnTimeout(list, now);
    }
    return 0;
  }

As mentioned in the previous comment, the issue arises when the last timeout has fired, there is currently no pending timeout, but one is about to be created inside a Promise resolve() function. (See the previous comment for sample code.)

So what's happening is:

  1. Environment::RunTimers() invokes cb->Call(), which is the JS function processTimers()
  2. processTimers() processes the last pending timeout and returns 0 to indicate there are no more timers. That 0 is what ret gets back in the C++ world in Environment::RunTimers()
  3. Before cb->Call() returns, V8's microtask queue is invoked, looking for Promises which can be handled. This results in that resolve() function being called, creating a new timeout. There is now one pending timeout.
  4. Control returns to Environment::RunTimers(), which sees that 0 from step 2 and acts as if there are no more timers. It unrefs the uv handle which results in Electron exiting.

That's the state of the bug. Next, I need to find out why this only happens in Electron-running-as-Node.js and not in Node.js itself. Two possible theories: (1) this is a bug in Node.js too, but some other handle reference somewhere keeps things alive long enough to work; or (2) calling Local<Function>::Call() doesn't trigger the microtask queue in Node.js, in which case it would be Electron's issue for ELECTRON_RUN_AS_NODE not emulating Node closely enough.

@ckerr
Member

ckerr commented Apr 16, 2020

Here's the one-liner that fixes it

diff --git a/atom/app/node_main.cc b/atom/app/node_main.cc
index c4dde00f9..98dd52dee 100644
--- a/atom/app/node_main.cc
+++ b/atom/app/node_main.cc
@@ -52,6 +52,7 @@ int NodeMain(int argc, char* argv[]) {
 
     // Initialize gin::IsolateHolder.
     JavascriptEnvironment gin_env(loop);
+    gin_env.isolate()->SetMicrotasksPolicy(v8::MicrotasksPolicy::kExplicit);
 
     // Explicitly register electron's builtin modules.
     NodeBindings::RegisterBuiltinModules();

Explanation

Following where the last comment left off, here's how RunMicrotasks() is reached from v8::Function::Call():

#0  0x00005555598ec0a4 in RunMicrotasks() () at ./../../v8/src/microtask-queue.cc:112
#1  0x000055555989a6db in FireCallCompletedCallback() () at ./../../v8/src/isolate.cc:4040
#2  0x00005555590bf31e in ~CallDepthScope() () at ../../v8/src/api.cc:300
#3  0x00005555590e5baf in Call() () at ../../v8/src/api.cc:5033
#4  0x000055555e8c5e26 in RunTimers() () at ../../third_party/electron_node/src/env.cc:624
#5  0x000055555e9c8fe7 in uv__run_timers () at ../../third_party/electron_node/deps/uv/src/timer.c:174
#6  0x000055555e9cbaad in uv_run () at ../../third_party/electron_node/deps/uv/src/unix/core.c:357
#7  0x000055555a8006b5 in NodeMain() () at ../../electron/atom/app/node_main.cc:124
#8  0x00005555579fa240 in main() () at ../../electron/atom/app/atom_main.cc:193

The important part is that Isolate::FireCallCompletedCallback() will run microtasks at the end of Function::Call() iff these conditions are true:

  bool run_microtasks =
      microtask_queue && microtask_queue->size() &&
      !microtask_queue->HasMicrotasksSuppressions() &&
      microtask_queue->microtasks_policy() == v8::MicrotasksPolicy::kAuto;

This code block is slightly different in Node 12 than in Electron 5 because the former uses v8 7.4.288.13-node.13 and the latter uses 7.3.492.27-electron.0. Here's the code in Electron 5:

    bool run_microtasks =
        default_microtask_queue()->size() &&
        !default_microtask_queue()->HasMicrotasksSuppressions() &&
        handle_scope_implementer()->microtasks_policy() ==
            v8::MicrotasksPolicy::kAuto;

HandleScopeImplementer's default microtasks policy is kAuto:

    explicit HandleScopeImplementer(Isolate* isolate)
        : isolate_(isolate),
          spare_(nullptr),
          call_depth_(0),
          microtasks_policy_(v8::MicrotasksPolicy::kAuto),
          last_handle_before_deferred_block_(nullptr) {
    }

We override that in Electron 5's NodeBindings::CreateEnvironment(), which answers the question of why this worked in Electron when Electron wasn't pretending to be Node.js:

    if (browser_env_ == BROWSER) {
      // SetAutorunMicrotasks is no longer called in node::CreateEnvironment
      // so instead call it here to match expected node behavior
      context->GetIsolate()->SetMicrotasksPolicy(v8::MicrotasksPolicy::kExplicit);
    } else {
      // Node uses the deprecated SetAutorunMicrotasks(false) mode, we should
      // switch to use the scoped policy to match blink's behavior.
      context->GetIsolate()->SetMicrotasksPolicy(v8::MicrotasksPolicy::kScoped);
    }

For its part, Node 12 sets the value to kExplicit in its SetIsolateUpForNode() function:

void SetIsolateUpForNode(v8::Isolate* isolate) {
  isolate->AddMessageListenerWithErrorLevel(
      OnMessage,
      Isolate::MessageErrorLevel::kMessageError |
          Isolate::MessageErrorLevel::kMessageWarning);
  isolate->SetAbortOnUncaughtExceptionCallback(ShouldAbortOnUncaughtException);
  isolate->SetMicrotasksPolicy(MicrotasksPolicy::kExplicit);
  isolate->SetFatalErrorHandler(OnFatalError);
  isolate->SetAllowWasmCodeGenerationCallback(AllowWasmCodeGenerationCallback);
  isolate->SetPromiseRejectCallback(task_queue::PromiseRejectCallback);
  v8::CpuProfiler::UseDetailedSourcePositionsForProfiling(isolate);
}

SetIsolateUpForNode() is too Node-specific for Electron, and NodeBindings::CreateEnvironment() is too Electron-specific for Electron-running-as-Node.js. So probably the best thing is to just set the microtasks policy ourselves in app/node_main.cc, as shown in the diff at the top of this comment.

@MRayermannMSFT
Contributor

Nice! I think I'm going to need to read through all of this 2 or 3 more times to fully understand things. Great write-up and thanks again for returning to this.

@MRayermannMSFT
Contributor

I just tested against Electron 8.2.4. I can confirm that our issue no longer repros. Many thanks again to @ckerr and @codebytere! 🥰
