[WIP] Support requiring .mjs files #30891

weswigham · 2019-12-10T22:05:32Z

This implements the ability to use require on .mjs files, loaded via the esm loader, using the same tradeoffs that top level await makes in esm itself.

What this means: If possible, all execution and evaluation is done synchronously, via immediately unwrapping the execution's component promises. This means that any and all existing code should have no observable change in behavior, as there exist no asynchronous modules as of yet. The catch is that once a module which requires asynchronous execution is used, it must yield to the event loop to perform that execution, which, in turn, can allow other code to execute before the continuation after the async action, which is observable to callers of the now asynchronous module. If this matters to your callers, this means making your module execution asynchronous could be considered a breaking change to your library, however in practice, it will not matter for most callers. Moreover, as the ecosystem exists today, there are zero asynchronously executing modules, and so until there are, there are no downsides to this approach at all, as no execution is changed from what one would expect today (excepting, ofc, that it's no longer an error to require("./foo.mjs").

As of right now, this is missing:

Support for "type": "module" in the cjs loader. (where did that go? I thought we already had something that did that internally - getPackageType or something?)
New tests (I have no idea how to write tests for node, help appreciated)
Speculative integration tests for src: add support for top level await #30370 - the code is already written assuming ModuleWrap.evaluate may eventually return a promise.

ljharb · 2019-12-10T22:09:41Z

cc @nodejs/modules-active-members

guybedford · 2019-12-10T22:15:29Z

Discussion for this PR in nodejs/modules#454.

devsnek

this cannot be generalized to TLA modules because their execution could be waiting on something not owned by the libuv event loop, which would result in execution deadlocking.

weswigham · 2019-12-10T22:21:26Z

this cannot be generalized to TLA modules because their execution could be waiting on something not owned by the libuv event loop, which would result in execution deadlocking.

You have an example? TLA can already trivially deadlock in a huge number of ways (there's an faq entry defending it), what are you thinking of?

devsnek · 2019-12-10T22:25:25Z

@weswigham literally any async behaviour that isn't a one-shot call to a libuv api. interestingly we had an issue opened a few days ago asking about deasync behaviour, you might find it interesting: #30634

devsnek · 2019-12-10T22:28:24Z

lib/internal/modules/cjs/loader.js

+   * So long as the module doesn't contain TLA, this will be sync, otherwise it
+   * appears async
+   */
+  promiseWait(instantiated.evaluate(-1, false));


I believe this will throw if the last expression of a module source returns a rejected promise.

Oooo spicy - is there a way to distinguish that from a TLA module that threw once #30370 is in?

once TLA is enabled, the result of module.evaluate() is either undefined or Promise<undefined>.

Hah, yeah, I guess I just don't need this promiseWait call until #30370 is in. I'll remove it for now.

once TLA is enabled, the result of module.evaluate() is either undefined or Promise.

Suppose I have

// @filename: mod.mjs new Promise(resolve => resolve(undefined))

how am I to distinguish that from a module with TLA? I think conflating the evaluation of the final expression and the completion promise, API-wise, is problematic.

weswigham · 2019-12-10T22:29:29Z

literally any async behaviour that isn't a one-shot call to a libuv api

Such as? AFAIK all the library ones go into the event loop or the microtask queue, no (both of which are being advanced)? As far as I know, this centralization of async handling is, like, a major feature of javascript/node.

devsnek · 2019-12-10T22:30:35Z

lib/internal/modules/cjs/loader.js

+Module._extensions['.mjs'] = function(module, filename) {
+  const ESMLoader = asyncESM.ESMLoader;
+  const url = `${pathToFileURL(filename)}`;
+  const job = ESMLoader.getModuleJobWorker(url, 'module');


so this just skips the esm resolver entirely? this seems like a rather dangerous divergence.

That's because we already resolved the specifier, in order to know we needed the esm loader. The cjs resolver and the esm resolver are supposed to resolve the same specifier to the same thing, so it should be fine. This is part of why I kept saying it really should only be considered to be one loader.

devsnek · 2019-12-10T22:34:33Z

@weswigham third party libraries can run their own async logic and then push the result into the microtask queue to be handled. additionally, node can't warn/throw/etc when it won't work, it will just sit there keeping your cpu pinned to 100% while it busy loops.

weswigham · 2019-12-10T22:37:10Z

third party libraries can run their own async logic and then push the result into the microtask queue to be handled

That still works, no? Or does node somehow sleep in that scenario normally...? What's the normal behavior in that situation? node's event loop is empty - therefore I would think that the process dies, and the third party native module's async callback gets dropped? Tell me more.

devsnek · 2019-12-10T22:38:30Z

i'm not entirely familiar with how these libraries work, there are more details of weirdness in #30634 and the deasync repo.

weswigham · 2019-12-10T22:50:53Z

Looks like the reported problem in deasync tracked back to an issue in node that was never addressed: nodejs/node-v0.x-archive#25831 (because of the stigma associated with the deasync pattern itself)

Timer handles could mangle one another, apparently. The bug as originally described no longer appears to exist (I can freely unwrap a number of layers of nested async promises/setTimeouts just fine). So that shouldn't matter.

This implements the ability to use require on .mjs files, loaded via the esm loader, using the same tradeoffs that top level await makes in esm itself. What this means: If possible, all execution and evaluation is done synchronously, via immediately unwrapping the execution's component promises. This means that any and all existing code should have no observable change in behavior, as there exist no asynchronous modules as of yet. The catch is that once a module which requires asynchronous execution is used, it must yield to the event loop to perform that execution, which, in turn, can allow other code to execute before the continuation after the async action, which is observable to callers of the now asynchronous module. If this matters to your callers, this means making your module execution asynchronous could be considered a breaking change to your library, however in practice, it will not matter for most callers. Moreover, as the ecosystem exists today, there are zero asynchronously executing modules, and so until there are, there are no downsides to this approach at all, as no execution is changed from what one would expect today (excepting, ofc, that it's no longer an error to require("./foo.mjs"). Ref: nodejs/modules#308 Ref: https://github.com/nodejs/modules/issues/299 Ref: nodejs/modules#454

…pport is in, and until then, the results are a bit off

devsnek · 2019-12-10T23:07:20Z

@weswigham that's not the only issue... https://github.com/abbr/deasync/issues

there are several unsolved compatibility issues listed, and unless you could prove they're all fixed and that no new ones would pop up, i don't see how we can use this approach.

jkrems

This would introduce official support for deasync to node core. As in: It would be possible to write code that awaits arbitrary async work using a synchronous call. I don't think that there should be an official node API that allows awaiting in the middle of synchronous call chains. All node APIs are designed assuming that certain patterns are safe, like attaching event listeners on returned objects within the same synchronous flow. With this change, innocent looking code would be able to trigger error events before event listeners have been added etc..

I don't believe node's own APIs are compatible with the feature introduced here. Also, adding this API to node moves node away from the expected execution flow of JS which is that code runs to completion, one event loop tick at a time.

weswigham · 2019-12-10T23:23:54Z

there are several unsolved compatibility issues listed, and unless you could prove they're all fixed and that no new ones would pop up, i don't see how we can use this approach.

Aside: you can't prove software is bug free, per sey, you can just test extensively to increase the your confidence that faults likely don't exist~

Also, this is a much, much higher level primitive than deasync exposed - this is essentially just an off-brand await, implemented in the c++ engine that attempts to avoid spinning event loop turns, if possible. deasync left people to compose their own sync methods, which often meant that exceptions weren't handled and got left hanging (and you do have to explicitly unwrap the exception out of the promise and rethrow it into the context to get it right, which is a notable thing missing in may of those issues). IMO, deasync's implementation is also crazy roundabout, since it relies on node-style-continuation callbacks, and quietly fails if the callback isn't actually node-style (eg, error is swapped, the the return value is the error). By only unwrapping promises, this is way, way simpler. And a solid half of it's issues are just related to it being a native module that doesn't always build when the node api changes.

I'm off topic - we shouldn't be discussing an external library that does a kinda-similarish-but-not-really thing to what some internals are doing in this PR. We should be looking to increase confidence in this PR with concrete tests that help us believe it's good, or provide concrete testable examples as to why it's not.

weswigham · 2019-12-10T23:26:27Z

I don't believe node's own APIs are compatible with the feature introduced here

I don't believe node's whole ecosystem is reasonably compatible with es modules without the feature introduced here.

devsnek · 2019-12-10T23:27:29Z

deasync internals and this pr do the same thing, even if the c++ is slightly different. I'm pointing to previous examples of this approach as being inherently unstable for generalized use in node.

weswigham · 2019-12-10T23:36:18Z

deasync internals and this pr do the same thing

deasync internals just exports raw uv_run, which is absolutely not what this is doing - this has a fair bit more around that to handle unwrapping and executing promises appropriately (and actually checking loop completion state), all of which is also only doable in the c++ layer. node internals safely use uv_run elsewhere - it's a matter of maintaining the expected usage of the event loop - as is, this is way closer to what the platform is already doing in node_main_instance.cc (including using the caller's environment's loop, rather than the "default" loop, something deasync failed to do, which is likely the cause of the issue with Workers).

weswigham · 2019-12-10T23:41:13Z

With this change, innocent looking code would be able to trigger error events before event listeners have been added etc..

Only if that code comes from an es module. And es modules can already do that amongst themselves, thanks to TLA. Ergo, nothing has changed there. This is just porting the ability to utilize that behavior to the cjs resolver, which is immensely useful for compatibility.

bmeck · 2019-12-16T15:03:03Z

@weswigham

Spec-wise, would there anything preventing one, from, say, converting the cjs wrapper function to a wrapper generator and having an invocation of require yield that generator?

No, CJS isn't JS and isn't bound by any TC39 semantics; though, you would need to do a much more thorough transform. Every call site of require() needs every execution context to be in a yield position for such a transform to properly be written in JS (including WASM execution contexts [theoretically possible if you also transform all the WASM?], dynamic codegen execution contexts like Function/eval (would need another hook for V8 to intercept direct eval src and transform it), and C++ execution contexts [improbable?]). Either way, this kind of thinking is about how to take non-JS and make it JS which I don't really find compelling to say that the non-transformed source is JS if it requires a very invasive transform. JS is turing complete so we can compile pretty much anything to it; the questions in this PR's comments are more about the source passed to Node being JS, not if Node can transform something that isn't JS into JS.

jkrems · 2019-12-16T15:50:54Z

Every call site of require() needs every execution context to be in a yield position

Just to make sure the scope of this is clear: require is a function call that currently may exist in any expression position, not only in the top-most function body. So a more extreme way to put it would be "if require calls turn into a yield expression, all functions running in node would have to become generators". "All functions" may be too broad but given how dynamic JS is, it's pretty close.

So I would say: Yes, there is something spec-wise to prevent it. Because that transformation would realistically also apply to all "real"/spec'd JS (ES modules), not just to CJS. Otherwise call sites into functions that use inline-require would break. That's of course unless there would be a way to isolate CJS being called from JS code.

ljharb · 2019-12-16T17:29:34Z

There’s also makeRequireFunction in ESM that would need to act the same as CJS require.

jkrems · 2019-12-18T21:19:50Z

One thing that we ran out of time for in our call: If the idea is that require(esm) can run a syncified version of the entire module loading pipeline, how does this affect concurrent import jobs? E.g. say an import job is currently active that touches on a subset of the same modules involved in the required esm. What happens to that job? Do we somehow prevent that job to yield back to the event loop? If so, how? Or would we effectively say that all import jobs become synchronous (in relation to the main thread) and accept that loading and compiling code in the background is no longer a feature of ESM that node supports?

bmeck · 2019-12-18T21:27:42Z

@jkrems

say an import job is currently active that touches on a subset of the same modules involved in the required esm. What happens to that job?

Sharing and graph execution prior to the queued async job, same as what import() can do.

// imagine a and b share a module c
const a = import('will load after b (long fetch lets say)');
const b = import('will load before a');

Can lead to c => b => a even though a queued first.

Do we somehow prevent that job to yield back to the event loop? If so, how?

I don't fully understand this question. Off Realm resolvers would be a separate loop in the same way a Worker is. The resolver would yield back to its own event loop eventually still. The main realm would be paused via a mechanism like Atomics.wait.

Or would we effectively say that all import jobs become synchronous (in relation to the main thread) and accept that loading and compiling code in the background is no longer a feature of ESM that node supports?

Not necessarily, you only need to perform locking for synchronous call sites. This is similar to XHR differentiating sync vs async but having the same API (probably not the best example).

jkrems · 2019-12-18T21:47:48Z

@bmeck What I meant is the following:

// Simple example, it becomes more complicated with deeper nesting

import('a');
// create import job for a but continue execution.
// create a callback to process the result of loading / instantiate the module on the main thread.

require('a');
// wants to reuse import job for a but that job is currently set up to yield back to the main thread
// when it's not actively interacting with the Isolate.
// What does it do? Does it modify the original job? Does it synchronously set the job's resolutions?
// Also implied: Does this mean that everything to do with import needs to happen in C++?

Right now import isn't one chunk of work that blocks the main thread for an extended period of time (or rather: it doesn't have to be). This is one of the major advantages over require: Loading and compiling code doesn't prevent execution on the main thread.

A possible import flow would be:

  MT                    OT
import
  |                  resolveUrl
(unblocked!)        loadResource
  |                  transform
CompileModule
  |
(unblocked)         compileSource
  |
execute

Multiple concurrent imports aren't an issue - none of them blocks the main thread and could prevent progress in the others. But if we use normal task scheduling on the main thread for loader jobs (which we do right now), then blocking the main thread while a loader job is running will be a problem.

One solution is indeed blocking the main thread, via Atomics.wait or any other mechanism doesn't really matter here. But I haven't seen anything saying out loud that this will mean that ESM loading in node will de-facto have the same main thread blocking issues that CJS currently has. If that's the case, we effectively have a sync loader for ESM that as an implementation detail makes blocking calls to another thread where technically async code is running.

bmeck · 2019-12-18T22:13:52Z

// wants to reuse import job for a but that job is currently set up to yield back to the main thread

I don't understand this, the each ModuleJob is independent and only the entire graph yields back (though I'm unclear on the "yield back" phrasing).

// What does it do? Does it modify the original job? Does it synchronously set the job's resolutions?

Modify and resolve, both synchronously. I'm unclear on the problem being framed with that as ModuleJob is not a Promise in and of itself.

// Also implied: Does this mean that everything to do with import needs to happen in C++?

Not to my knowledge

jkrems · 2019-12-18T22:24:15Z

I don't understand this, the each ModuleJob is independent and only the entire graph yields back (though I'm unclear on the "yield back" phrasing).

Yield back as "allow JS to run on the main thread". JS can run on the main thread while a file is loading for example. The way the module job is currently written, we do yield control of the main thread all the time, not just when the entire module graph has been loaded. Which means that the individual steps that need to run on the main thread (like creating objects in the Isolate) have to be scheduled somehow. Right now we use promises to do it. If we don't call them, nothing happens.

So if we can't use promises to schedule those main thread units of work, what is the replacement? Will we have a separate scheduler that needs to be called from the main event loop but also can be called blocking?

I'm unclear on the problem being framed with that as ModuleJob is not a Promise in and of itself.

The module job isn't a Promise but uses Promises to schedule work. That's why I used "resolutions" - a job depends on many resolutions right now, one for each time it needs to do work on the main thread.

weswigham · 2019-12-18T22:34:34Z

So if we can't use promises to schedule those main thread units of work, what is the replacement?

ESM graph execution actually isn't "scheduled" at all yet. It's 100% sync (as TLA isn't actually in yet). And, to be clear: to be to spec for TLA, you can't naively hook it up with just chained promises, or you don't preserve sync module graphs, which is required. Not that any of that really matters, since v8 wires up all the executions anyway - we just call .execute on the root v8 module object. Today, that's always sync. Post-TLA, it'll return a promise or not. The non-promise return will indicate execution was purely sync, the promise return'll indicate async execution was required. We can then return that promise or throw (which seems preferable based on today's discussion) as desired. All that ModuleJob scheduling chicanery is actually handled by v8 itself - we just kick it off by specifying an entrypoint (and doing the resolution).

The module job isn't a Promise but uses Promises to schedule work. That's why I used "resolutions" - a job depends on many resolutions right now, one for each time it needs to do work on the main thread.

Yes and no. It does right now, but it doesn't need to. It's implemented as such to be API forward compatible with potential in-thread async resolver hooks; but those are problematic for other reasons, so running them off-thread (and simplifying the API to not be unnecessarily async tagged) is nicer anyway (plus, the API is internal anyway, so it was a bit of a premature over-design).

jkrems · 2019-12-18T22:57:08Z

ESM graph execution actually isn't "scheduled" at all yet.

I wasn't talking about execution, I was talking about load and compile. My graph above ended with a single execute but I maybe should've been clear that it was only included for illustrative purposes as "the thing happening after the whole process is over". :)

All that ModuleJob scheduling chicanery is actually handled by v8 itself - we just kick it off by specifying an entrypoint (and doing the resolution).

It's not, the module job is handled by the embedder which in this case is us. We lifted the name "module job" from the HTML spec which is implemented in Chrome/Blink. V8 has no orchestration for resolution and resource loading built-in afaik.

Yes and no. It does right now, but it doesn't need to.

That's what I wrote above. :) But I also mentioned very specific advantages for allowing fetch&compile to happen while execution of the main thread continues. It's not just "because we can". It's because it's a known issue with require that blocking the main thread while dependencies load and parse can be costly. At least from my side running async resolvers in-thread was never a goal.

BridgeAR · 2020-01-12T00:45:31Z

Without looking through all the comments: this has four -1 and there was no movement since a couple of weeks. Should this stay open for further discussions or would it be fine to close this?

MylesBorins · 2020-01-14T16:38:08Z

@BridgeAR it is actively being worked on please do not close

Raynos · 2020-05-28T10:36:45Z

From reading the test suite changes it looks like this PR will fix #33605

GeoffreyBooth · 2020-05-28T16:58:16Z

As of right now, this is missing:

Support for "type": "module" in the cjs loader. (where did that go? I thought we already had something that did that internally - getPackageType or something?)

@weswigham I think #33229 should have resolved this part, at least. Are you still working on this PR?

dfabulich · 2020-08-11T01:49:25Z

In a comment on Wes' proposal issue for this PR @weswigham wrote:

Ah, yep, nodejs/node#34558 landed yesterday. I might be able to remake/update nodejs/node#30891 in the hopefully nearish future.

So, yes, please don't close this PR!

addaleax · 2021-06-20T16:16:04Z

I’ll close this as an approach that spins the event loop while it is already running is currently a non-starter purely from a technical point of view, as mentioned above, and there hasn’t been activity in a year here.

nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Dec 10, 2019

weswigham mentioned this pull request Dec 10, 2019

Open issues with require(esm) nodejs/modules#454

Closed

devsnek suggested changes Dec 10, 2019

View reviewed changes

devsnek reviewed Dec 10, 2019

View reviewed changes

weswigham added 4 commits December 10, 2019 15:02

Fix lints

286692b

Remove unneeded call to promiseWait - we only need this once TLA su…

fe74e6f

…pport is in, and until then, the results are a bit off

Fix mjs test

afe50b9

weswigham force-pushed the support-require-esm-in-the-style-of-TLA branch from 5a8c931 to afe50b9 Compare December 10, 2019 23:06

jkrems requested changes Dec 10, 2019

View reviewed changes

mhdawson mentioned this pull request Dec 16, 2019

Node.js Foundation Modules Team Meeting 2019-12-18 nodejs/modules#455

Closed

GeoffreyBooth mentioned this pull request May 28, 2020

Cannot use mjs file synchronously in commonJS without await / .then() #33605

Closed

BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:19

sokra mentioned this pull request Jun 10, 2020

Add support for pkg.exports.module or similar webpack/webpack#11014

Closed

weswigham mentioned this pull request Aug 4, 2020

Proposal for single-mode packages with optional fallbacks for older versions of node #49450

Open

renoirb mentioned this pull request Aug 5, 2020

Need clarifications: Warns when fileName extension is .mjs for ESM and Node.js v14 egoist/bili#435

Open

aduh95 mentioned this pull request Aug 14, 2020

ESM dynamic import via CommonJS #34775

Closed

aduh95 mentioned this pull request Mar 18, 2021

chore(test): refactor tests to use ESM syntax joshgoebel/highlight.js#2

Closed

2 tasks

guybedford force-pushed the master branch from dc5a5da to 8e46568 Compare March 29, 2021 21:33

MylesBorins mentioned this pull request Jun 17, 2021

module: proxy require('esm') to import('esm') #39064

Closed

addaleax closed this Jun 20, 2021

joyeecheung mentioned this pull request Mar 5, 2024

module: support require()ing synchronous ESM graphs #51977

Closed

joyeecheung mentioned this pull request Mar 21, 2024

module: increased surface for hazards with require(esm) experimental flag #52173

Open

[WIP] Support requiring .mjs files #30891

[WIP] Support requiring .mjs files #30891

Conversation

weswigham commented Dec 10, 2019

ljharb commented Dec 10, 2019

guybedford commented Dec 10, 2019

devsnek left a comment

Choose a reason for hiding this comment

weswigham commented Dec 10, 2019 • edited

devsnek commented Dec 10, 2019

devsnek Dec 10, 2019

Choose a reason for hiding this comment

weswigham Dec 10, 2019

Choose a reason for hiding this comment

devsnek Dec 10, 2019

Choose a reason for hiding this comment

weswigham Dec 10, 2019

Choose a reason for hiding this comment

weswigham Dec 10, 2019 • edited

Choose a reason for hiding this comment

weswigham commented Dec 10, 2019

devsnek Dec 10, 2019

Choose a reason for hiding this comment

weswigham Dec 10, 2019

Choose a reason for hiding this comment

devsnek commented Dec 10, 2019

weswigham commented Dec 10, 2019

devsnek commented Dec 10, 2019

weswigham commented Dec 10, 2019 • edited

devsnek commented Dec 10, 2019

jkrems left a comment

Choose a reason for hiding this comment

weswigham commented Dec 10, 2019 • edited

weswigham commented Dec 10, 2019

devsnek commented Dec 10, 2019

weswigham commented Dec 10, 2019 • edited

weswigham commented Dec 10, 2019

bmeck commented Dec 16, 2019 • edited

jkrems commented Dec 16, 2019

ljharb commented Dec 16, 2019

jkrems commented Dec 18, 2019 • edited

bmeck commented Dec 18, 2019

jkrems commented Dec 18, 2019

bmeck commented Dec 18, 2019

jkrems commented Dec 18, 2019

weswigham commented Dec 18, 2019

jkrems commented Dec 18, 2019 • edited

BridgeAR commented Jan 12, 2020

MylesBorins commented Jan 14, 2020

Raynos commented May 28, 2020

GeoffreyBooth commented May 28, 2020

dfabulich commented Aug 11, 2020

addaleax commented Jun 20, 2021

weswigham commented Dec 10, 2019 •

edited

weswigham Dec 10, 2019 •

edited

weswigham commented Dec 10, 2019 •

edited

weswigham commented Dec 10, 2019 •

edited

weswigham commented Dec 10, 2019 •

edited

bmeck commented Dec 16, 2019 •

edited

jkrems commented Dec 18, 2019 •

edited

jkrems commented Dec 18, 2019 •

edited