Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: handle extension-less "bin" files #41552

Conversation

JakobJingleheimer
Copy link
Contributor

@JakobJingleheimer JakobJingleheimer commented Jan 16, 2022

There have been several issues reported recently around extension-less files failing when --experimental-loader is used, and I myself encountered this.

I think the circumstances under which the failures occurred are legitimate use-cases specifically supported in the design of the relevant core module(s). Specifically in regard to "bin" (executable) files, this is likely to affect a non-trivial amount of users (ex users of Mocha which has 5.6M weekly downloads as of the opening of this PR).

Note that this PR does not affect specifier resolution, only determining format (when the filename on disk literally has no extension): when the specifier is not properly resolved (under the current rules), this has no effect.

cc @nodejs/loaders @nodejs/modules

Resolves #33226, resolves #41275, resolves #41465

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Jan 16, 2022
@@ -74,7 +74,7 @@ function getLegacyExtensionFormat(ext) {

function getFileProtocolModuleFormat(url, ignoreErrors) {
const ext = extname(url.pathname);
if (ext === '.js') {
if (ext === '.js' || !ext) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a broader impact than just "bin" files.

I could artificially limit it with something like the parentDir check I put together as a quick-fix to work around this issue for my own projects, so that it affects only "bin" files. I'm not sure whether the extra complexity would be worthwhile.

Is there a particular reason other extension-less files should specifically be excluded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if extensionless files are wasm in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclusively or additionally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclusive isn't possible; it'd have to be additionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When more than 2 options are possible, I would probably just convert the logic to some kind of Map or Set.

@ljharb
Copy link
Member

ljharb commented Jan 16, 2022

type module should only impact .js files; extensionless files either need to be only CJS forever, or, should be able to be any format node supports, which is 2 now but will be 3 soon.

@GeoffreyBooth
Copy link
Member

Search for "type": "wasm" in the closed PRs. We decided not to support extensionless files via the ESM loader because we didn't want to lock down that extensionless = JavaScript. There are workarounds like symlinks or extensionless CommonJS wrappers.

@JakobJingleheimer
Copy link
Contributor Author

extensionless files either need to be […] or, should be able to be any format node supports, which is 2 now but will be 3 soon.

@ljharb oh, sure. I wasn't intending to limit it to only CJS or ESM (that either/or logic here already existed—I just enabled it when there is no extension).


We decided not to support extensionless files via the ESM loader because we didn't want to lock down that extensionless = JavaScript.

@GeoffreyBooth do you mean as a general principle or as a blanket rule? It's common practice for executables to have no extension, so I think that should at least not be applied here to "bin". I'd be happy to limit to just bin 🙂 (I just figured it would be less work to add if desired rather than remove if undesired).

@ljharb
Copy link
Member

ljharb commented Jan 16, 2022

@JakobJingleheimer right but it's not common practice to have extensionless ESM files, because that's not supported yet.

We shouldn't allow it to become common practice if doing so would prevent extensionless WASM files.

@JakobJingleheimer
Copy link
Contributor Author

Yes, I agree it would be an undesirable outcome if this did preclude extensionless wasm. I can't think of how it could though:

project-a/
  ├ bin/
     └ wasm-executable
  ├ src/
  └ package.json
{
  "name": "project-a",
  "bin": "./bin/wasm-executable",
  "type": "wasm"
}
// test.mjs
import projectA from "project-a" // all good in the neighbourhood

The above would actually fail without this PR.

@ljharb
Copy link
Member

ljharb commented Jan 16, 2022

type: wasm would be expected to impact .js files, since that's what type does.

@JakobJingleheimer
Copy link
Contributor Author

JakobJingleheimer commented Jan 16, 2022

type: wasm would be expected to impact .js files, since that's what type does.

For now. Maybe it won't in future when WASM is fully supported 🤷‍♂️ Or maybe there will be some other flag, in which case we'd have to check both anyway since CommonJS files can already be extension-less: Pandora's Box is already open (this problem already exists).

@aduh95
Copy link
Contributor

aduh95 commented Jan 16, 2022

WASM files start with a magic word (00 61 73 6d), ideally that would be how Node.js detect them. However if we expect extentionless files to be JavaScript, that certainly complicate things. Should we impose a magic phrase for exentionless files to be treated as JS? (e.g. iif the file starts with #!/usr/bin/env node\n, it's parsed as JavaScript text file, otherwise the loader throws?)

@ljharb
Copy link
Member

ljharb commented Jan 16, 2022

All these questions are exactly why no changes should be made to extensionless files yet - to keep the design space maximally open for when wasm arrives and gets ecosystem adoption and we can consider use cases.

Indeed i suspect it would have to be another flag; the “type” flag unfortunately forced us into a position where the flag name has insufficient actual relation to what the flag does, and it’s not extensible to other formats and extensions.

@JakobJingleheimer
Copy link
Contributor Author

WASM files start with a magic word (00 61 73 6d), ideally that would be how Node.js detect them.

Oof, I hope not 😳 that's like how a dog determines whether it should put something in his mouth by putting it in his mouth.

IMO, checking for that magic word should be considered the same as checking for the magic word in an image file to know what kind it is: A last resort to absolutely confirm.


Since the problem this PR is intending to address occurs only when an ESM loader hook is used, would it be sufficient to also gate this fix behind that experimental CLI flag? Thus the experimental introduction of the problem and the experimental fix to the problem are tethered and atomic.

@GeoffreyBooth
Copy link
Member

Since the problem this PR is intending to address occurs only when an ESM loader hook is used

I think this is the problem that should be addressed: whether the --loader flag is used or not should have no bearing on how the entry point is resolved. So node ./node_modules/.bin/mocha should resolve to the same file (and be interpreted as the same module type) as node --loader foo.mjs ./node_modules/.bin/mocha. Whether the entry is extensionless or not shouldn’t matter.

This would also solve the original issue, and would preserve the status quo where extensionless files in a CommonJS package scope are always treated as CommonJS, and extensionless files in an ESM package scope are unknown/unloadable.

@GeoffreyBooth
Copy link
Member

WASM files start with a magic word (00 61 73 6d), ideally that would be how Node.js detect them. However if we expect extentionless files to be JavaScript, that certainly complicate things. Should we impose a magic phrase for exentionless files to be treated as JS? (e.g. iif the file starts with #!/usr/bin/env node\n, it’s parsed as JavaScript text file, otherwise the loader throws?)

This isn’t a bad idea, and if it applies only to the entry point it wouldn’t present a performance concern (because we’re only checking a single file) but it would lock us into requiring that all future entry points are distinguishable based on magic words/phrases. I think for now, because the future is so uncertain regarding potential new module types, we want to keep the design space here as open as possible. The workaround of a one-line extensionless CommonJS wrapper that just has an import() to the “real” ESM entry point isn’t so cumbersome, and I think is good enough for the relatively small number of users publishing CLI packages.

For now we should probably document (if it isn’t in the docs already) that extensionless ESM entry files are unsupported, with an example of the CommonJS wrapper workaround.

@targos
Copy link
Member

targos commented Jan 17, 2022

The workaround of a one-line extensionless CommonJS wrapper that just has an import() to the “real” ESM entry point isn’t so cumbersome, and I think is good enough for the relatively small number of users publishing CLI packages.

Even for CLI packages, there is no need for a CommonJS wrapper. You can publish the bin entry points with an extension, because npm already creates an extensionless shell wrapper for them.

@JakobJingleheimer
Copy link
Contributor Author

JakobJingleheimer commented Jan 17, 2022

if it applies only to the entry point it wouldn’t present a performance concern (because we’re only checking a single file)

ah, that's true. Sorry, Antoine, I thought you were suggesting this as a way to determine all (extension-less) wasm files.


I think this is the problem that should be addressed: whether the --loader flag is used or not should have no bearing on how the entry point is resolved.

The problem with that is the the --loader flag causes ESMloader to get used instead of CJSLoader, thus entering this flow. IF the user does want the main entry point to go through ESMLoader (let's say, for something not an extension-less bin), there's no other way to indicate that (which is why I presume that check occurs before the check for .cjs). Without changing that or adding something new, I think there's no other way than somehow addressing within get_format().


Even for CLI packages, there is no need for a CommonJS wrapper. You can publish the bin entry points with an extension, because npm already creates an extensionless shell wrapper for them.

Do all the major package managers do that? (If so, it sounds like we should specifically say "don't publish an extension-less bin entry point" and possibly throw if they do, similar to @aduh95's idea above.)

@GeoffreyBooth
Copy link
Member

The problem with that is the the --loader flag causes ESMloader to get used instead of CJSLoader, thus entering this flow.

Right, this is what I think we should change. The flag should not cause a change in which loader is used for the entry point. If necessary, we could have an “entry loader” that only determines what should be done with the entry point, and then passes it off to the appropriate loader. This new loader should also be unaffected by whether or not --loader is passed.

@ljharb
Copy link
Member

ljharb commented Jan 17, 2022

Do all the major package managers do that?

npm does that, which means that any alternative that doesn’t is broken ¯\_(ツ)_/¯

@JakobJingleheimer
Copy link
Contributor Author

This new loader should also be unaffected by whether or not --loader is passed.

Lions, and tigers, and bears—oh my! Entry-point loader, ambient loader, (lay) loader 😵‍💫

I like the idea of that if we can somehow avoid a maelstrom of loaders and let the loader control its own destiny, but without creating a whole new concept.

What if the existing loader somehow signalled whether it should affect entry-point flow? Ex export const entryPointDeterimant = true


npm does that, which means that any alternative that doesn’t is broken ¯_(ツ)_/¯

Isn't Node.js supposed to be package manager agnostic now?

(If yarn also does it, I think we're sufficiently covered?)

@ljharb

This comment has been minimized.

@arcanis

This comment has been minimized.

@JakobJingleheimer
Copy link
Contributor Author

let the loader control its own destiny, but without creating a whole new concept.

What if the existing loader somehow signalled whether it should affect entry-point flow? Ex export const entryPointDeterimant = true

Actually, nevermind. It should be the user's decision, not the loader's.

If we can't go the throw on extension-less entry point route (which sounds like the better option), could we use the --input-type flag currently reserved for --eval and friends to signal that the entry point is to be CJS or ESM? (Or if not expanding --input-type, a new one, like --entry-type=[commonjs|module])

node \
--input-type=module \
--loader=whatever.mjs \
…

@ljharb

This comment has been minimized.

@arcanis

This comment has been minimized.

@bmeck
Copy link
Member

bmeck commented Jan 17, 2022

I think solutions are coming before constraints here. The goals are separated into:

  • allow extension-less files for the ESM loader
    • if entry point based how to deal w/ problems with wrapping scripts: PM2/mocha/etc. since the loader would be stateful and needs to be in entry point state to properly load these files. I personally think entry point exceptions are not worth the confusion/complexity.
  • allow ESM loaders to be used in conjunction with extension-less files
    • in general this is a very different problem since various solutions like allow type:commonjs to never load through the ESM loader for the entry point is a valid fix

These have been discussed at length multiple times and I personally don't think it is a high priority or worth the heated debate since a simple workaround exists and is even used in CJS to load things like .node files since they similarly cannot be extension-less. Answering how to solve these goals is actually prone to a lot of bike-shedding vs just showing a solution like the wrapper script done by yarn/npm/pnpm/etc. or manually writing your own. Using various magic like WASM magic bytes is equally problematic for forwards compat since not all magic byte combinations are mutually exclusive (particularly with various archive formats like .zip)! I think before jumping on a complex solution with heated debate we should just decide if this is not solvable already with a simple/easily learned workaround or if a capability is not reasonably easy and needs to be expanded. The capability we add doesn't necessarily need to be allowing extension-less files directly through the ESM loader, just allowing the goals to be achieved.

@GeoffreyBooth
Copy link
Member

  • allow extension-less files for the ESM loader

For reasons stated, I think this is a non-goal at the moment. We should document it as such if it isn't already documented.

  • allow ESM loaders to be used in conjunction with extension-less files

Or put another way, --loader (or probably any flag?) shouldn’t affect which loader is used for any entry point. This is a bug that I think should be fixed.

@guybedford
Copy link
Contributor

Or put another way, --loader (or probably any flag?) shouldn’t affect which loader is used for any entry point. This is a bug that I think should be fixed.

Agreed! This used to be because resolve handled both resolution and format. Given that we now have a separate getFormat hook, perhaps this hook could manually be called for the top-level main when using a loader, with that selection then determining the entry point module system (CJS leading to CJS loader, ESM to ESM). So a sort of out-of-band call to this loader hook from the main entry system, instead of the custom extension and type check. I believe that should be fully consistent?

@guybedford
Copy link
Contributor

One of the big problems is that the Node.js bootstrapping into CJS absolutely needs to be synchronous for timing requirements - a bunch of tests will fail if the event loop is yielded before the CJS loader is initiated. This ties us to needing to synchronously determine the module system upfront to be able to ensure a sync CJS bootstrap.

Node.js having an async bootstrap would be preferable IMO but eg many async hooks test cases would fail under the invariant change.

@cspotcode
Copy link

You're saying that the presence of --loader makes the CJS bootstrap async, so those tests will fail if --loader is added to them, on stable node?

@aduh95
Copy link
Contributor

aduh95 commented Jan 19, 2022

The presence of --loader makes Node.js use ESM loader for resolving the entry point, which is asynchronous. Once it has resolved the module to be a CJS file, it would be passed to the synchronous CJS loader, but the event loop has already ticked at this point.

@cspotcode
Copy link

Coming back to the core of the bugfix: "a flag can't affect or determine which loader is used"

Supposing we update the ESM loader so that node's treatment of extensionless entrypoints is identical whether or not you pass --loader ./empty-text-file.mjs. Then CJS bootstrap will be sync when --loader is omitted, and async when it is passed, right? --loader will affect the sync/async-ness but not the format of the entrypoint.

Is that acceptable? Would it adequately fix the bug today without too much effort nor breaking tests, and still allow for improvements to be made in the future?

@JakobJingleheimer
Copy link
Contributor Author

Supposing we update the ESM loader so that node's treatment of extensionless entrypoints is identical whether or not you pass --loader ./empty-text-file.mjs.

I believe that's what this PR currently does.

Then CJS bootstrap will be sync when --loader is omitted, and async when it is passed, right? --loader will affect the sync/async-ness but not the format of the entrypoint.

I think your suggestion above will not result in this as they are unrelated. Also, CJS loader cannot be async, so I don't understand what this second bit is positing.

@JakobJingleheimer
Copy link
Contributor Author

node --loader tsloader x.ts

where x.ts should be loaded through the ESM loader would be one example here.

I think (assuming the flag checks were removed from shouldUseESMLoader()) this would still use ESM loader if the package.json contained "type": "module" (otherwise it would fallback to the default) as it would hit https://github.com/nodejs/node/blob/master/lib/internal/modules/run_main.js#L43

@cspotcode
Copy link

Another use-case is node --loader tsloader x.mts with package.json type omitted or set to commonjs. .mts compiles to .mjs, so the loader should facilitate the same for .mts files.

@GeoffreyBooth
Copy link
Member

Another use-case is node --loader tsloader x.mts with package.json type omitted or set to commonjs. .mts compiles to .mjs, so the loader should facilitate the same for .mts files.

I think that’s a case where the user might want the loader to apply to x.mts, but I think in order to do so then loaders need to support both CommonJS and ESM (which would be a great enhancement, if we can get there). I think we wouldn’t want --loader to mean “treat the entry as ESM” so that we can let --loader apply to CommonJS entries in the future. The workaround in the meantime is node --loader ts-loader --input-type=module 'import "./x.mts"' which isn’t so bad.

@guybedford
Copy link
Contributor

but I think in order to do so then loaders need to support both CommonJS and ESM (which would be a great enhancement, if we can get there)

Agreed this is likely the more general fix to what is yet another sync / async issue.

@guybedford
Copy link
Contributor

Just moving loaders to a separate thread allowing syncified loader hook calls could likely unblock the loader from entirely controlling the main entry point process.

@cspotcode
Copy link

I think that’s a case where the user might want the loader to apply to x.mts, but I think in order to do so then loaders need to support both CommonJS and ESM (which would be a great enhancement, if we can get there).

I'm coming from the perspective of users who know this is already true today: if you use --loader ts-node/esm then both CommonJS and ESM resolution and transformation of TypeScript will work. The ESM loader auto-installs the CJS loader, too. This is what the user wants.

Since we're already requiring users to pass flags to node 1, you can imagine that the user is sufficiently motivated to pass these flags. So in the future, if loaders are moved to a separate thread, we can assume the user will read the appropriate documentation and learn that they must do the following: node --loader ts-node/esm --require ts-node/register x.mts. Either way, the flags are likely to be opaque copy-paste to many users.

Point being that package.json type is not sufficient to determine the entrypoint's type. .mts and .cts file extensions can be interpreted by a loader to contradict package.json type. It is already possible today for an ESM loader to resolve an x.mts or x.cts entrypoint as either module type, asynchronously delaying CJS execution if the entrypoint is classified as CJS by the loader.

One of the big problems is that the Node.js bootstrapping into CJS absolutely needs to be synchronous for timing requirements

My comments about sync vs async were attempting to understand this constraint, since stable node can already load a CJS entrypoint async in the presence of --loader. The loader asynchronously tells node to treat the file as CJS, and then it synchronously executes, but the event loop has already spun at that point. I understand this may change when moving loaders to a separate thread. But it seems like the sync constraint as described only applies to tests which are executing without --loader, because otherwise they'd be failing already.

Footnotes

  1. it is messier than merely invoking ts-node but avoids the overhead of another process spawn

@guybedford
Copy link
Contributor

But it seems like the sync constraint as described only applies to tests which are executing without --loader, because otherwise they'd be failing already.

The issue is this is eg a bunch of Node.js timing and async hooks test cases. Definitely asyncifying the main Node.js bootstrap is another route here, it's just that those async hooks test cases would have to break....

@bmeck
Copy link
Member

bmeck commented Jan 20, 2022

To clarify a bit on async bootstrapping, altering the tests themselves is fine to do and they are there mostly for stability not invariants; but doing so adds a few complexities since async_hooks etc. don't want to have incomplete tasks when the application starts up. It is non-trivial to fix, but if someone wants to do so they would be quite the hero. However, like @guybedford seems to say, that would best be done as a separate PR as it doesn't directly affect the goals discussed above.

@JakobJingleheimer
Copy link
Contributor Author

JakobJingleheimer commented Jan 20, 2022

@cspotcode is there a compelling, practical reason to omit "type": "module" in the case of an .mts entry point aside from a trivial amount of quasi-redundancy? ("Quasi" because typescript is outside the purview of Node.js, and .mts is a typescript concept, not Node.js's.)


@arcanis @bmeck could this whole "use CJS vs ESM loader" be handled by an "ambient loader" we talked about in the most recent Loaders team meeting? This seems too widespread an issue to offload to users to solve though 🤔

@ljharb
Copy link
Member

ljharb commented Jan 20, 2022

@JakobJingleheimer "type": "module" should only be present when there's .js files authored as ESM. Why would you want to include it with .mts?

@GeoffreyBooth
Copy link
Member

This thread is getting a little unmanageable. I think it’s clear that we won’t be merging this PR in as is, and we need a discussion to address the issue of how to handle entry points when --loader is passed. Perhaps we can close this PR and start a discussion thread?

@JakobJingleheimer
Copy link
Contributor Author

@ljharb ah but typescript does not play by the rules, does it 🙃 if the biggest annoyance of my day was having to also specify "type": "module", that would be a pretty good day.

@cspotcode
Copy link

TypeScript compiles .mts into .mjs, so the format rules for .mts should match the format rules for .mjs.

If defaultResolve is taught to determine format of entrypoints identically to node sans --loader -- without changing its behavior for non-entrypoints -- then this bug is fixed: CJS bootstrap without --loader remains sync, CJS bootstrap with --loader remains async, and format of entrypoints is unaffected by the presence of --loader.

@JakobJingleheimer
Copy link
Contributor Author

JakobJingleheimer commented Jan 20, 2022

@GeoffreyBooth I don't have the power to convert this into a discussion.


Last ditch idea before this gets converted:

./package.json:

{ "type": "commonjs" }

./foo.js:

const assert = require('assert');

import('./bar.mjs')
  .then((mod) => { assert(typeof mod.default, 'function') });
node --loader ./whatever.mjs --require ./whatever.cjs ./foo.js

In the above, foo.js is wrongly inferred to be ESMLoader-territory, despite package.json specifically telling node it's CJSLoader-territory. The extension-less scenario with Mocha is essentially the same.

AFAIK, there is no documented rule that --loader or --require affect choosing CJSLoader vs ESMLoader, so in doing so, node is currently behaving contrary to its documented rules, which are either "type" in package.json or file extension.

When the flag checks are removed from shouldUseESMLoader(), node behaves as spec'd (and this bug is resolved).

The documented purpose of the --loader flag is merely to supply a custom ESM loader.

Whether and how an entry-point is processed by a loader or require whatever is a different, probably larger issue.

I think removing the --experimental-specifier-resolution and --experimental-loader flag checks from shouldUseESMLoader() is very unlikely to cause a problem at this stage, and if it does, it's easily worked around (ex if wanting a commonjs or typescript entry-point file to go through ESMLoader, create a simple esm wrapper file and import the other file, or via cli node --loader=ts-loader.mjs --input-type=module -e "import 'foo.mts'").

@guybedford
Copy link
Contributor

I don't think we should make node x.ts --loader ts-loader any more complicated than that by requiring extra arguments or wrappers. TypeScript is and will continue to be a first class citizen in the JS ecosystem.

Another option for extensionless bins to be made to work with loaders could be having the --loader flag imply extensionless handling for the main within the default loader rules when a user loader is set that doesn't do otherwise, as a special exception to the usual handling to ensure the default behaviour matches up with loaderless.

@GeoffreyBooth
Copy link
Member

@GeoffreyBooth I don’t have the power to convert this into a discussion.

Neither do I, I think only issues can convert into discussions. Regardless, I think we should close this PR and you can open a discussion thread.

@cspotcode
Copy link

Another option for extensionless bins to be made to work with loaders could be having the --loader flag imply extensionless handling for the main within the default loader rules when a user loader is set that doesn't do otherwise, as a special exception to the usual handling to ensure the default behaviour matches up with loaderless.

To do this without passing an isMain to the resolve loader, nor setting parentUrl to some well-known URL that indicates it's the main entrypoint, we can pass a defaultResolve function that knows it's for the main entrypoint. It can return format: "CommonJS" for an extensionless URL that it receives from the user resolve hook. Essentially isMain will be bound within defaultResolve but not exposed to user loaders.

User loaders will still be able to probe it by passing an extensionless URL to guess if they're resolving the entrypoint, even though they won't be given an explicit isMain flag. But they won't be certain if format: CommonJS is coming from node's default loader or another user loader in the chain.

Is this loosely the implementation you had in mind?

@guybedford
Copy link
Contributor

@cspotcode that certainly sounds like a good approach if wanting to avoid isMain or a parentURL sentinel.

@JakobJingleheimer
Copy link
Contributor Author

Per the above (since this is getting quite long): #41711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run.
Projects
None yet