Extend PURE annotations to work on getters as well. #513

Open

Andarist opened this issue Nov 16, 2019 · 22 comments

@Andarist
Contributor

Andarist commented Nov 16, 2019

Feature request

#__PURE__ works only for function calls at the moment. It seems reasonable to extend its meaning to also support getters, especially since there is already a pure_getters option, which has some drawbacks:

  • it's a global setting
  • cannot be provided by library authors.

If this is something you would consider adding, I could potentially look into implementing it myself, especially since most of the pieces are already in the code (pure_getters support and #__PURE__ handling for function calls).

We would probably have to agree on whether it should make all nested getters pure (as in i.want.to.access.this), but it seems that it should. If only a single getter were marked as pure in such a situation, annotating every one of them within a single expression would look weird and become unwieldy.
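A hypothetical sketch of the proposed usage (this is not current Terser behavior, and the names are made up):

// hypothetical: the annotated property access is treated as side-effect free
const getOwnKeys = /* #__PURE__ */ Object.getOwnPropertyNames;
// hypothetical: every getter in the chain is treated as pure
const value = /* #__PURE__ */ i.want.to.access.this;
// if getOwnKeys and value are never referenced, both statements could be dropped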

@kzc
Contributor

kzc commented Nov 16, 2019

I do not support this proposed feature request.

Although slightly awkward, the existing pure annotation behavior can make any arbitrary expression "pure" using this idiom:

var drop_if_unused = /*@__PURE__*/(() => expr)();

or if you prefer ES5:

var drop_if_unused = /*@__PURE__*/(function () { return expr; })();
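To illustrate the effect (a sketch, assuming drop_if_unused is never referenced and unused toplevel/module code removal is enabled): the annotation tells the minifier the call has no side effects, so the entire statement becomes eligible for removal.

// input
var drop_if_unused = /*@__PURE__*/(() => expr)();
// output when drop_if_unused is unreferenced: nothing, the statement is dropped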

Efforts should be spent optimizing minification using this pattern rather than inventing a new set of rules governing pure annotations with potential backward compatibility issues. Angular uses this idiom quite extensively to great effect on a very large code base.

The current behavior of /*@__PURE__*/ annotations is well defined and works well with code in the wild. It is implemented uniformly on uglify-js, terser and rollup.

Comments are not a first-class construct in ASTs and are problematic to say the least. I suggest leaving the semantics of pure annotations as is, limited to calls and new expressions. If any other arbitrary expression could be annotated, then combinations of other optimizations could lead to shifted code and ambiguous interpretation. If I were to go back in time and reimplement pure annotations I'd probably have chosen a regular function call mechanism, such as __PURE__(expr), instead of using comments.

@Andarist
Contributor Author

The problem is that it's often hard to convince people (lib owners) to restructure their code "just" to let minifiers/tree-shakers/etc. do their job more efficiently.

Take a look at this situation:
https://github.com/mridgway/hoist-non-react-statics/blob/695c9884d19133dc97b2b2d37118836c03abcd41/src/index.js#L58-L63

const defineProperty = Object.defineProperty;
const getOwnPropertyNames = Object.getOwnPropertyNames;
const getOwnPropertySymbols = Object.getOwnPropertySymbols;
const getOwnPropertyDescriptor = Object.getOwnPropertyDescriptor;
const getPrototypeOf = Object.getPrototypeOf;
const objectPrototype = Object.prototype;

I highly doubt the author will be willing to change this to:

const defineProperty = /* #__PURE__ */ (() => Object.defineProperty)();
const getOwnPropertyNames = /* #__PURE__ */ (() => Object.getOwnPropertyNames)();
const getOwnPropertySymbols = /* #__PURE__ */ (() => Object.getOwnPropertySymbols)();
const getOwnPropertyDescriptor = /* #__PURE__ */ (() => Object.getOwnPropertyDescriptor)();
const getPrototypeOf = /* #__PURE__ */ (() => Object.getPrototypeOf)();
const objectPrototype = /* #__PURE__ */ (() => Object.prototype)();

Adding a comment without changing the structure (to one you yourself admit is awkward) has a much higher chance of being merged in.
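That is, something along these lines, assuming the proposed getter support existed (hypothetical syntax):

const defineProperty = /* #__PURE__ */ Object.defineProperty;
const getPrototypeOf = /* #__PURE__ */ Object.getPrototypeOf;
// ...and so on, with no structural change to the code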

@kzc
Contributor

kzc commented Nov 17, 2019

Sorry, but the use case is not compelling. The vast majority of JS coders only use default minify settings and do not annotate code. If a library author is truly concerned about dead code elimination the existing pure annotation syntax would not be an impediment. As mentioned above, the existing mechanism has been used successfully in very large projects for years. This proposed feature would add unnecessary complexity to the minifier for something that is readily accomplishable.

@kzc
Contributor

kzc commented Nov 17, 2019

The example cited https://github.com/mridgway/hoist-non-react-statics/blob/695c9884d19133dc97b2b2d37118836c03abcd41/src/index.js#L58-L65 could be better optimized by simply moving the const definitions into the exported default function, since all those definitions are required by the function anyway. The key to dead code elimination is keeping everything in functions and not having toplevel property accesses. @filipesilva wrote a linting plugin to detect such scenarios in the Angular project.
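A rough sketch of that restructuring (simplified; only the shape matters here):

export default function hoistNonReactStatics(targetComponent, sourceComponent, blacklist) {
  // the property accesses now run only when the function is called,
  // so merely importing the module leaves no toplevel side effects to retain
  const defineProperty = Object.defineProperty;
  const getOwnPropertyNames = Object.getOwnPropertyNames;
  const getOwnPropertySymbols = Object.getOwnPropertySymbols;
  const getOwnPropertyDescriptor = Object.getOwnPropertyDescriptor;
  const getPrototypeOf = Object.getPrototypeOf;
  const objectPrototype = Object.prototype;
  // ...rest of the implementation
}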

@fabiosantoscode
Collaborator

I think it's a bit awkward that getters have to be treated this way by terser, but I think the "effect vs maintenance cost" ratio here isn't worth it, sadly, even for a different annotation (like __PURE_GETTER__ or something).

@filipesilva

For some extra context, at Angular we also think it's too onerous to write any __PURE__ comment at all and instead use https://github.com/angular/angular-cli/tree/master/packages/angular_devkit/build_optimizer to add them automatically.

But we also do a lot of testing to ensure tree shaking is working as expected in our codebases. Just making a PR for a lib to have the comment would not be enough; it would need supporting tests to prevent regressions (e.g. ReactiveX/rxjs#4769). The linting plugin that @kzc mentioned is used in that PR along with https://github.com/filipesilva/check-side-effects.

@fabiosantoscode
Collaborator

Maybe we need a package.json field to specify "hey, this package doesn't have any side effects outside of functions, so module init code is all fair game for you to drop". Maybe this is a better solution than spamming pure comments.

Then module bundlers would drop a comment or directive saying "(in this here closure/in this here file) getters outside of functions are pure, and calls too."

What do you fellow humans think? Reopening for discussion.

@filipesilva

Webpack (here) and Rollup (rollup/rollup#2593) support the sideEffects flag in package.json for that purpose. There's some nuance with regards to the capabilities of each of these tools to fulfil the vision, but I think that's the intention of the flag.
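For example, a library can opt in by declaring in its package.json that it has no import-time side effects, or list the specific files that do have them (the package name below is a placeholder):

{
  "name": "some-library",
  "sideEffects": false
}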

I don't think Terser can do anything about that because Terser operates on individual JS files/modules. Checking package.json fields is something that bundlers can do because they actually load all the code according to some logic, but Terser does not do that.

This is part of the reason why Rollup does so much better at tree-shaking than Webpack+Terser. Rollup has the module loading information accessible for use by tree shaking, whereas that information is lost in the transition from webpack to terser.

@kzc
Contributor

kzc commented Nov 18, 2019

This is part of the reason why Rollup does so much better at tree-shaking than Webpack+Terser.

Since we're having a meta discussion... Terser would be much more effective if it were to become an ES6+ code bundler in its own right. The projects I work on tend to produce equally small bundles with Rollup tree shaking disabled and subsequent minification with module=true,passes=3,unsafe=true. Until Webpack puts everything into a single namespace as Rollup does, it won't be as effective.
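For reference, those settings map onto a Terser options object roughly like this (a sketch, not a complete configuration):

// Terser options corresponding to module=true,passes=3,unsafe=true
{
  module: true,
  compress: { passes: 3, unsafe: true }
}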

Uglify and Terser have to be more conservative by default because they don't know what sort of code they are dealing with. Historically Rollup has been a bit loose with rules governing property accesses in order to make some of its gains, and there are a few longstanding bugs related to that. I won't bother to link to them here, but I was commenting on one such issue in that project just yesterday. Rollup is also becoming more conservative by default as these issues are addressed.

Micro-optimizations ultimately have much less of an impact than user code reorganization to facilitate dead code elimination. Users by and large do not understand why their apparently unreferenced code and external libraries are being retained. There is no silver bullet.

Making upstream tooling tree-shaking friendly would have the most bang for the buck. One such example is the yet-to-be-merged Babel PR @Andarist wrote, which wraps static class properties into the pure class IIFE; it would go a long way toward dropping a lot of unreferenced code in the wild: babel/babel#6963.
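Roughly, the idea is to move static initialization inside an IIFE whose call is annotated as pure, so the whole class can be dropped when unreferenced. A sketch (Foo and compute are made-up names, and the exact output shape is up to Babel):

// source
class Foo {
  static bar = compute();
}

// tree-shaking friendly output: the annotated IIFE can be dropped if Foo is unused
var Foo = /*#__PURE__*/ (function () {
  class Foo {}
  Foo.bar = compute();
  return Foo;
})();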

@filipesilva

Wrapping static class props/methods into an IIFE is one of the highest-yield optimizations that Build Optimizer does. Whenever something goes wrong in that one, bundle sizes balloon up.

The single namespace/module concatenation issue is not the full story though. Even if webpack were to do module concatenation perfectly, it would still not be optimal when dynamic imports are present. Terser is not able to follow the indirection provided by the webpack module loader, whereas rollup performs its tree shaking with that knowledge.

I don't know if Terser would be able to follow dynamic ES6 imports though. That might be an interesting avenue.

@fabiosantoscode
Collaborator

I've been thinking of creating a terser-integrated module bundler for a while!

The problem is, the Terser AST is pretty memory hungry at the moment.

However, we might be able to do some interesting stuff by pre-parsing the files and storing a disk representation of the AST (hopefully with something more stable than https://nodejs.org/api/v8.html#v8_serialization_api) in a cache directory. We can even parse them in different threads if we want. Once parsed, they are not re-parsed unless the underlying file changed. Then we find out:

  • What each file imports
  • What it exports (is it a constant? A single-use function? A function with /*#__INLINE__*/?)
  • Is it a pure module? (knowing of options like pure_getters and package.json's sideEffects property)
  • Does this exported value ever change? (i.e. do we need to use an object to wrap exports so they're modifiable?)
  • Does this module have a .min.js file extension? Then we don't minify it by default, since terser takes ages to minify already-minified files.

This would allow for some really interesting avenues, where we can compress with knowledge of what exists in other files. And with the knowledge of where the output file is going to run (node, worker, browser, or all three), we could unwrap UMDs and mark window or process as undefined upfront.

And of course we can use propmangle much more efficiently using a shared cache, which is pretty tricky to set up at the moment.

Besides optimizations, I think we can offer some things other bundlers already offer, such as:

  • Creating your HTML file for you with the correct script tags
  • A server into which you can plug your API or server or whatever. You can use this server in production if NODE_ENV is "production" and it doesn't do any hot reloading or on-the-fly building, instead relying on existing files.
  • Hot reloading in the above server

It would of course be a major undertaking.

@filipesilva

filipesilva commented Nov 19, 2019

Were Terser to become a bundler, it would have to replace existing bundlers on a user's toolchain. This sounds like an unlikely proposition as bundlers ride on top of their ecosystem of configurations, which Terser would need to match, and exceed, in order to justify the change. Changing bundlers is an undertaking that grows more difficult as projects grow and depend on the bundler feature set.

Module-aware minification is a cool proposition that doesn't exist elsewhere though, except in the Google Closure Compiler. I think Rollup does cross-module inlining though.

I wonder if there'd be a better way for Terser to remain a module processor while having access to bundler module graph information... E.g. something that bundlers can feed to Terser.

@fabiosantoscode
Collaborator

Regarding ES6 dynamic imports, maybe the user can tell us what they are. We can expose a module with a known package name, @terser/hints, which you could use like so:

import { dynamicImport } from '@terser/hints'

const myAppRoutes = [
  { path: '/', module: dynamicImport('home.js') },
  { path: '/info', module: dynamicImport('info.js') },
  { path: '/login', module: dynamicImport('login.js') },
]

function getModule(currentPath) {
  for (const { path, module } of myAppRoutes) if (currentPath == path) return import(module)
  return import("404.js")
}

This hint function (which will actually be exported from an actual NPM module) is simply an identity function. However, we can statically read these calls and do what needs to be done.
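So the shipped helper itself would be nothing more than an identity function, with Terser reading the calls statically at build time (hypothetical package, sketch):

// @terser/hints (hypothetical)
export const dynamicImport = specifier => specifier;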

@fabiosantoscode
Collaborator

The cool thing about the import function is that you can import data-URIs :) We might be able to use that to do something cool.

@filipesilva

In rollup-like output it's probably enough to just be able to follow imports and dynamic imports. I think webpack is looking at ESM output as well. So if Terser has a way to understand those, it can do all the nice things Terser does without needing to bundle anything itself.

@fabiosantoscode
Collaborator

At the very least we could start by taking what the bundlers already know and using it. But how, though? Are you aware of a way to get module bundlers to expose some kind of import graph?

@fabiosantoscode
Collaborator

In rollup-like output it's probably enough to just be able to follow imports and dynamic imports. I think webpack is looking at ESM output as well. So if Terser has a way to understand those, it can do all the nice things Terser does without needing to bundle anything itself.

It would quickly run out of RAM. It's what happens when chunks grow too large.

@kzc
Contributor

kzc commented Nov 19, 2019

Were Terser to become a bundler, it would have to replace existing bundlers on a user's toolchain. This sounds like an unlikely proposition as bundlers ride on top of their ecosystem of configurations, which Terser would need to match, and exceed, in order to justify the change. Changing bundlers is an undertaking that grows more difficult as projects grow and depend on the bundler feature set.

With that mindset no one would build any software ever. ;-)

No LLVM because there's GCC, no V8 engine because there's Spidermonkey, no Webpack because there's Browserify, no Rollup because there's Webpack, no Angular because there's JQuery, no React because there's Angular - you get the idea. There's always room for improvement and innovation. Competition is good.

Writing a vanilla ES bundler with the Terser code base is probably a few weeks effort. The first and most important step would be to import modules and rename symbols to coexist in a single namespace. It gets more challenging when multiple output formats and chunking are supported. These things could be supported incrementally over time.

I wouldn't be overly concerned about the memory footprint. The other offerings are no less memory hungry.

@fabiosantoscode
Collaborator

I think we can achieve a low memory footprint and acceptable speed as long as we cache the parsed modules on disk. That way we get fewer cold starts. Every time I restart my create-react-app (it has webpack underneath) it takes a very long time, and my computer isn't slow.

@kzc
Contributor

kzc commented Nov 19, 2019

@fabiosantoscode Rollup has no such file caching and it has acceptable speed. In watch mode Rollup appears to retain in-memory SHAs for each input.

I think it would be better to concentrate on the main features before looking at optimizations. Just import all the code into memory initially and get ES bundling working. Sometime after the initial implementation, if you think you still need caching, it can be as simple as having a per-module SHA key of the original source and a minified module payload with { module: true, compress: {}, mangle: {} } and an optional source map.
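A minimal sketch of such a cache, assuming the Terser minify() API and Node's crypto and fs modules (the file layout and function name are illustrative):

import { createHash } from "crypto";
import { existsSync, readFileSync, writeFileSync } from "fs";
import { minify } from "terser";

async function minifyModuleCached(source, cacheDir) {
  // key the cache on a SHA of the original module source
  const key = createHash("sha256").update(source).digest("hex");
  const cachePath = `${cacheDir}/${key}.min.js`;
  if (existsSync(cachePath)) {
    return readFileSync(cachePath, "utf8");
  }
  const { code } = await minify(source, { module: true, compress: {}, mangle: {} });
  writeFileSync(cachePath, code);
  return code;
}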

@kzc
Contributor

kzc commented Nov 22, 2019

pre-parsing the files and storing a disk representation of the AST

Caching the AST to disk wouldn't result in any gains and would likely be slower than dealing with ES input. AST in text form can be an order of magnitude larger than minified source code. Binary AST size is only roughly on par with minified source. Nevermind that Terser/Uglify AST is not compatible with ESTree and would require translation that'd erase any gains. Also keep in mind that comments are not currently handled by ESTree. For these reasons it's better to deal with minified source - ES parse time is insignificant (~10%) compared to compress and mangle.

I think Terser should strive to move up the tools food chain, which is why I suggested going the full-fledged bundler route. As a minifier library it's relegated to the bottom rung in terms of visibility and funding. Frankly, it's not worth the effort to maintain. For the time spent you'd make more money as an UberEats driver. It's not a sustainable model.

@fabiosantoscode
Collaborator

@kzc when I said I wanted to store the AST on disk, I meant to store a straight on-disk representation of what's in memory, Terser scoping information, a dependency graph and all, not just ESTree. But never mind all that, you've shown me that this is not necessary to complete a bundler :)
