Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotation for pure getters (discussion) #3334

Closed
fabiosantoscode opened this issue Jan 15, 2020 · 30 comments · Fixed by #5024
Closed

Annotation for pure getters (discussion) #3334

fabiosantoscode opened this issue Jan 15, 2020 · 30 comments · Fixed by #5024

Comments

@fabiosantoscode
Copy link

I propose the addition of annotations to specify which property access expressions are pure. These annotations will also be understood by Terser, but I want to discuss how they should look like, and this project's needs, instead of implementing them there and finding out they don't suit Rollup.

relevant twitter thread where this was brought up to my attention again and I reconsidered

Feature Use Case

  1. Library authors can annotate that their library's property accesses are all pure, or, at a granular level, that an individual property access is pure.
  2. Compilers like Babel may output this annotation as well, as they do with /*#__PURE__*/ right now.

Feature Proposal

The addition of two annotations:

/*#__PUREGET__*/, which can be used at the start of a "property access chain", like this:

/*#__PUREGET__*/foo.bar.baz

And a /*#__PUREGETWITHIN__*/ annotation to be added just before a function or IIFE, like this:

/*#__PUREGETWITHIN__*/
function x() {
  this.is.pure
}

/*#__PUREGETWITHIN__*/
;(function () {
  this.is.also.pure
})()

Maybe it can also be possible to annotate a whole module or whole package (by way of a package.json field) but Terser does not read package.json fields, so Rollup would have to annotate what it gives to Terser by itself.

Hope this post finds you well, and thank you for making the internet a better place :)

@Andarist
Copy link
Member

Yes, please!

I dont immediately see a value in PUREGETWITHIN though. Annotations are only important for code evaluated during initialization time so how this helps function declarations? Or is it for annotating declaration site instead of the call site, but still really affecting only the call sites? What would happen for such annotated function that is evaluated at init time but its result stay unused? We could take your second example with an IIFE as an example for what im asking.

@fabiosantoscode
Copy link
Author

The reasoning behind puregetwithin is to annotate every property access expression inside the function or iife in one go. It would be useful for a human to specify "every property access in this iife is pure", as opposed to annotating each and every one.

@Andarist
Copy link
Member

Right, I’ve guessed that this has to work smth like that - I’m wondering though how this would impact a file consisting of such an annotated IIFE which return value is not utilized, not assigned to a variable in the source code

@kzc
Copy link
Contributor

kzc commented Jan 15, 2020

Annotations in comments are inelegant and problematic because they are not first class constructs in ECMAScript. As result, some tools drop necessary parentheses which can subvert the meaning of comment annotations resulting in incorrect code.

I suggested in the original terser thread to support a new __PURE__(expression) convention - a proper call supported by all ES tooling. Such an expression could be assumed to be side-effect-free by transpilers, minifiers and bundlers and could be dropped if not used. In the absence of a minifier or bundler, this pseudo function could be polyfilled with function __PURE__(x) { return x; } or the calls could simply be converted to (expression) without the __PURE__ in the final bundle.

@fabiosantoscode
Copy link
Author

fabiosantoscode commented Jan 15, 2020

@kzc I thought of that as well, to create a module on NPM and expose compiler annotations there. That should indeed be more robust.

import { pure } from 'annotations'

pure(this.call.and.property.access.chain.is.pure())

pure((() => {
  this.iife.is.pure()
})())

If some day we ever get function decorators in ES we could do

@pure
function something() {
  // ...
}

And more stuff could be enabled by passing things to this function, like:

pure('module')

// rest of module is all pure

@kzc
Copy link
Contributor

kzc commented Jan 15, 2020

Personally, I wouldn't put this proposed __PURE__(expression) pseudo call in a module - too many moving parts with an import. Just have the tool treat it like a well known global function and it could either drop it as side-effect-free if its return value is unused:

$ cat input.js 
var a = __PURE__(G + foo());
__PURE__(bar().whatever);
baz(a);
# proposed
$ cat input.js | rollup --silent
var a = (G + foo());
baz(a);
# proposed
$ cat input.js | terser -c
var a=G+foo();baz(a);

Or just leave these calls in place and polyfill __PURE__ as needed:

# already supported
$ cat input.js | terser --toplevel -c -d __PURE__="(x=>x)"
var a=G+foo();bar().whatever,baz(a);

Or if no minifier or bundler is used the function prefix can just be elided in the source code with a dumb regex string replacement leaving just the (expression):

$ cat input.js | sed 's/__PURE__(/(/g'
var a = (G + foo());
(bar().whatever);
baz(a);

Upstream tools that are oblivious to the __PURE__(expression) convention will just leave the code as is - and unlike the comment annotation approach, its intent will not be subverted by dropped parentheses. More flexibility with this approach.

@lukastaegert
Copy link
Member

I suggested in the original terser thread to support a new PURE(expression) convention

I see the value and semantic clarity, but I do not like that it produces broken code if the polyfill is not added, and I do not think our users will like it either. A comment is less elegant and has problematic edge cases, but it does not change the control flow and can be easily ignored if it is not supported.

@Andarist
Copy link
Member

I feel the same as @lukastaegert - those proposals have value, they are just IMHO a really hard sell for people. I hear pushback about existing annotations from time to time and those proposed calls would feel to people as even more intrusive.

@fabiosantoscode
Copy link
Author

Makes sense.

@kzc
Copy link
Contributor

kzc commented Jan 16, 2020

but I do not like that it produces broken code if the polyfill is not added

It's a catch 22. Babel by default will currently drop parentheses in complex /*@__PURE__*/ annotated call scenarios leading to incorrect non-functioning code. The problem would be resolved in Babel if createParenthesizedExpressions were enabled by default and regenerator patched to accommodate it. But other ES parsers and code generators do not respect emitting the seemingly redundant parentheses required by some annotation comments. The newly proposed annotation comments would also suffer from this problem - but because they are wrapping expressions this parens issue would be much more prevalent.

A comment is less elegant and has problematic edge cases, but it does not change the control flow and can be easily ignored if it is not supported.

The vast majority of user ES code is run through a bundler or minifier anyway. The __PURE__ call can be easily handled there and if not it can be trivially elided or polyfilled. Keep in mind that half of users presently transpile from Typescript or Babel and have to deal with polyfills via configuration right now in some way or another.

@Andarist
Copy link
Member

Isnt the given example highlighting non-real use case? I really cant imagine anyone annotating part of the expression and not the other other part - if this is needed for any reason the expression can always be split into two separate statements.

@kzc
Copy link
Contributor

kzc commented Jan 16, 2020

The Babel pure annotation comment issue above was found in real life create-react-app code when trying to transpile async code to ES5. They had incorrectly chalked it up to being a minifier bug. Such bugs are very difficult to diagnose and beyond the ability of most JS programmers. With AST optimizations and passing the code through various tools it's more common than you think.

@lukastaegert
Copy link
Member

I do not really see the difference between /*#__PUREGET__*/ and /*#__PUREGETWITHIN__*/. The first version also seems to mark two property accesses.

The problem with the latter is of course if code is inlined, then the inlined code would also be marked as having pure getters.

I would appreciate such an annotation, but if there are transformations in place, a lot of funny stuff can indeed happen.

On the other hand, having a function is problematic as the code no longer runs unmodified, which is something that more and more people are doing again in development thanks to ES module support. Thus this is nothing a library user can "just use" as it makes the library no longer universally useable. Not good for Rollup users.

@kzc
Copy link
Contributor

kzc commented Jan 17, 2020

Looking at the upstream issue that spawned this discussion, it appears to have been easily resolved with the existing pure call comment annotation. I don't see a need for any new pure annotation functionality - if anything, just a little extra documentation with examples of its use in various scenarios could be helpful. This documentation should also mention enabling the pure_getters minify option or disabling the Rollup treeshake.propertyReadSideEffects option as alternatives. Moving property reads into regular functions is another simple way of improving tree shaking of modules.

@Andarist
Copy link
Member

The getter issue can only be solved currently by:

-const foo = some.getter
+const foo = /*#__PURE__*/(() => some.getter)()

which is doable, but it's a super hard sell. Only most hardcore people would do this. Others will just tell people wanting to improve libs tree-shakeability (like me) that it's not worth the gains.

This documentation should also mention enabling the pure_getters minify option or disabling the Rollup treeshake.propertyReadSideEffects option as alternatives.

This kinda works only for final consumers (mostly app-level), whereas I would really love to figure out some solution for library code providers. Ideally, consumers shouldn't have to know about an extra flag to enable somewhere so they can benefit. A global flag (on an app-level) is also a huge risk, comparing to local annotations which gives more fine-grained control.

Q: isn't it that for example Babel could just introduce additional heuristics or output the code slightly differently to make those edge cases safer? If transpiled code is the main source of edge cases then I think it could only be a matter of tweaking its output to increase safety.

@kzc
Copy link
Contributor

kzc commented Jan 17, 2020

A global flag (on an app-level) is also a huge risk

That's a decision that can be made on an app by app basis by developers after they've tested with it. Advanced users ought to familiarize themselves with minifier and rollup options if they really want to achieve the smallest possible output size.

Q: isn't it that for example Babel could just introduce additional heuristics or output the code slightly differently to make those edge cases safer?

Babel and any other tool that generates ES from AST - plugins, loaders, the typescript AST API, https://github.com/Rich-Harris/code-red, or what have you.

The getter issue can only be solved currently by /*#__PURE__*/(() => some.getter)()
which is doable, but it's a super hard sell. Only most hardcore people would do this

Users were simply not aware of the pattern. This can be easily resolved with additional documentation.

@Andarist
Copy link
Member

Babel and any other tool that generates ES from AST - plugins, loaders, the typescript AST API, https://github.com/Rich-Harris/code-red, or what have you.

Well, just tools that opt into outputting pure annotations, which should limit the scope. If they want to emit those, they should do it in a safe manner - this seems reasonable to me. I understand that the AST might still be manipulated later by a tool not aware of those annotations, but transformations take on this stage should really rarely change the structure of the code dramatically. This is really hard to quantify though, so it probably would be better if we could gather some real-life examples and discuss this based on them.

Users were simply not aware of the pattern. This can be easily resolved with additional documentation.

I don't believe this is a whole story. I found people really hesitant to adjust code just to aid consuming tools, as it feels intrusive to them, whereas it's much easier to accept some extra comments. I thought about writing a babel transform to wrap getters in such an annotated IIFE, but I also lately was thinking that for this kind of thing ESLint plugin is better than a Babel plugin (less magic). And even then - would Terser be able to inline such undropped IIFEs?

@kzc
Copy link
Contributor

kzc commented Jan 18, 2020

Well, just tools that opt into outputting pure annotations, which should limit the scope

Rollup aside, no AST-based ES generation tool opts into outputting pure annotations - they just emit comments. Comments are just considered to be whitespace - not first class ES constructs. So unless these tools support retaining redundant parentheses with respect to comments by default, they are all subject to this problem.

$ cat generate.js 
var code_red = require('code-red');
var { x, print } = code_red;
var sources = x`
    /*COMMENT1*/ ( /*@__PURE__*/ a.b() ).c();
`;
console.log(print(sources).code);
$ node-v12.13.0 generate.js 
/*COMMENT1*/ /*@__PURE__*/ a.b().c()

This is really hard to quantify though, so it probably would be better if we could gather some real-life examples and discuss this based on them

Please take the time to review the real life bug report above and the babel issue that explains the problem in detail.

To be perfectly honest in all the years I've been responding to issues related to pure annotations everyone has found the existing functionality to be adequate once they know how it works. That combined with the optional use of pure_getters, treeshake.propertyReadSideEffects and code restructuring to move all such operations with harmless side-effects from module scope to be within regular functions, it's really a non-issue that is already addressed by rollup and uglify derived tools. There is no silver bullet for dead code elimination - it's better to educate the developer to be aware of what code patterns unnecessarily retain code and how to avoid them.

@fabiosantoscode
Copy link
Author

Because the annotation I'm talking about marks an entire property access chain it's immune to that specific bug. I don't think we can force people to use a library of sorts to mark their code.

@kzc
Copy link
Contributor

kzc commented Jan 20, 2020

Even without this proposed functionality, Terser is also subject to these sorts of pure annotation AST code generation bugs since it recently added support for preserve_annotations.

Given the input:

$ cat abug.js
/*@__PURE__*/ console.log("FAIL1 - should be dropped");
console.log("PASS1");
(function() {
    function foo() {
        return () => {
            x = "PASS2";
        };
    }
    var x = "FAIL2";
    var bar = /*@__PURE__*/ foo();
    bar();
    console.log(x);
})();
var logger = () => console.log.bind(console);
( /*@__PURE__*/ logger() )("PASS3");
$ terser -V
terser 4.6.3

Expected output when run through Terser twice and executed with Node:

$ cat abug.js | terser -c | terser -c | node
PASS1
PASS2
PASS3

Actual output when run through Terser with preserve_annotations, then through Terser again and then executed with Node:

$ cat abug.js | terser -c -b preserve_annotations,comments | terser -c | node
FAIL2

Note the incorrectly emitted annotations below. The terser project should consider reverting preserve_annotations until these issues are sorted out. Don't consider the test case to be exhaustive, there could be other bugs, it's just what I came up with in a few minutes.

$ cat abug.js | terser -c -b preserve_annotations,comments
/*@__PURE__*/ console.log("PASS1"), function() {
    function foo() {
        return () => {
            x = "PASS2";
        };
    }
    var x = "FAIL2";
    /*@__PURE__*/ foo()(), console.log(x);
}();

var logger = () => console.log.bind(console);

/*@__PURE__*/ logger()("PASS3");

fwiw, Rollup appears to handle these scenarios correctly:

$ dist/bin/rollup -v
rollup v1.29.0
$ cat abug.js | dist/bin/rollup --silent | terser -c | terser -c | node
PASS1
PASS2
PASS3
$ cat abug.js | dist/bin/rollup --silent
console.log("PASS1");
(function() {
    function foo() {
        return () => {
            x = "PASS2";
        };
    }
    var x = "FAIL2";
    var bar = /*@__PURE__*/ foo();
    bar();
    console.log(x);
})();
var logger = () => console.log.bind(console);
( /*@__PURE__*/ logger() )("PASS3");

This proposal would create additional edge cases in Terser when the code is optimized. Because the pure annotated IIFE pattern already handles the scenario that initiated this discussion, I don't see a compelling reason to extend the existing pure annotation functionality.

@sokra
Copy link

sokra commented Jan 27, 2020

("__PURE__",someCall())

This would be part of the AST, is usually not removed by tools other than minimizers, is ignored when executed, not badly rewritten during formating, has next to no runtime overhead when ignored and no runtime overhead when processed by not supporting minimizer (as it would be removed).

It also allows to reuse __PURE__ for getters (and future ideas) as it's clearly assigned to an expression: ("__PURE__", some().getter).

In contrast to /*#__PURE__*/ it could just make any expression pure (not only the call without the arguments)

If needed multiple annotations could be separated by ,: ("__PURE__", "__LAZY__", expression).

@kzc
Copy link
Contributor

kzc commented Jan 27, 2020

A sequence expression technique was mentioned in babel/babel#10306 (comment) to preserve pure annotations for calls in a parentheses hostile tool chain.

("__PURE__", some().getter)
it could just make any expression pure (not only the call without the arguments)

This proposed pure syntax would work fine for calls and most expression types, but be aware of this scenario:

$ cat ex1.js 

        function some() {
            return {
                prop: "PASS",
                getter() {
                    return this.prop;
                }
            }
        };
        var result = (some().getter)();
        console.log(result);
$ cat ex1.js | node
PASS

Replacing some().getter with ("__PURE__", some().getter) yields a different result:

$ cat ex2.js 

        function some() {
            return {
                prop: "PASS",
                getter() {
                    return this.prop;
                }
            }
        };
        var result = ("__PURE__", some().getter)();
        console.log(result);
$ cat ex2.js | node
undefined

That corner case aside which optimizing compilers and minifiers would have to avoid, the proposed pure sequence syntax has some appealing characteristics.

@kzc
Copy link
Contributor

kzc commented Jan 27, 2020

@alexlamsl Given that uglify-js is still very actively maintained, do you have any opinions on this matter?

@sokra
Copy link

sokra commented Jan 27, 2020

var result = ("__PURE__", some().getter)();

That's true. I think we could keep this as limitation. Here var result = ("__PURE__", some().getter()); would be possible. I think even if var result = ("__PURE__", some().getter)(); would work correctly, a minimizer could not take benefit from it as because of the final call in the expression the getter can't be dropped anyway, even if result is not used. But it would stay a limitation for future annotations.

Possible workaround would be an annotation __PURE_ACCESSORS__ (or __PURE_OBJECT__):
var result = ("__PURE_ACCESSORS__", some()).getter();
but I don't think this is needed.


Unrelated from this I think a __PURE_FUNC__ annotation for functions make sense to declare a function as pure:

const fn = ("__PURE_FUNC__", arg => {
  return magic(arg) + 12;
});

const x = fn(magic(42)); // x unused

// minimized to:
magic(42);
// function and call can be dropped
// bug arguments are not considered as pure

I think that's easier to reason about compared to annotations at the callsite.

Sadly it's not possible to put this annotation onto function declarations with the sequence-based approach. Another limitation.

It might be possible to do it this way for statement annotations:

"__PURE_FUNC__";
function fn(arg) {
  return magic(arg) + 12;
}

Maybe also a __PURE_CALL__ annotation to cover the existing /*#__PURE__*/ semantic:

const x = ("__PURE_CALL__", some().magic().fn(magic(42)));

// minimized to:
magic(42);

@sokra
Copy link

sokra commented Jan 27, 2020

I figured that TypeScript dislikes these construct and produces:

error TS2695: Left side of comma operator is unused and has no side effects.

But I think we can make them support these constructs should we decide on this syntax.


Prettier rewrites

"__PURE_FUNC__";
function f() {}

to

("__PURE_FUNC__");
function f() {}

I think this actually looks better. Putting it in brackets also prevents it to be a directive. So I would propose to make this the actual syntax for statement-level annotations.


Sorry if I push this too hard. A general purpose annotation system would be nice for more tools (e. g. webpack). If this finds consensus, I would volunteer to write up a little spec for that.

@kzc
Copy link
Contributor

kzc commented Jan 27, 2020

("__PURE_CALL__", call_or_new_expression) is not needed as it is covered by ("__PURE__", expression) and has the same functionality. Likewise, __PURE_ACCESSORS__ or __PURE_OBJECT__ is not needed if you're mindful of the caveat that I previously outlined.

I think ("__PURE_FUNC__", function_or_arrow_expression) would be reasonable.

I'm not a big fan of the pseudo-directive ("__PURE_FUNC__") syntax to annotate function declarations though. This would be a case where a comment annotation would be better. A comment annotation could also annotate class methods and object shorthand methods.

I think expression site annotation is generally more useful than function site annotation because the latter is not always deterministic in determining call sites, but more of a best-effort if functions are passed through variables.

@alexlamsl
Copy link

@kzc I would have to see specific use cases in order to form an opinion (or suggest an existing workaround).

It is a fascinating read on all those corner cases though 👻

As for the latest proposal, if implemented I can certainly see myself losing hair staring at some a, b, "__PURE_FUNC__", c that just won't go away.

@kzc
Copy link
Contributor

kzc commented Jan 28, 2020

As for the latest proposal, if implemented I can certainly see myself losing hair staring at some a, b, "PURE_FUNC", c that just won't go away.

Good point. Uglify and its forks heavily rely on the sequence construct for optimization. To support such a an annotation proposal, a fair bit of selective deoptimization would have to be introduced into the code base.

@developit
Copy link
Contributor

Something I wanted to point out that may or may not be useful here: Webpack's codegen for modules currently constructs something very similar to what @kzc is proposing, and I think that precedent makes his suggestion all the more interesting.

// current webpack output:
var _React = require("react");
Object(x.default.createElement)("span", null, "hello world");

It's relatively trivial to convert this to a hoisted "helper" that is really just assigned to Object. This would obviously change the context when applied to a callee, but I'd be curious to know how much the two needs intersect.

// proposed
var __PURE_GET__ = Object;
var _React = require("react");
__PURE_GET__(x.default.createElement)("span", null, "hello world");

FWIW, few/none of these seem to address destructured assignment, which seems worth fixing:

// how can this be marked as pure?
const { a: { b: { c } } } = obj;

// ...or this
let c;
({ a: { b: { c } } } = obj);

@sokra
Copy link

sokra commented Feb 17, 2020

// how can this be marked as pure?
const { a: { b: { c } } } = obj;

// ...or this
let c;
({ a: { b: { c } } } = obj);
// how can this be marked as pure?
("__PURE__")
const { a: { b: { c } } } = obj;

// ...or this
let c;
("__PURE__", { a: { b: { c } } } = obj);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants