Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve circular dependency execution order #3840

Merged
merged 16 commits into from Jan 19, 2021

Conversation

lukastaegert
Copy link
Member

@lukastaegert lukastaegert commented Oct 27, 2020

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:
#3834

Description

As #3834 showed, our current way of sorting dependencies in chunks completely messes up execution order if there are circular dependencies between chunks, which can happen when preserving modules.

Before, we just sorted dependencies by their execution order. This PR instead changes the logic so that the order of dependencies is preserved from the modules. If we skip dependencies because they have no side-effects, the algorithm crawls transitive dependencies in the same order as regular dependency execution would happen to look for transitive side-effect dependencies instead.

When merging module dependencies into chunks, the process now has two parts

  • first we start with the last module in the chunk, get the dependencies of that module. If a dependency is part of the chunk, we replace it with the dependencies of THAT module recursively. This process creates a dependency block
  • then we move on to the second to last module in the chunk, iterating backwards over the modules. If a module has already been handled as it was a dependency of a previous module it is skipped, otherwise a new dependency block is created
  • last we add all dependency blocks as dependencies to the chunk, this time in regular order, though

This new algorithm will produce better execution order when

  • preserving modules
  • only chunks are created where all modules in the chunk would be executed top to bottom before bundling

The second is not enforced yet, though. This will be a larger feature to add sometime in the future.

Unfortunately, the execution order is not perfect when preserving modules as import shortening will still introduce new imports into modules/chunks that can trigger a cycle to be executed in a non-natural order. For this case, there has been a warning added. The infrastructure introduced to generate this warning can serve as a basis to fix this issue at least for ESM, CommonJS and possibly SystemJS in the future. AMD output cannot be made to support this easily as it does not have a dedicated concept of reexports. For CommonJS, the solution must be to hoist reexports from a module directly below the corresponding require statement in the reexporting module, i.e.

export { foo } from 'bar';
import 'baz';

should become

const bar = require('bar');
exports.foo = bar.foo;
require('baz');

However, special care needs to be taken to properly support namespace reexports mixed with regular (re)exports as those will always override namespace reexports. Also, the logic should look different for reassigned exports.

Another point that was added in this PR that is not directly related to circular reexports is which reexporting modules are scanned for side-effects when moduleSideEffects are turned off. Basically the logic now goes like this:

// main.js
import { foo } from './first';
console.log(foo);

// first.js
// This module will not be considered to add a side-effect as it does not touch the reexported binding
export { foo } from './second';

// second.js
// This module WILL be scanned for side-effects as it reexports its own import.
import { foo } from './third';
export { foo };

// third.js
// This module will NOT be scanned for side-effects as the reexport is separate from the import, even though it is the same binding
export { foo } from './fourth';
import { foo } from './fourth';

So in short, a direct reexport (export ... from) is ignored when looking for side effects while a reexport using separate import/export statements is considered. The reasoning here is that when I import a binding and the export it again, I may be my intention to mutate this binding in between, e.g. add properties to an object. The commonjs plugin will rely on this feature.

@rollup-bot
Copy link
Collaborator

rollup-bot commented Oct 27, 2020

Thank you for your contribution! ❤️

You can try out this pull request locally by installing Rollup via

npm install rollup/rollup#improve-execution-order

or load it into the REPL:
https://rollupjs.org/repl/?circleci=13986

@codecov
Copy link

codecov bot commented Oct 27, 2020

Codecov Report

Merging #3840 (b652137) into master (0ed8722) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3840      +/-   ##
==========================================
+ Coverage   97.08%   97.13%   +0.04%     
==========================================
  Files         187      188       +1     
  Lines        6563     6669     +106     
  Branches     1913     1940      +27     
==========================================
+ Hits         6372     6478     +106     
  Misses        101      101              
  Partials       90       90              
Impacted Files Coverage Δ
src/ast/variables/LocalVariable.ts 90.10% <ø> (-0.22%) ⬇️
src/utils/options/normalizeInputOptions.ts 100.00% <ø> (ø)
src/Chunk.ts 100.00% <100.00%> (ø)
src/Module.ts 100.00% <100.00%> (ø)
src/ast/nodes/ExportDefaultDeclaration.ts 100.00% <100.00%> (ø)
src/ast/nodes/Identifier.ts 100.00% <100.00%> (ø)
src/ast/nodes/MemberExpression.ts 98.31% <100.00%> (ø)
src/ast/variables/ExportDefaultVariable.ts 100.00% <100.00%> (ø)
src/ast/variables/SyntheticNamedExportVariable.ts 96.66% <100.00%> (+1.66%) ⬆️
src/utils/error.ts 100.00% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ed8722...b652137. Read the comment docs.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

I'll try this branch for our project with really messed circulars 👍

@lukastaegert
Copy link
Member Author

Also have a look if you notice any serious performance deterioration.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

Run on project with 2412 total resulting chunks. js & js_commonjs-proxy only calculated, plus assets. There is postcss, babel, commonjs, node-resolve plugins and more. Treeshaking is disabled, but it not changes time significantly.

rollup@2.32.1
Bundled: about 56 seconds - working
Preserve modules: about 63 seconds - not working

rollup@rollup/rollup#improve-execution-order
Bundled: about 56 seconds - working
Preserve modules: about 61 seconds - not working

So, performance is not changed, but imports is not working too :) There is still circular import with TDZ error, but now it's not very easy to reason about... I trying to extract it into example.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

Ok, first of all, with disabled and enabled tree shaking, error thrown in different files. I think it should be not so! @lukastaegert have you a thoughts why so is happens?

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

Ok, there is repro: https://github.com/Amareis/rollup-bug-repro
I used our actual store structure with minimal simplification, and errors is actually thrown in exact files as in production build.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

I see that current implementation isn't preserving exact modules exports, even with tree shaking disabled: in repro app is reexporting everything from models/index, which reexporting all from models/internal, but in resulting app.js there is only reexports from models/data...

@lukastaegert
Copy link
Member Author

re-exports are shortened if possible. But this should not affect execution order. Or to be more precise, side-effect order. So the current implementation tries to maintain execution order between all modules that have side-effects. So e.g. a module that just contains re-exports but no side-effects will vanish, but it could happen that additional side-effect-only imports are generated to keep the side-effect order.

It should also maintain execution order between modules that declare a variable and modules that consume a variable, but maybe the implementation still has issues there.

@lukastaegert
Copy link
Member Author

In any case, if you can whip something up that runs in Node but produces TDZ errors after bundling, that would be great.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

Actually, even example from #3834 still broken :)
Please, look at https://github.com/Amareis/rollup-bug-repro
Just clone it, make npm i and then npm run test:simple with node 14. It still will be broken even with this branch. Additionally there is test:complex for much more, uhm, complex case.

@guybedford
Copy link
Contributor

Yes, chunk execution ordering has always implicitly been based on the assumption that circular references are contained in the same chunk. This seemed a safe assumption because there is no network loading scenario where you wouldn't want this.

With manual chunks and preserve modules, it is possible to break this invariant. Ideally we could catch that case and patch it up somehow, but if circular references between chunks is desired we can certainly make this work though of course.

The important thing to note firstly though is that execution order is entirely dictated by the entry points. A cycle will execute in different orders depending on which entry module is imported first (and therefore which module in the cycle is imported first).

Thus you cannot determine execution order if you do not take into account entry points.

My concern is that this PR may be missing that detail.

I'd be very careful changing this algorithm though and would need some time to more carefully verify the post-order execution order invariants from the spec are maintained under this new algorithm to finish review.

@lukastaegert
Copy link
Member Author

Thus you cannot determine execution order if you do not take into account entry points. My concern is that this PR may be missing that detail.

I have been very aware of this problem for a long time and assure you this PR does not miss this fact, but it also does not fix it.

While the original implementation completely missed that fact as everything was based on the execution order of the first entry point, this PR actually improves this situation rather than making it worse because instead of relying on this execution order, it treats every single file as an entry point and does a kind of topological sorting of external dependencies separately for each single file.

But this aside, I finally understand the original issue which was indeed the re-export shortening logic. In short, we just cannot shorten imports in a preserveModules build if this build contains cycles. I am currently not sure how true is this in general or if it is only a problem if we import from a file that is kindof involved in the cycle.

For regular chunking builds, we do not have this problem yet but instead the other problem that you mention. But this is not the issue here, and the goal is not to solve this problem with this PR. This will be solved at another time.

Going back to the issue at hand, I think my current best solution would be

  • Try to find a logic that is applied when preserving modules that reintroduces re-exports to achieve perfect execution order (because I am rather sure now that a general solution is not easily possible without re-exports)
  • Maybe we can switch to the old logic with import-shortening if no cycles are present, but this needs investigation
  • Check if this PR is worth keeping independent of that

@guybedford
Copy link
Contributor

Right, ideally the execution order would be computed for each entry point, and then it would be ensured that the order invariants of that multi execution profile would be maintained under any transformation.

Right now, we do overclassify from the first entry point causing stricter ordering invariants than might be optimal. Although I suspect this approach will be causing looser invariants against the invariants I describe above.

Is your suggestion to disable reexport shortening for all builds? Or only preserve modules?

Perhaps preserve modules should get its own output algorithm entirely and not use the Chunk graphing system.

@lukastaegert
Copy link
Member Author

I do not think we overclassify the order as we already taken side-effects into account, optimizing and removing dependencies where possible. The goal here really is not module order but side-effect order.

But yes, my suggestion is for now to use a different algorithm when preserving modules, and think about a new algorithm for the chunking only when we fix the other execution order problem there. Because when chunking, the solution would be to create more chunks, and then it is no longer guaranteed that all cycles are within a single chunk.

@guybedford
Copy link
Contributor

guybedford commented Oct 27, 2020

The goal here really is not module order but side-effect order.

This is a great point and leads to a nicer generalization yes. See also how esbuild constructs its chunks if you haven't read it already (https://github.com/evanw/esbuild/blob/master/docs/architecture.md#code-splitting). Although I haven't verified that approach either TBH.

Because of TDZ circular references can effectively be considered to require side effect ordering.

With regards to the TDZ discussion I do again feel that assuming TDZ isn't hit is a worthy assumption because there is no valid program that wouldn't require that assumption. Like all cycles being contained in chunks, these are assumptions where there is no valid use case for the alternative, so engineering effort to support something that doesn't have a use case, when it could just be worked around doesn't sound like the best use of our scarcest resource to me!

Edit: I mean TDZ for the top-level execution specifically.

@Amareis
Copy link
Contributor

Amareis commented Oct 27, 2020

But this aside, I finally understand the original issue which was indeed the re-export shortening logic. In short, we just cannot shorten imports in a preserveModules build if this build contains cycles. I am currently not sure how true is this in general or if it is only a problem if we import from a file that is kindof involved in the cycle.

(Re)export signatures should be preserved if treeshaking is explicitly disabled in conjuction with preserveModules. I really need this for incremental builds ^_^

@lukastaegert
Copy link
Member Author

Well, the (re-)export signature is preserved, it is just not preserving where the reexports are coming from... 🤪

@Amareis
Copy link
Contributor

Amareis commented Oct 30, 2020

So, I managed how to preserve all exports - just emit additional entry chunk for every module, which has only reexports from module. It's still needed for synthetic name exports to wokring, so it's ok. But this issue is still show-stopper :) Can we make some wokaround? For example just don't shortening urls from circular chains, it seems to possible even in "old" implementation.

@lukastaegert
Copy link
Member Author

I will try to find some time to invest in this soon, but other things came up.

@lukastaegert
Copy link
Member Author

Problem is, this will not be a simple fix as importing a variable from a file where it was not declared is not easily representable with the current data model where each variable has a single module assigned. So what we need is to represent that module A imports a variable foo from module X while module B imports the same variable foo from file Y. This will require some thought. Once this is solved, though, I think I know a solution that might work well independent of preserveModules and should even retain import shortening in situations where it does not matter for execution order.

@Amareis
Copy link
Contributor

Amareis commented Nov 3, 2020

What about new option, preserveModules: 'strict'? Instead of chunks generation, it just replace all import/export modules ids to relevant chunk ids and then run finalizers. It's also requires disabled tree-shaking, of course.
For PoC it can be made purely in user-land if expose resolvedIds to ModuleInfo (or even without it, but it will re-resolve all ids again).

Then, this mode can be used for circular dependencies in simple mode.

@lukastaegert
Copy link
Member Author

Not sure what the advantage of this tool over Babel would be, then. No, I am quite sure this can be solved in a better way, but for now I would like to finish work on rollup/plugins#537 first as this has been open for months now.

@lukastaegert
Copy link
Member Author

Just to outline my thoughts so far:

  • I am quite convinced that the current logic will work flawlessly for preserveModules unless a reexport is going through a module that is part of a circular dependency cycle.
  • We already know those cycles, but we could also put markers on modules for each module that is part of a cycle.
  • Then when tracing an export from a module and we are tracing a reexport, we note the first module we hit that is part of a cycle. If that is the case, we make a note to import that variable from the cyclic module instead of the declaration module.
  • If the same variable is imported through different paths and one of them is non-cyclic, we use that import as an optimization.
  • If the same variable is imported through different paths but all of them hit cyclic modules, but it is a different module for some imports, then it becomes complicated. We'll need to see how to produce this case and if there is a good way to handle it. Or we just use the first module as a heuristic, or something based on the actual execution order.

@Amareis
Copy link
Contributor

Amareis commented Nov 3, 2020

Not sure what the advantage of this tool over Babel would be, then.

All of rollup plugins, of course! Plus, seamless switch between development and production builds, since resolve logic stay the same and all of emitted chunks/assets too. In this mode rollup becames great development tool, and it's even not-so-hard to maintain, since basically it's just full deoptimisation, just like debug build.

@Amareis
Copy link
Contributor

Amareis commented Dec 24, 2020

Nope, there is still same error and imports in resulting files is still shortened. Maybe it worth to add original example as a test?

@lukastaegert
Copy link
Member Author

Ah sorry, yes, wrong PR. This problem is still not fixed, will be soon.

@lukastaegert
Copy link
Member Author

After spending basically weeks of my limited time on trying to improve the situation here, I decided to not solve the remaining issue in this PR. The answer is:

  • It is impossible to fix the issue for AMD or SystemJS because those formats do not support reexports in a way that would work with circular dependencies.
  • It would likely be possible for CommonJS but it would be a major rewrite of the finalizer logic that is just too much for me right now to warrant blocking progress in Rollup on all other fronts.

Note that the issue is solely an issue for preserveModules at this point. I will try to add a warning to Rollup when such an issue is likely to result in broken imports or execution order for now, and maybe I tackle this at a later point again.

To illustrate, take this example:

// main.js
export { getSecond } from './first.js';
export { second } from './second.js';
console.log('main');

// first.js
import { second } from './main.js';
export const getSecond = () => second;
console.log('first');

// second.js
import { getSecond } from './main.js';
export const second = 'second';
console.log('second:', getSecond());

The correct output would be

first
second: second
main

Now if we were to naively transpile this to AMD file-by-file, we would get

// main.js
define(['exports', './first', './second.js'], function (exports, first, second) { 
  console.log('main');
  exports.getSecond = first.getSecond;
  exports.second = second.second;
});

// first.js
define(['exports', './main'], function (exports, main) { 
  console.log('first');
  exports.getSecond = () => main.second;
});

// second.js
define(['exports', './main'], function (exports, main) { 
  console.log('second:', main.getSecond());
  exports.second = 'second';
});

This could will just crash with "undefined is not a function" when second.js tries to call main.getSecond(). Why? Because the code of "main.js" has not run yet, so the reexport has not been established. And SystemJS suffers a similar problem. So to my current knowledge, the problem is just not fixable for those formats if you want to preserve modules.

For CommonJS, the problem is theoretically fixable by making sure the reexports are not assigned at the bottom of the file but directly after the corresponding imports, i.e.

// main.js
var first = require('./first');
exports.getSecond = first.getSecond;
var second = require('./second');
exports.second = second.second;

However to make this logic work in all instances, we would need a major rewrite for how star reexports are handled to ensure proper precedence of exports and duplicate a lot of logic as AMD and CJS would now diverge and could no longer share the export logic.

All in all, it is a major investment that I currently do not see as returning the same benefits as if I were to work on other things as it may likely block all other Rollup progress for another month (especially with a lot of other personal issues I have to deal with mainly concerning child-care in a pandemic).

So I will see if I can add a warning but otherwise release this PR to unblock progress on other things.

@Amareis
Copy link
Contributor

Amareis commented Dec 31, 2020

I think systemjs should works in such cases? It was developed to reflect esm semantics.

@lukastaegert
Copy link
Member Author

Not circular dependencies, at least not this aspect. This only works for ESM because all modules are parsed and analysed in advance, resolving all imports and exports first, so that an export foo from module a that reexports foo from module b can be resolved with the binding from module b without directly importing b. But @guybedford could tell you more about this.

@Amareis
Copy link
Contributor

Amareis commented Jan 3, 2021

For systemjs there is solution which is one-to-one to provided commonjs solution. At least my project is working good when I transpiled all my modules to systemjs directly, without any imports shortening. Whole point of systemjs is to fully reflect esm semantics, isn't it?

@lukastaegert
Copy link
Member Author

I extended the plugin to add a proper warning for now. It should be possible to add proper logic in the future for reexports based on this, but it will still be a lot of work to do properly, especially with regard to how external namespace reexports should play into this. One approach could also be to extend this gradually format by format, starting with ESM and showing the warning only for other formats, and then possibly adding SystemJS if it really works as intended.

My intention is to release this soon as the CommonJS plugin will rely on some part of this. Also have a look at the updated description.

@Amareis
Copy link
Contributor

Amareis commented Jan 4, 2021

Ok, so currently it still not working for my project with preserveModules, but its warns a ton. And production build (without preserveModules) working, but takes a much longer, 3 minutes vs 2 minutes before it.

@lukastaegert
Copy link
Member Author

I wonder where the slowness comes from. Is it the same project you tested here: #3840 (comment) ?

If not, how does performance change for that project?
Can you run rollup -c --perf with this branch vs. the published version to show in which step the time is lost?

@lukastaegert
Copy link
Member Author

If the slowness did not appear with the version you tested in October (which you can still try via npm i rollup/rollup#4d7651be using the commit hash), then it would be helpful to pinpoint the commit that broke performance, those four could be likely candidates:

rollup/rollup#9da52e75
rollup/rollup#eaf7fce9
rollup/rollup#fe49920a
rollup/rollup#bdc3a6b9

@Amareis
Copy link
Contributor

Amareis commented Jan 5, 2021

Oh, my bad. I checked this branch, rollup 2.35.1 and rollup 2.33.1, there is always about 4 minutes for production build (with terser and sourcemaps). I think in october I ran builds without minification and sourcemaps, that's why additional minutes added!

@lukastaegert
Copy link
Member Author

Ah, thanks for checking. I was really worried about a bad performance debugging session.

@LarsDenBakker
Copy link
Contributor

@lukastaegert see #3801 (comment) regarding the comment about support reexports in systemjs. It's not possible yet with rollup, but can be made to support it.

@lukastaegert
Copy link
Member Author

Sounds good. Hope to find some time soon to pick this up again so that we can have this at least for ESM and SystemJS, and possibly CommonJS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants