
Merge chunks optimization #2090

Merged · 10 commits · Apr 16, 2018
Conversation

guybedford
Contributor

@guybedford guybedford commented Mar 27, 2018

This PR implements a chunk merging strategy on top of the chunk-interface-mangling branch. Fixes #2070. The optimization is off by default but can be enabled with:

rollup.rollup({ input: ['x'], optimizeChunks: true, chunkGroupingSize: 5000 })

It lists each of the chunk execution groups and merges chunks that would otherwise only ever execute together, ensuring that the resulting execution order is one that was already possible in the chunking scenario defined by the inputs.

It will merge two chunks only if they can execute directly one after the other according to the chunking scenario defined by the inputs, and only if the merged chunk would add no more than chunkGroupingSize of unnecessary source length for any entry point that wouldn't otherwise use one of the chunks. This grouping-size calculation checks dependencies as well, and any new externals automatically opt the chunks out of the optimization.
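
A heavily simplified sketch of that criterion (the names `ChunkLike` and `mayMerge` are illustrative only; the real optimization additionally tracks execution groups and shared dependencies):

```typescript
// Hypothetical, simplified sketch of the merge criterion described above.
interface ChunkLike {
  renderedSourceLength: number;
  externals: Set<string>;
}

function mayMerge(a: ChunkLike, b: ChunkLike, chunkGroupingSize: number): boolean {
  // any new externals introduced by the merge opt the chunks out
  for (const ext of b.externals) if (!a.externals.has(ext)) return false;
  for (const ext of a.externals) if (!b.externals.has(ext)) return false;
  // the smaller chunk is the unnecessary extra weight for entry points
  // that only needed the larger one
  const extraSize = Math.min(a.renderedSourceLength, b.renderedSourceLength);
  return extraSize <= chunkGroupingSize;
}
```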

These optimizations come into their own when dealing with large apps, since the chunking algorithm can in theory produce up to min(m, 2^n - 1) chunks, where m is the number of modules and n is the number of entry points. Typically the number will be nowhere near this, because code naturally ends up being well shared by construction, but this optimization can apply in cases where the shared chunks become more sparse.
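
As a quick numeric illustration of that bound (not Rollup code):

```typescript
// One chunk can exist per non-empty combination of entry points sharing
// modules, capped by the total module count.
const entryPointCount = 4; // n
const moduleCount = 40; // m
const maxChunks = Math.min(moduleCount, 2 ** entryPointCount - 1);
console.log(maxChunks); // 15
```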

Missing from this PR is the implementation of a chunk.isPure method, which would check that all the statements remaining in all the modules of the chunk after treeshaking definitely cannot result in global mutations, so that it is safe to include the chunk's code for an entry point that wouldn't have needed it. I think this is a prerequisite for enabling this optimization by default, but we can merge without it in the meantime.

In terms of the possible performance of isPure, I've commented where the usages would be at https://github.com/rollup/rollup/compare/chunk-interface-mangling...merge-chunks-optimization?expand=1#diff-7ff2e400f752aa45c0ccc662f92f201fR56. Note we only need to call this function once we know that the chunks are suitable for merging, so for, say, 20 entry-point chunks with 30 shared chunks, typically only around 4 chunks would have this type of merge call run on them. I don't think this will become a performance constraint.

@guybedford
Contributor Author

Note my preference with this feature would be to leave it undocumented at first, possibly even until the pure handling has been worked out. I'm very much one for undocumented flags to experiment with, that can be deprecated at any time :)

@lukastaegert
Member

I did not check out this one yet but even the "manual chunks" feature should probably sit in the "dangerous section" of the documentation as

  1. it can totally mess up execution order (as you are probably aware), and
  2. it can thus easily pull side effects into otherwise side-effect-free code. I.e. if you create a manual chunk from a module that does something side-effect-full but is not imported by your entry point, and this module imports other modules that are imported by your entry point, this will automatically pull in the side effect (but I guess you are aware of this as well).

As for not documenting it: I would still want to mention it in a release tweet, so maybe just call the option experimentalOptimizeChunks for now?

@guybedford
Contributor Author

Hmm, I'm not so sure manual chunks are dangerous, as the definition files will be user-defined "entry points", meaning they are separable execution orderings to begin with. As far as side effects are concerned, one would typically expect manual chunks to be used for library code, or, if it is side-effect code, at least the user will be aware of what they are bringing in.

What makes these optimizations dangerous to me is the fact that we aren't restricted to loading in new "entry points"; we are potentially merging things that sit within side-effect app code (e.g. loadShoppingCart.js for a page that doesn't use it wouldn't easily be pulled into a manual chunk the way it could be by this optimization). Also note manual chunks don't recursively include their dynamic imports or anything like that.

@guybedford guybedford changed the base branch from chunk-interface-mangling to master March 29, 2018 20:37
Member

@lukastaegert lukastaegert left a comment

Finished my first review. This really looks like a very useful feature and I like the simplicity of having just a boolean flag + an optional numeric value.

if (
!optimized &&
!inputOptions.experimentalPreserveModules &&
inputOptions.optimizeChunks
Member

Instead of silently ignoring optimizeChunks when modules are preserved, there should probably be an INVALID_OPTION warning or error

size = chunk.getRenderedSourceLength();
chunkSize.set(chunk, size);
return size;
}
Member

This memoized way of getting the chunk size could be extracted outside optimizeChunks by wrapping it into a createGetChunkSize: () => (chunk: Chunk) => number;
or become obsolete if memoization is moved to the individual chunks.
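
A sketch of that createGetChunkSize extraction (assuming a chunk with a getRenderedSourceLength method, as in this PR; the interface name is illustrative):

```typescript
interface SizedChunk {
  getRenderedSourceLength(): number;
}

// Factory wrapping the memoized size lookup, so optimizeChunks only needs
// the returned function rather than managing the cache itself.
function createGetChunkSize(): (chunk: SizedChunk) => number {
  const chunkSize = new Map<SizedChunk, number>();
  return chunk => {
    let size = chunkSize.get(chunk);
    if (size === undefined) {
      size = chunk.getRenderedSourceLength();
      chunkSize.set(chunk, size);
    }
    return size;
  };
}
```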

src/Chunk.ts Outdated
* Chunk dependency output graph post-visitor
* Visitor can return "true" to indicate a propagated stop condition
*/
postVisit(visitor: (dep: Chunk | ExternalModule) => any): boolean {
Member

Nice addition 👍
I must admit I am not really sure what the post part of the name indicates, though. Also, I would find it helpful if the name would include what is visited, e.g. visitDependencies or visitTransitiveDependencies

Member

forEachDependency might also be a nice way of naming it

Member

Considering the break-off condition, it is actually someDependency. Would also make some parts of the optimize chunks logic a little nicer to read.

Contributor Author

"postVisit" meaning "post-order visitor" here. It is very much a visitor over a child iterator (although yes, visitors are iterators, just deep ones).

src/Chunk.ts Outdated
postVisit(visitor: (dep: Chunk | ExternalModule) => any): boolean {
// add in hashes of all dependent chunks and resolved external ids
function visitDep(dep: Chunk | ExternalModule, seen: (Chunk | ExternalModule)[]): boolean {
if (seen.indexOf(dep) !== -1) return;
Member

For a deep dependency tree, this will make the performance decrease quadratically with the number of dependencies. Maybe just use a Set?

Member

Because I just googled it: https://jsperf.com/array-indexof-vs-set-has

Member

If we were not targeting ES5 but ES6 (and thus for...of loops would not be transpiled by TypeScript), we could even do this (and get rid of the recursion):

const dependencies = new Set(this.dependencies);

// as the iterator is "live", we can add new entries while iterating
// the check if we already visited a dependency will be handled implicitly by the Set
for (const dependency of dependencies) {
  if (visitor(dependency)) return;
  if (dependency instanceof Chunk) {
    for (const subDependency of dependency.dependencies) {
      dependencies.add(subDependency);
    }
  }
}

Not sure if there is an easy way to translate this to ES5, though.
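
One possible ES5-friendly translation, sketched with illustrative stand-in types: keep a Set (or an object keyed by chunk id) for the seen-check, and replace the live for...of iteration with a plain array worklist:

```typescript
// Illustrative stand-in for Chunk | ExternalModule: chunks have
// dependencies, external modules do not.
interface Dep {
  dependencies?: Dep[];
}

function visitDependencies(root: Dep, visitor: (dep: Dep) => any): void {
  const seen = new Set<Dep>();
  const queue: Dep[] = root.dependencies ? root.dependencies.slice() : [];
  while (queue.length > 0) {
    const dep = queue.shift()!;
    if (seen.has(dep)) continue;
    seen.add(dep);
    if (visitor(dep)) return; // propagate the stop condition
    if (dep.dependencies) queue.push.apply(queue, dep.dependencies);
  }
}
```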

src/Chunk.ts Outdated
@@ -110,11 +110,12 @@ export default class Chunk {
// an input entry point module
entryModule: Module;
isEntryModuleFacade: boolean;
isManualChunk: boolean;
Member

To avoid cluttering the constructor with initialisation code, we could also start using TypeScript property initialisers, i.e. isManualChunk: boolean = false, which is equivalent to adding the line to the constructor (at least in our setup, as I tested).

This also makes it easier to see if there are uninitialised variables left.
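
For example (class name illustrative):

```typescript
// The initialisers below run as part of the generated constructor, so this
// is equivalent to assigning the defaults in a constructor body.
class ChunkExample {
  isEntryModuleFacade: boolean = false;
  isManualChunk: boolean = false;
}
```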

Contributor Author

👍

const chunkDependencies: (Chunk | External)[] = [];
chunk.postVisit(dep => chunkDependencies.push(dep));

const countedChunks: (Chunk | External)[] = [chunk, lastChunk];
Member

Again a good candidate for a Set.

Member

Also, the name seems a little confusing as "counting" does not seem to be what this variable is used for. But I must admit I find this whole block below confusing and I am not 100% sure what its purpose actually is. Maybe a better name could help here.

Contributor Author

The algorithm being applied here is as described in #2070 (comment).

"Counting" refers to whether or not the given chunk has already been considered as part of the overall size. The point is that if two chunks both have the same chunk dependency, we don't need to include it in the size difference between the chunks if we were to merge them, as it is effectively already in the union.

Contributor Author

Basically does it "count" towards the size in the size comparison between the merged and unmerged chunks.

}
// if (!chunk.isPure()) continue;

const chunkDependencies: (Chunk | External)[] = [];
Member

A good candidate for a Set.

return size;
}

for (let i = 0; i < chunks.length; i++) {
Member

The variables i and j are used over a span of more than 120 lines. It would really help readability if they had names that tell a little better what they stand for.

@@ -468,6 +471,9 @@ export default function rollup(
.then(chunks => {
timeEnd('BUILD', 1);

// ensure we only do one optimization pass per build
let optimized = false;
Member

I really do not like how we need that flag, but admittedly I do not have a better idea of how to get a good size estimate before we call generate for the first time.

Contributor Author

Yeah, I don't like the way we handle render state in passes in general, to be honest. I'm really hoping we can clean this stuff up over time and treat render state as its own encapsulated thing.

@@ -205,6 +205,8 @@ function getInputOptions(
external: getExternal(config, command),
input: getOption('input'),
manualChunks: getOption('manualChunks'),
chunkGroupingSize: getOption('chunkGroupingSize'),
optimizeChunks: getOption('optimizeChunks'),
Member

I guess it would be more transparent to add the default value of 5000 here (as a second parameter to getOption) than to add it as a default parameter somewhere hidden in the optimization logic.

@guybedford
Contributor Author

Thanks for the detailed review here, I've finally addressed the feedback.

Member

@lukastaegert lukastaegert left a comment

Just one minor change that I would like to see addressed concerning the defaults, otherwise this is good to go. You might also consider rebasing against master again.

@@ -205,6 +205,8 @@ function getInputOptions(
external: getExternal(config, command),
input: getOption('input'),
manualChunks: getOption('manualChunks'),
chunkGroupingSize: getOption('chunkGroupingSize'),
optimizeChunks: getOption('optimizeChunks', 5000),
Member

I guess the default for optimizeChunks should be false while the default for chunkGroupingSize should be 5000? Because right now, optimizeChunks is always on which is probably not what we want.
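
Sketched with an illustrative stand-in for the CLI's getOption helper (the real helper reads from the loaded config and command-line flags), the suggested split of the defaults would be:

```typescript
// Stand-in for the CLI's config lookup, for illustration only.
const config: { [key: string]: unknown } = {};
function getOption(name: string, defaultValue?: unknown): unknown {
  return config[name] !== undefined ? config[name] : defaultValue;
}

const inputOptions = {
  chunkGroupingSize: getOption('chunkGroupingSize', 5000), // size threshold defaults to 5000
  optimizeChunks: getOption('optimizeChunks', false) // the flag itself defaults to off
};
```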

Member

This also leads to unexpected warnings when preserving modules.

Contributor Author

Ahh, thanks.

@@ -29,6 +29,7 @@ describe('chunking form', () => {
input: [samples + '/' + dir + '/main.js'],
experimentalCodeSplitting: true,
experimentalDynamicImport: true,
optimizeChunks: false,
Member

Also, this should not be necessary as optimizeChunks: false should be the default.

* chunkList allows updating references in other chunks for the merged chunk to this chunk
* A new facade will be added to chunkList if tainting exports of either as an entry point
*/
merge(chunk: Chunk, chunkList: Chunk[], options: OutputOptions) {
Member

I understand that and I am ok with that, I just don't like the fact that Chunk.ts is currently a god class with 1082 lines where it is getting harder and harder to find anything and this looks like an easy candidate for extraction.

src/Chunk.ts Outdated
@@ -826,6 +834,120 @@ export default class Chunk {
timeEnd('render modules', 3);
}

getRenderedSourceLength() {
if (this.renderedSourceLength) return this.renderedSourceLength;
Member

Really minor nit: With this implementation, empty modules will be checked each time. Maybe actually compare with undefined?
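
A sketch of the nit (names illustrative): with a truthiness check, an empty chunk whose rendered length is 0 would be re-rendered on every call, while comparing with undefined caches it:

```typescript
class RenderedChunkExample {
  computations = 0;
  private renderedSourceLength: number | undefined = undefined;

  getRenderedSourceLength(): number {
    // comparing with undefined (rather than truthiness) means a legitimate
    // cached length of 0 is not recomputed
    if (this.renderedSourceLength !== undefined) return this.renderedSourceLength;
    this.computations++;
    return (this.renderedSourceLength = 0); // an empty chunk renders to length 0
  }
}
```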

* Chunk dependency output graph post-visitor
* Visitor can return "true" to indicate a propagated stop condition
*/
postVisitChunkDependencies(visitor: (dep: Chunk | ExternalModule) => any): boolean {
Member

Thanks 👍

renderedHash: string;
renderedSources: MagicString[];
renderedSource: MagicStringBundle;
renderedDeclarations: { dependencies: ChunkDependencies; exports: ChunkExports };
Member

👍

continue;
}

let execGroupIndex = 1;
Member

Thanks, the improved names really help with readability.

const chunkDependencies = new Set<Chunk | External>();
chunk.postVisitChunkDependencies(dep => chunkDependencies.add(dep));

const countedChunks = new Set<Chunk | External>([chunk, lastChunk]);
Member

As I understand these are the "chunks that count", it would be "countingChunks" 😜. No really, how about something like chunksToBeMerged as it seems this is exactly what they are? Or, if you really want to stress this is solely about size, it might be measuredChunks.

Contributor Author

They aren't chunks to be merged; these are chunks in the merged set that are shared dependencies and so will not affect the size-difference calculation, or that have already been counted towards it. Chunks not in this set are then counted towards the combined size that is checked against the chunkGroupingSize limit to determine whether the cost of the merge is worthwhile.

I've gone with ignoreSizeChunks.

Member

Thanks for taking the time coming up with a better name, this really helps a lot!

Member

@lukastaegert lukastaegert left a comment

Looks good!
