Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor chunking algorithm #2575

Merged
merged 23 commits into from Dec 11, 2018
Merged

Refactor chunking algorithm #2575

merged 23 commits into from Dec 11, 2018

Conversation

lukastaegert
Copy link
Member

@lukastaegert lukastaegert commented Dec 1, 2018

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:
Resolves #2444,
resolves #2284,
resolves #2549

Description

Breaking change:
The optimizeChunks option has been renamed to experimentalOptimizeChunks for now as I do not feel confident presenting this as production ready. My hope is to use the changes in this PR at some point as a base to refactor optimizeChunks to a much simpler and more stable algorithm.

This supersedes and closes #2553. This PR refines the extended bundle API defined in #2553 by changing the entryModuleId property to a facadeModuleId to stress the fact that a chunk mimics the interface of the given module. Additionally, internal dynamic imports are added to the chunk information as well.

Beyond that, lots of things are changed under the hood:

  • The chunk colouring logic has been separated from the graph analysis so that it can be run after the tree-shaking. That way, unused dynamic imports will not only not generate chunks but also not fragment existing chunks further.
  • If a dynamically imported module ends up in the same chunk as other modules which are imported from other chunks, not only is a facade chunk created to ensure the chunk has the proper interface but this facade is actually what is imported by the dynamic import (fixes a bug)
  • Manual chunks have been massively improved and fixed. As especially with manual chunks, a single chunk can contain an arbitrary amount of static and dynamic entry points, the logic now ensures that proper facades are created for all of them. In case the exports of one of them (but at most one to keep the logic simple) matches the exports of the chunk, the chunk acts as a facade for this module.

Including the changes from #2553, this is what the bundle information looks like after this PR:

interface OutputChunk {
	code: string;

	// *new* The ids of all dynamically imported chunks or external modules
	// (i.e. those that are included and could be resolved)
	dynamicImports: string[];

	// The variable names exported by this chunk
	exports: string[];

	// *new* If this is a static or dynamic entry point, then this is the module this chunk matches
	facadeModuleId: string | null;
	fileName: string;

	// The ids of all statically imported chunks or external modules
	imports: string[];

	// *new* Is this chunk target of a dynamic import? (only together with facadeModuleId)
	isDynamicEntry: boolean;

	//  Is this chunk a static entry point? (only together with facadeModuleId)
	isEntry: boolean;
	map?: SourceMap;
	modules: {
		[id: string]: RenderedModule;
	};

	// *new* The name of this chunk if defined in the input object
	name: string;
}

// No changes here
export interface RenderedModule {
	renderedExports: string[];
	removedExports: string[];
	renderedLength: number;
	originalLength: number;
}

@lukastaegert
Copy link
Member Author

@tivac: Following #2553 (comment), I added a new dynamicImports property to chunks that lists all (resolved and included) dynamic imports of a chunk (see updated description). Please check if this works for you :)

@manucorporat
Copy link
Contributor

if i understood correctly, will this PR help to solve this issue? #2444

@lukastaegert
Copy link
Member Author

Yes, that should work perfectly now, just added a link to the description :) I did not go through the existing issues yet, I hope there are more that are fixed by this.

@tivac
Copy link
Contributor

tivac commented Dec 3, 2018

@lukastaegert Looks like that did the trick! I'm now seeing dynamicImports inside each chunk and it all looks accurate with my admittedly basic dynamic import tests. Thanks so much for following up on that 🎉

https://github.com/tivac/modular-css/blob/fc97c8671b021f19516bf0fdc43daf8fe95b0765/packages/rollup/rollup.js#L153-L166

@manucorporat
Copy link
Contributor

@lukastaegert can't wait for this to be merged!!
I guess it also will help with this one too: #2407

@lukastaegert
Copy link
Member Author

Well, it is doing part of #2407. What is missing is handling the namespaces internally as single variables and getting rid of the facades for dynamic imports. Not sure I will add this soon, at the moment my focus is on refactoring code and fixing issues for a 1.0 release.

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally managed to get to an initial review... thanks for your patience.

Really great work, happy to discuss further.

resolution.isDynamicEntryPoint &&
!resolution.chunkAlias
) {
dynamicImports.add(resolution);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be doing some colouring for dynamic imports to make this algorithm work properly.

Here is a test case I tried that isn't working correctly:

main.js

import './dep1.js';
console.log('main');

dep1.js

import './dep2.js';
console.log('dep1');

dep2.js

import('./dep1.js');
console.log('dep2');

The dynamic import back to dep1 should be informing a colouring of dep1.js being an entry point chunk effectively.

Instead I got the following output:

main.js

console.log('main');
console.log('dep1');
import('./generated-chunk.js');

generated-chunk.js

console.log('dep2');

The problem with this is that it has broken the execution order invariant that we require dep2 to execute statically and synchronously before dep1 and main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the key point from #2407 that I feel may have been missed here (correct me if I'm wrong):

it would actually be an improvement to the chunking algorithm if we could work out the common base dependencies of a given dynamic import (the intersection of the static dependency graph of all modules that use this dynamic import)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry but I cannot follow entirely. Using your test case, I (IMO correctly) get the following two chunks:

// main.js
import './generated-chunk.js';

console.log('main');

// generated-chunk.js
Promise.resolve().then(function () { return dep1; });
console.log('dep2');

console.log('dep1');

var dep1 = /*#__PURE__*/Object.freeze({

});

which looks correct to me. Is there something I am missing. Adding inlineDynamicImports, this all gets folded into one chunk correctly. One could argue that it could have been a single chunk all along but that is another point.

I must admit I also do not understand what you mean with your key point from #2407 . Maybe you can illustrate this with some examples of the form input -> expected output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this again and you're right it's working correctly - not sure why it wasn't working in my test previously.

@@ -142,9 +142,9 @@ export default class Chunk {
dependencies: ChunkDependencies;
exports: ChunkExports;
} = undefined;
isEmpty: boolean;
private chunkName: string | void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the chunk.id the same as the chunkName before? How are these two properties different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunkName is just the "[name]" part of the chunk id and is also used by generateId() for this purpose. The idea is that the name can be used as a stable way to identify a chunk while the actual id depends on the name pattern and possibly the hash. This name is now attached to the output.

if (!inputOptions.experimentalPreserveModules) {
if (singleInput) {
for (const chunk of chunks) {
if (chunk.entryModule === undefined) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be replaced with chunk.entryModules.length === 0 or similar to retain the logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the old logic was correct. Now we no longer have a single chunk as soon as there is more than one chunk no matter for what reason another chunk was created.

}
usedIds[chunk.id] = true;
}
assignChunkIds(chunks, singleChunk, inputOptions, outputOptions, inputBase, addons);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring.

@@ -363,13 +363,17 @@ export interface RenderedModule {
}

export interface RenderedChunk {
dynamicImports: string[];
exports: string[];
facadeModuleId: string | null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would isFacade not handle this same information, meaning facadeModuleId === imports[0]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. Having a facadeModuleId does not mean this is a specially created facade but that this chunk has the same interface and purpose as the given module. When only having a single entry and no dynamic imports, the facadeModuleId of the created chunk would be the entry point id. If this does not properly reflect this meaning, maybe another name would be better?

@@ -0,0 +1,83 @@
import ExternalModule from '../ExternalModule';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this file separated.

isPureExternalModule: (id: string) => boolean;
moduleById = new Map<string, Module | ExternalModule>();
assetsById = new Map<string, Asset>();
modules: Module[] = [];
needsTreeshakingPass: boolean = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps explain what this refactoring fixed in the process if there was an underlying fix with this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there was though I no longer know the precise situation. Before, each module had its own needsTreeshakingPass. Now if a variable is declared in module 'a' but included by module 'b', it could happen that the flag would be set for module 'a' instead of module 'b'. If this module was already processed but no other modules request a new pass, it could happen that no new pass was triggered. Having a shared flag for all modules makes much more sense and fixes this kind of issue.
BTW as part of the refactorings of the other PR, I could get rid of requestTreeshakingPass on the AST context entirely and keep this logic to the module level.

if (!dynamicImportModule.chunkAlias)
dynamicImportModule.chunkAlias = dynamicImportAliases[i];
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this all moved out of here.

for (const entryModule of chunk.entryModules) {
if (chunk.facadeModule !== entryModule) {
const entryPointFacade = new Chunk(this, [], inlineDynamicImports);
entryPointFacade.turnIntoFacade(entryModule);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicer name!

@@ -109,7 +109,7 @@ export default class Chunk {
exportMode: string = 'named';
usedModules: Module[] = undefined;
id: string = undefined;
name: string;
variableName: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer!

@lukastaegert lukastaegert added this to In progress in Release 1.0.0 via automation Dec 10, 2018
@lukastaegert lukastaegert moved this from In progress to In Review in Release 1.0.0 Dec 10, 2018
Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I've read all the code, but I've checked out anything that might have concerned me about this and it's an excellent refactoring.

I noted the algorithm suggestion, perhaps that comment can be copied into #2407. This will be an important one I think though because otherwise chunk size grows quite fast with dynamic import count.

currentEntryHash = randomUint8Array(10);
modulesVisitedForCurrentEntry = { [currentEntry.id]: null };
addCurrentEntryColourToModule(currentEntry);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I missed that the dynamic colouring is being done here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here's an example of the algorithm I was suggesting (~> means dynamic import, -> means static import):

E1
~> D1 -> F -> T
-> S -> T      
-> R

E2
~> D2 -> G -> R
-> S -> T      
-> R

E3
~> D3 -> H -> S->T
-> S -> T      
-> R

The current colouring algorithm will result in no optimization groups at all in the above, as every module has a unique colouring.

On the other hand, we could get a more optimal chunking since we know that S, T and R are always loaded together despite the dynamic imports being different colours:

E1
~> D1 -> F -> STR
-> STR

E2
~> D2 -> G -> STR
-> STR  

E3
~> D3 -> H -> STR
-> STR

The algorithm is, for each dynamic import, to determine the intersection of the static graph of all of its importers. So for each of D1, D2 and D3 this is STR. Then when doing colouring for those dynamic imports, to skip these shared items when it comes to colouring, as the dynamic import isn't affecting their colours as they are already available.

This algorithm is also the general case of resolving the issue we have with the test case I provided:

Main -> dep1 -> dep2 ~> dep1

dep1 has parent importer static intersection Main, dep1, dep2. Therefore the colouring of dep1 doesn't recolour itself and as a result we get a single chunk.

Perhaps for another day, but glad to be able to verify the logic here. Let me know if that all sounds correct too please!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also here that dynamic import facades can be rendered inline:

import('./static-chunk').then(x => ({ get some () { return x.a }, get export () { return x.b }, get names () { return x.c } }));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this really helped understand it! Definitely worth adding this in the future!

Note also here that dynamic import facades can be rendered inline

Just a minor question, you are using getters instead of Object.freeze, any particular advantage or disadvantage to doing this? Or is Object.freeze semantically wrong here?

Otherwise this all really makes sense, I'll add it to the other issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, this is just another alternative to creating a namespace which does not require any changes to the exporting chunk.

resolution.isDynamicEntryPoint &&
!resolution.chunkAlias
) {
dynamicImports.add(resolution);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this again and you're right it's working correctly - not sure why it wasn't working in my test previously.

this.entryModule = module;
this.isEntryModuleFacade = true;
if (module.isEntryPoint || (module.isDynamicEntryPoint && !inlineDynamicImports)) {
this.entryModules.push(module);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I should have done a long time ago, thanks, a much richer representation.

@lukastaegert lukastaegert merged commit c87132b into master Dec 11, 2018
Release 1.0.0 automation moved this from In Review to Done Dec 11, 2018
@lukastaegert lukastaegert deleted the refactor-chunking branch December 11, 2018 07:15
@isidrok isidrok mentioned this pull request Dec 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Release 1.0.0
  
Done
4 participants