Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedupe emitted assets #2959

Closed
jakearchibald opened this issue Jun 22, 2019 · 10 comments
Closed

Dedupe emitted assets #2959

jakearchibald opened this issue Jun 22, 2019 · 10 comments

Comments

@jakearchibald
Copy link
Contributor

this.emitAsset('foo.txt', 'hello world');
this.emitAsset('foo.txt', 'hello world');

Expected Behavior / Situation

Outputs one asset, foo-19916f7d.txt.

Actual Behavior / Situation

Outputs two assets. foo-19916f7d.txt and foo-19916f7d2.txt.

Modification Proposal

If two plugins emit the same asset, they should dedupe.

@lukastaegert
Copy link
Member

So the suggestion is if two plugins use the same name and the same source, it should be the same asset, but if either of those differs, we deduplicate (even if the generated file name would match e.g. when not using hashes)?

That would certainly make sense but I would like to know what the situation is where this happens. To me it seems rather unlikely that two different plugins want to emit the exact same thing. Unless this is intended to be able to reference the same asset from two plugins, in which case maybe the plugins should communicate directly to exchange the reference id? This would also avoid duplicating the logic that creates the asset source.

@jakearchibald
Copy link
Contributor Author

If two plugins use the same name & a different source… that might even be an error. But I'm not sure about that.

Use cases: a CSS plugin adds a background image. Some JS also adds it for use in a canvas.

@lukastaegert
Copy link
Member

That is a valid use case, thanks! As the file names of assets are controlled by Rollup anyway and not the plugin—which fill be able to use the emitFile hook with type "file" for those things in the future—I think it makes sense to ALWAYS dedupe assets with the same source, even if the name differs. In that case, we pick one of the two names. Will add this to #2938

If two plugins use the same name & a different source… that might even be an error. But I'm not sure about that.

Scenario I see here is that two components both use an arrow.svg for some UI chrome that differs between the components. Here I would definitely want to have two distinct arrows with non-conflicting names.

@jakearchibald
Copy link
Contributor Author

Sounds good!

@frank-dspeed
Copy link
Contributor

frank-dspeed commented Jan 18, 2020

maybe we should even go for a hasing deduping even if that is expensiv it could point out the file locations and other infos how to fix that it would really help also with deep nested dependencys for example one script could require a diffrent version of a script but there got only 1 file changed all others share the same content so content hashing is what i would propose. NOT ONLY FOR ASSETS!

@lukastaegert
Copy link
Member

We cannot just dedupe JS modules that have identical content unless we can confirm they are stateless. If they retain any state in local variables, deduping will change functionality.

@frank-dspeed
Copy link
Contributor

@lukastaegert oh yes sure your right it gets a bit complex but its maybe worth the effort i am at present working on a collection of tools to clean up code bases that are using NPM and i found out that even many many npm packages depend on diffrent named same dependencys its a total mess that can only get cleaned via codeMods.

We could wrap dedubed code if its big enought to be worth that with a replacement code that references to the original source but returns Object.create(exportname)

to get a isolated state for the dedubed module.

@frank-dspeed
Copy link
Contributor

to be more technical

common helper functions like array operations and type checks are most time implamented with same code in over 1000 packages :D per Project that uses NPM

@lukastaegert
Copy link
Member

To me, this sounds like an immensely complicated, slow and error-prone process with a myriad of unhandled edge cases, but please prove me wrong. You could build a tool that does that by running Rollup twice. During the first pass, you collect all generated code by providing a transform hook and storing whatever gets passed in. Then you search for duplicates. Then you run Rollup again with a plugin that replaces code with deduplicated code.

@lukastaegert
Copy link
Member

BTW this issue has already been solved in #2999

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants