Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assets with the same content are de-duplicated at build time #8632

Closed
7 tasks done
hemengke1997 opened this issue Jun 17, 2022 · 19 comments
Closed
7 tasks done

Assets with the same content are de-duplicated at build time #8632

hemengke1997 opened this issue Jun 17, 2022 · 19 comments
Labels
p3-minor-bug An edge case that only affects very specific usage (priority)

Comments

@hemengke1997
Copy link
Contributor

Describe the bug

Hi! I have many assets like svg and their [content] is the same.
After vite build, I only got two svg. The rest have been merged together.
I checked the code of vite in

if (!emittedSet.has(contentHash)) {
.
Is this made wrong? Or is this the default behavior of rollup.
How can I get all assests after build? I don't want chunk splitting.

Reproduction

https://stackblitz.com/edit/vitejs-output-file-paths-gftt9h?file=dist/manifest.json

System Info

vite 2.9.12

Used Package Manager

pnpm

Logs

No response

Validations

@hemengke1997
Copy link
Contributor Author

I found this is not rollup's behavior.
due to

if (!emittedSet.has(contentHash)) {

has cached the assests by hashcontent. How can I disable this behavior?

Looking forward to your answer

@patak-dev
Copy link
Member

I also think that this isn't the best here. Maybe we should hash the input file name instead. It is an interesting optimization but feels to me that people should opt into it.

@patak-dev patak-dev added bug p3-minor-bug An edge case that only affects very specific usage (priority) and removed pending triage labels Jun 17, 2022
@patak-dev patak-dev changed the title assets build generation result is wrong Assets with the same content are de-duplicated at build time Jun 17, 2022
@bluwy
Copy link
Member

bluwy commented Jun 17, 2022

Isn't it a good thing that same content assets are deduped to reduce bundle size?

@hemengke1997
Copy link
Contributor Author

Of course it is very good for reducing the bundle size in SPA.
But in MPA, developers want put the files together by every webpage.
If some asstes are de-duplicated, we wouldn't be able to know where the assets coming from.
Moreover, the manifest maybe like this:

  "index/index.html": {
    "file": "index/index-entry-a9d46e1c.js",
    "src": "index/index.html",
    "isEntry": true,
    "imports": [
      "_plugin-vue_export-helper-chunk-21debc91.js"
    ],
    "css": [
      "index/index.css"
    ],
    "assets": [
      "subpage/images/test.png", // this is Index but it references subpage
      "index/images/1.svg"
    ]
  },
  "subpage/index.html": {
    "file": "subpage/subpage-entry-4b9aeb4f.js",
    "src": "subpage/index.html",
    "isEntry": true,
    "imports": [
      "_plugin-vue_export-helper-chunk-21debc91.js"
    ],
    "css": [
      "subpage/subpage.css"
    ],
    "assets": [
      "subpage/images/logo.png",
      "subpage/images/test.png"
    ]
  },

This made me hard to deploy

@patak-dev
Copy link
Member

patak-dev commented Jun 17, 2022

The possible issues I see are:

  • A user has several images with proper names (and two of them have the same content but are conceptually different). Maybe there is another process in a backend integration that grabs that images and does something with them?
  • If that isn't the case and it is safe to de-duplicate, maybe a warning (that I could turn on with a comment) could be issued so the duplicated image is also removed from the repo?

There is also a perf implication of computing this hash. Maybe it is worthy, but it was surprising to me so at least we should better document it.

@bluwy
Copy link
Member

bluwy commented Jun 18, 2022

Thanks for the explanations. Yeah I think this makes sense to me too. It's rare for someone to have to completely equal assets, and if they did it's likely intended and we should separate them.

Regarding perf, that's a good point too. It seems like it's being used for various stuff though so it might not be easy to optimize it.

@bluwy
Copy link
Member

bluwy commented Jun 27, 2022

Is there a reason this is closed? I think the issue is valid.

@hemengke1997
Copy link
Contributor Author

Sorry, my bad.
I thought this issue was unnecessary.
I'm very sorry.

@dostrog
Copy link

dostrog commented Aug 30, 2022

today we also got this issue. I've checked hash in CLI - the same content, but different names

$ md5 test.png
MD5 (test.png) = e6fa9f20b661ea00978dad52102d4fa1
$ md5 test3.png
MD5 (test3.png) = e6fa9f20b661ea00978dad52102d4fa1

and there are not both files in the manifest.json. May be it is possible to calculating hashes depending not only by content, but on filename, modification date or smt else ?

Is there some temp solution for this issue ?

@timacdonald
Copy link
Contributor

timacdonald commented Sep 1, 2022

Hey folks, I've hit this with the Laravel integration as well, and I've proposed the following fix: #9928

It is my opinion that if two files are the same e.g.

echo "SVG_CONTENT_HERE" > loader.svg > spinner.svg

That is a suitable solution to include the file in the build once, but populate the manifest with two entries.

{
  "loader.svg": {
    "file": "assets/loader.01234.svg",
    "src": "loader.svg"
  },
  "spinner.svg": {
    "file": "assets/loader.01234.svg",
    "spinner.svg"
  }
  // ...
}

This means that if you are compressing images it only has to compress the one image, rather than duplicate images and also that your bundle size is small i.e. one image rather than two are ouput.

@dostrog it is my personal opinion that although your issue is related, it is a unique issue.

I wonder if Vite could offer a "hash" config option that gets the file etc. and that way for the 95% use case you don't need anything, but for the 5% when clashes occur there is an escape hatch.

export default defineConfig({
    hasher: (file) => md5(file.contents) + md5(file.fileName),
    // ...
};

I'm no JS expert though, so this might be a silly idea.

@dostrog
Copy link

dostrog commented Sep 1, 2022

@dostrog it is my personal opinion that although your issue is related, it is a unique issue.

Thanks for your reply! Yeah unique and rare, but exists.

at the moment I've just add file name + generate hash (unfortunately. do not have time for full PR) and use forked repo for Vite

in asset.ts:

...
    const map = assetHashToFilenameMap.get(config)!
    const contentHash = getHash(content + file) // <- make hash more unique
    const { search, hash } = parseUrl(id)
...

@patak-dev
Copy link
Member

We decided today that there isn't a good use case yet to justify an option to avoid deduplication. Would you check if @timacdonald's PR is enough for your needs:

If not, please provide more context about the particular use case. Thanks!

@dostrog
Copy link

dostrog commented Sep 10, 2022

I've just checked with

this PR successfully resolves our issue with the same content in different files also.

Thank you guys!

@sapphi-red
Copy link
Member

Closing as it seems #9928 was enough.

@hemengke1997
Copy link
Contributor Author

@sapphi-red sorry for the late reply. i've checked the reproduction and update vite to lastest, and found this is not solved.

Here is the repro: https://stackblitz.com/edit/vitejs-output-file-paths-ylbeyd

@sapphi-red
Copy link
Member

@hemengke1997 #8632 (comment)
Would you provide more context about the particular use case if #9928 isn't enough?

@hemengke1997
Copy link
Contributor Author

@sapphi-red What I hope is that the duplicated assets will not be deduplicated after bundle, and the problem solved by this pr #9928 is the manifest reference.
It seems that this problem can only be solved from rollup

@sapphi-red
Copy link
Member

@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?

@hemengke1997
Copy link
Contributor Author

@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?

I am trying to cache packed results. Because my project structure is multi-page, and each page is not directly related, the speed will be slower every time it is fully bundled, so I need know whether the file needs to be removed from the cache by controlling the hash. And multi-asset merging will cause cache instability.
My need is very rare, so I don't think there is any need to modify the vite source code. At first I thought it was a vite bundle bug. But I now think merging assets is reasonable for the vast majority of projects.
Thank you for your enthusiasm and responsibility

@github-actions github-actions bot locked and limited conversation to collaborators Nov 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
p3-minor-bug An edge case that only affects very specific usage (priority)
Projects
Archived in project
Development

No branches or pull requests

6 participants