Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should output UTF-8 BOM when input contains UTF-8 BOM #821

Open
sammyhk opened this issue Sep 3, 2021 · 14 comments
Open

Should output UTF-8 BOM when input contains UTF-8 BOM #821

sammyhk opened this issue Sep 3, 2021 · 14 comments

Comments

@sammyhk
Copy link

sammyhk commented Sep 3, 2021

Bug report

When integrate with sass-loader, it will output CSS with UTF-8 BOM (i.e. \uFEFF) at the beginning of the CSS when it is run in production mode. But this plugin seems trimmed the UTF-8 BOM in the final output CSS file.
This is a reproducible sample repository (https://github.com/sammyhk/scss-utf8.git).
Execute npm run sass which call sass directly produce the CSS file (./dist/in.scss.bom.min.css) with UTF-8 BOM included.
Execute npm run build which involve webpack, sass-loader and this plugin produce the CSS file (./dist/in.scss.min.css) without UTF-8 BOM.

Actual Behavior

Seems this plugin trimmed the UTF-8 BOM and output CSS file without BOM.

Expected Behavior

Should preserve the UTF-8 BOM if exists and output CSS file with BOM.

How Do We Reproduce?

This is a reproducible sample repository (https://github.com/sammyhk/scss-utf8.git).
Execute npm run sass which call sass directly produce the CSS file (./dist/in.scss.bom.min.css) with UTF-8 BOM included.
Execute npm run build which involve webpack, sass-loader and this plugin produce the CSS file (./dist/in.scss.min.css) without UTF-8 BOM.

Please paste the results of npx webpack-cli info here, and mention other relevant information

System:
OS: Windows 10 10.0.19042
CPU: (8) x64 Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Memory: 2.59 GB / 15.79 GB
Binaries:
Node: 12.22.4 - C:\Program Files\nodejs\node.EXE
npm: 6.14.14 - C:\Program Files\nodejs\npm.CMD
Browsers:
Chrome: 92.0.4515.131
Edge: Spartan (44.19041.1023.0), Chromium (92.0.902.84)
Internet Explorer: 11.0.19041.1
Packages:
html-webpack-plugin: ^5.3.2 => 5.3.2
webpack: ^5.51.2 => 5.51.2
webpack-cli: ^4.8.0 => 4.8.0
webpack-dev-server: ^4.1.0 => 4.1.0

@alexander-akait
Copy link
Member

I think oroblem in postcss, we use it as parser and postcss remove UTF-8 BOM, in this plugin we don't touch your content

@alexander-akait
Copy link
Member

alexander-akait commented Sep 3, 2021

Also due logic merging modules in one file using UTF-8 BOM is not safe because you will have BOM in the middle of the file, I strongly recommend do not use BOM

@sammyhk
Copy link
Author

sammyhk commented Sep 4, 2021

Also due logic merging modules in one file using UTF-8 BOM is not safe because you will have BOM in the middle of the file, I strongly recommend do not use BOM

This is the most frustrated part, as a webpack & sass user we do not have a choice but the BOM is generated by sass. The original SCSS file (./src/in.scss) is plain ASCII with UTF-8 escaped sequence, but sass will un-escape as UTF-8 character and add BOM to the generated output.

For bundling multiple CSS into one, according to CSS specification, either binary sequence of @charset "UTF-8"; or BOM at the beginning of the file should change the CSS charset to be UTF-8. So the bundling tools should properly handle this case (move @charset "UTF-8"; or BOM to the beginning of the final output instead leaving them in the middle of the file).

Reference:
https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
https://www.w3.org/TR/CSS2/syndata.html#charset

@alexander-akait
Copy link
Member

@charset "UTF-8"; and UTF-8 BOM are different things

@alexander-akait
Copy link
Member

Honestly we have here to remove BOM in webpack, some developers want to keep them, some to remove 😕

@sammyhk
Copy link
Author

sammyhk commented Sep 4, 2021

@charset "UTF-8"; and UTF-8 BOM are different things

At least for CSS it treated as the same thing, according to the specification (https://www.w3.org/TR/CSS2/syndata.html#charset) the initial bytes

  • EF BB BF (the UTF-8 BOM binary form)
  • 40 63 68 61 72 73 65 74 20 22 55 54 46 2D 38 22 3B (the @charset "UTF-8"; binary form)

will change the charset of the CSS parse to be UTF-8

For this reason, SASS itself will generate @charset "UTF-8"; or BOM based on different mode (https://github.com/sass/dart-sass/blob/main/lib/src/visitor/serialize.dart#L64-L68)
NOTE: webpack development mode mapped to SASS expanded mode, webpack production mode mapped to SASS compressed mode by the sass-loader.

As a webpack user, seems SASS do its job properly as it can correctly generate the CSS file when not involving webpack. That must be a problem in webpack or webpack plugins which breaks the things.

@alexander-akait
Copy link
Member

alexander-akait commented Sep 4, 2021

hm, yes, you are right, just interesting why do you need UTF-8 BOM? Do you have server with other encoding by default? Why do not use @charset "UTF-8";?

@sammyhk
Copy link
Author

sammyhk commented Sep 4, 2021

hm, yes, you are right, just interesting why do you need UTF-8 BOM? Do you have server with other encoding by default? Why do not use @charset "UTF-8";?

In SCSS even you explicitly write @charset "UTF-8"; it will be erased and SASS will depends on the input source whether it contains UTF-8 character (or UTF-8 character escaped sequence) and the output mode mentioned before to automatically inject either @charset "UTF-8"; or UTF-8 BOM... That is why I'm looking for a way to fix the output CSS as currently there is no way to fix that...
For the server encoding, definitely it can return the CSS by content type text/css; charset=UTF-8 header, but it will depends on the server side. I would consider that is a workaround but not a fix as the standalone CSS is still problematic...

@alexander-akait
Copy link
Member

Just clarify, do you want:

  1. if developer has @charset "UTF-8"; we keep it but output on the top (limitation, if you have multiple files with different @charset "UTF-8"; we can't merge them)
  2. If developer has UTF-8 BOM we keep it but output on the top (here again the same limitation)

@sammyhk
Copy link
Author

sammyhk commented Sep 4, 2021

Just clarify, do you want:

  1. if developer has @charset "UTF-8"; we keep it but output on the top (limitation, if you have multiple files with different @charset "UTF-8"; we can't merge them)
  2. If developer has UTF-8 BOM we keep it but output on the top (here again the same limitation)

That should be good enough.

@alexander-akait
Copy link
Member

Related webpack-contrib/css-loader#1212

@alexander-akait
Copy link
Member

Ideally css-loader should trim @charset or UTF-8 BOM and keep it as property of modules, so we can handle it for style-loader/mini-css-extract-plugin/etc

@alexander-akait
Copy link
Member

We strip UTF-8 BOM by default for all modules to avoid problems with merging and to avoid other problems (caching for example), so using UTF-8 BOM in webpack has limitation https://github.com/webpack/loader-runner/blob/master/lib/LoaderRunner.js#L11

@alexander-akait
Copy link
Member

dart-sass adds @charset when you have non ascii characters in your file webpack-contrib/sass-loader#985 (need update to the latest version), so now we should handle it for css-loader and problems will be solved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants