Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(codegen): fix unicode escaping in es5 #3636

Merged
merged 12 commits into from Feb 25, 2022

Conversation

williamtetlow
Copy link
Contributor

@williamtetlow williamtetlow commented Feb 19, 2022

Description:
Fixes #3617

Babel has the logic if target <= ES5 && original source string contains any unicode map all non-ascii to unicode.

It's a bit implicit as the logic is across a few plugins and also through the options passed to jsesc (that depend on target and babel version) but here babel sets node.extra = undefined if there's any unicode in string literal

And here if node.extra exists it will return the string and otherwise process it with jsesc

  • I've replicated this logic in this PR.
  • I've also added the logic for converting unicode characters > 0xFFFF to surrogate pairs to support ES5.
  • I've updated the TSC test suite to match TSC and assert against unicode code points instead of characters

BREAKING CHANGE:

Related issue (if exists):
#3617

@williamtetlow williamtetlow marked this pull request as ready for review February 21, 2022 18:06
@williamtetlow
Copy link
Contributor Author

v: &str,
target: EsVersion,
single_quote: bool,
emit_non_ascii_as_unicode: bool,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting this to true is equivalent to disabling jsesc minimal option

https://github.com/mathiasbynens/jsesc#minimal

@kdy1 kdy1 self-assigned this Feb 22, 2022
}
};

if target <= EsVersion::Es5 {
let emit_non_ascii_as_unicode = UNICODE_CODEPOINT.is_match(&orig);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only concern about this PR I have is the performance of this line.
Do we really need regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdy1 I think that's my JS brain defaulting to regex 😛 Simple string contains is enough, updated in 4a11bab

Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly LGTM.


swc-bump:

  • swc_ecma_codegen

Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!


swc-bump:

  • swc_ecma_codegen

@kdy1 kdy1 enabled auto-merge (squash) February 25, 2022 12:32
@kdy1 kdy1 added this to the v1.2.146 milestone Feb 25, 2022
@kdy1 kdy1 merged commit dba90ea into swc-project:main Feb 25, 2022
@williamtetlow williamtetlow deleted the fix-unicode-escaping-in-es5 branch February 25, 2022 13:51
@swc-project swc-project locked as resolved and limited conversation to collaborators Oct 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Improper unescaping of Unicode (ES5 and below)
3 participants