Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract string parsing to a separate package #14772

Merged

Conversation

nicolo-ribaudo
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo commented Jul 19, 2022

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

This was hard, but it lets us avoid an external dependency with some logic that we already have in #14757.

@liuxingbaoyu you should be able to do something like this:

let unterminatedCalled = false;
try {
  const error = () => { throw new Error(); };
  cooked = parseStringContents("template", raw, 0, 0, 0, {
    unterminated() { unterminatedCalled = true; },
    strictNumericEscape: error,
    invalidEscapeSequence: error,
    numericSeparatorInEscapeSequence: error,
    unexpectedNumericSeparator: error,
    invalidDigit: error,
    invalidCodePoint: error,
  });
  if (!unterminatedCalled) cooked = null; // we must consume the whole raw string
} catch {}

This PR mostly moves code around, with one main change: instead of directly updating this.state.{pos,curLine,lineStart}, we keep track of them in variables and we update them only after returning from the helper parser functions.

@nicolo-ribaudo nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: parser labels Jul 19, 2022
@JLHwung JLHwung self-requested a review July 20, 2022 00:17
@liuxingbaoyu
Copy link
Member

Amazing!

babel/babel.config.js

Lines 208 to 214 in 486d27a

{
test: [
"packages/babel-generator",
"packages/babel-plugin-proposal-decorators",
].map(normalize),
plugins: ["babel-plugin-transform-charcodes"],
},

Can you add it here?

@babel-bot
Copy link
Collaborator

babel-bot commented Jul 20, 2022

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/52568/

@@ -20,7 +20,7 @@
"extra": {
"rawValue": "\\u{}",
"raw": "\"\\u{}\"",
"expressionValue": "null"
"expressionValue": ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behaviour here is better than current main. Should we set expressionValue to be null when the string can not be successfully parsed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't know why the behavior changed 😬
But yes, null would be even better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, expressionValue is a property of directives that represents the .value property of the originally parsed StringLiteral. I feat that making it null would break anything relying on the fact that StringLiteral.value is a string, so I prefer the behavior currently implemented by this PR of just replacing invalid escapes with "" (for example, ab\u{}cd is abcd).

Copy link
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Member

@liuxingbaoyu liuxingbaoyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a lax benchmark with no significant performance penalty.

current 1 jquery 3.6: 40.11 ops/sec ±5.28% (25ms)
current 4 jquery 3.6: 9.5 ops/sec ±3.78% (105ms)
current 16 jquery 3.6: 2.22 ops/sec ±7.14% (451ms)
current 64 jquery 3.6: 0.54 ops/sec ±2.83% (1861ms)
baseline 1 jquery 3.6: 41.42 ops/sec ±5.53% (24ms)
baseline 4 jquery 3.6: 9.89 ops/sec ±2.5% (101ms)
baseline 16 jquery 3.6: 2.36 ops/sec ±5% (424ms)
baseline 64 jquery 3.6: 0.55 ops/sec ±9.05% (1830ms)

@nicolo-ribaudo
Copy link
Member Author

Good! I believe rewriting all the tokenizer like this could actually improve performance, because we would skip all the this.state.* accesses and the this.... prototype lookup to call other tokenizer methods.

@nicolo-ribaudo nicolo-ribaudo merged commit ea5ff29 into babel:main Jul 20, 2022
@nicolo-ribaudo nicolo-ribaudo deleted the parser-extract-string-parsing branch July 20, 2022 15:42
Comment on lines +16 to +18
"devDependencies": {
"charcodes": "^0.2.0"
},
Copy link
Contributor

@merceyz merceyz Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if @babel/helper-string-parser will be bundled so asking to be on the safe side; shouldn't this be a normal dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a plugin that inlines charcodes.*** into the corresponding codepoints: https://github.com/xtuc/charcodes/blob/master/packages/babel-plugin-transform-charcodes/README.md

@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 21, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants