Faster checkReservedWord #13386

JLHwung · 2021-05-28T02:45:03Z

Q	A
License	MIT

This PR aims for improving the performance of checkReservedWord in common scenarios. It is based on the assumption that even words like enum, protected could be a valid identifier name, such usage is rare. So in order to improve performance, we offer fast path for common names, and fallback to the slow path when it will be likely to throw an error.

This PR also improves the parse scope utility functions: They are computationally equivalent and reflect my early work on tackling this issue.

Benchmark results

The baseline is c6bebfe, current is this PR.

Length-11 identifiers

baseline 64 length-11 identifiers: 28849 ops/sec ±1.09% (0.035ms)
baseline 128 length-11 identifiers: 14846 ops/sec ±1.06% (0.067ms)
baseline 256 length-11 identifiers: 7567 ops/sec ±0.5% (0.132ms)
baseline 512 length-11 identifiers: 3772 ops/sec ±1.01% (0.265ms)
current 64 length-11 identifiers: 30111 ops/sec ±0.38% (0.033ms)
current 128 length-11 identifiers: 15391 ops/sec ±0.58% (0.065ms)
current 256 length-11 identifiers: 7544 ops/sec ±2.2% (0.133ms)
current 512 length-11 identifiers: 3883 ops/sec ±0.43% (0.258ms)

Length-2 identifiers

baseline 64 length-2 identifiers: 30858 ops/sec ±0.94% (0.032ms)
baseline 128 length-2 identifiers: 16131 ops/sec ±0.23% (0.062ms)
baseline 256 length-2 identifiers: 8097 ops/sec ±0.16% (0.124ms)
baseline 512 length-2 identifiers: 3673 ops/sec ±2.19% (0.272ms)
baseline 1024 length-2 identifiers: 1952 ops/sec ±1.26% (0.512ms)
current 64 length-2 identifiers: 33006 ops/sec ±0.92% (0.03ms)
current 128 length-2 identifiers: 17058 ops/sec ±1% (0.059ms)
current 256 length-2 identifiers: 8522 ops/sec ±1.42% (0.117ms)
current 512 length-2 identifiers: 4301 ops/sec ±1.04% (0.232ms)
current 1024 length-2 identifiers: 2182 ops/sec ±0.26% (0.458ms)

Length-1 identifiers

baseline 64 length-1 identifiers: 34244 ops/sec ±0.72% (0.029ms)
baseline 128 length-1 identifiers: 16767 ops/sec ±1.99% (0.06ms)
baseline 256 length-1 identifiers: 7606 ops/sec ±2.63% (0.131ms)
baseline 512 length-1 identifiers: 4467 ops/sec ±0.53% (0.224ms)
baseline 1024 length-1 identifiers: 2183 ops/sec ±1.12% (0.458ms)
current 64 length-1 identifiers: 35132 ops/sec ±1.19% (0.028ms)
current 128 length-1 identifiers: 18390 ops/sec ±0.69% (0.054ms)
current 256 length-1 identifiers: 9260 ops/sec ±0.81% (0.108ms)
current 512 length-1 identifiers: 4500 ops/sec ±1.46% (0.222ms)
current 1024 length-1 identifiers: 2304 ops/sec ±0.83% (0.434ms)

await as identifier

baseline 64 await identifier: 26803 ops/sec ±1.36% (0.037ms)
baseline 128 await identifier: 14133 ops/sec ±0.9% (0.071ms)
baseline 256 await identifier: 7061 ops/sec ±1.09% (0.142ms)
baseline 512 await identifier: 3491 ops/sec ±2.81% (0.286ms)
current 64 await identifier: 26729 ops/sec ±0.74% (0.037ms)
current 128 await identifier: 13539 ops/sec ±1.15% (0.074ms)
current 256 await identifier: 6819 ops/sec ±1.2% (0.147ms)
current 512 await identifier: 3470 ops/sec ±0.49% (0.288ms)

As is expected, current is slower than the baseline when parsing valid await as identifier, because of the extra code path.

JLHwung · 2021-05-28T02:45:36Z

packages/babel-parser/src/parser/expression.js

@@ -2389,12 +2391,22 @@ export default class ExpressionParser extends LValParser {
    checkKeywords: boolean,
    isBinding: boolean,
  ): void {
-    if (this.prodParam.hasYield && word === "yield") {
-      this.raise(startLoc, Errors.YieldBindingIdentifier);
+    // All JavaScript reserved words are not longer than 10 characters.


The longest reserved word is instanceof and implements.

babel-bot · 2021-05-28T02:47:48Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/46602/

codesandbox-ci · 2021-05-28T02:49:23Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 84be70f:

Sandbox	Source
babel-repl-custom-plugin	Configuration
babel-plugin-multi-config	Configuration

nicolo-ribaudo · 2021-05-28T07:31:38Z

packages/babel-parser/src/util/identifier.js

+]);
+
+export function canBeReservedWord(word: string): boolean {
+  return reservedWordLikeSet.has(word);


To avoid maintaining another reserved words list, could this be word === "await" || word === "enum" || isStrictBindReservedWord(word)?

Instead of that long list, why not do something simple as x & ReservedWord ?

The Set#has hashes the input string and checks if the hash value is in the table (source), since the the input string is scanned only during hash key is computed and the collision is rare by definition of hash, checking presence takes 1 scan of input and O(1) of the set size.

The StringEquals first checks the length, then the memory address and then fallbacks to byte-by-byte comparison. (source) If we check word === "await" and then isStrictBindReservedWord(word), the word is read for 1.1 times (average on the universe distribution of string length), 1 for the hash computation and 1/10 for comparison with "await". word === "await" also introduces extra branch conditions.

To avoid maintaining another reserved words list

I agree it is not ideal to duplicate keywords, WDYT if I add canBeReservedWord to helper-validator-identifier in Babel 7.15? Since that will be a feature, I can draft a subseqent PR that moves canBeReservedWord to the validator package based on this one so we don't have to wait until 7.15.

why not do something simple as x & ReservedWord ?

@KFlash Can you elaborate? How can & achieves searching here?

👍 for deferring it to 7.15

@JLHwung It's possible if you make things complex and add some bitwise masks behind each reserved word token.

Can you verify whether using a prototye-less object is faster? My intuition is that Sets are algorithmically fast, but the actual implementation in V8 is slow.

@jridgewell It's faster because I "pack" all reserved words into a set of bitwise masks. It also consumes less memory. You are only dealing with series of number. see Kataw for an example. Same you can do with tokens, keywords, and everything you need. No need for a lookup.

Witch one is fastest? Set and lookup or this one
(token & ReserwedWord) === Reservedword // return true if reserved word

or a switch statement with 60 cases and 100lines of code to get all expression nodes or this one?

(token & IsExpressionNode) === IsExpressionNode // return true if expression node

It's very obvious what's fastest, but I think it's out of scope for Babel

@jridgewell I come up with a benchmark file comparing these three approaches: Set#has, Object brand checks and (switch length + string equals).

The benchmark result on Node.js 16.1.0:

object brand checks different string: 2059273 ops/sec ±0.42% (0ms) object brand checks constant string: 31712533 ops/sec ±0.22% (0ms) set#has different string: 17756450 ops/sec ±1.26% (0ms) set#has constant string: 38050433 ops/sec ±1.54% (0ms) string#equals different string: 13789367 ops/sec ±0.44% (0ms) string#equals constant string: 709703804 ops/sec ±0.42% (0ms)

On different string cases, where each input is in different memory address, Set#has is 700% faster than prototype-less object brand check. On constant string cases, Set#has is 20% faster than prototype-less object brand check.

On different string cases, Set#has is 20% faster than (switch length + string equals). The latter significantly outperform Set#has when the input is a constant string, which is expected because V8 optimizes out the per-character equality branch of StringEquals. However in Babel parser, each identifier name have different address.

@KFlash Yes we can do a single bitwise check if we replace the data structure of token storage and create different token for different keywords. However Babel currently exposes its token storage via options.tokens so we have to maintain compatibility here.

nicolo-ribaudo · 2021-05-28T07:34:20Z

packages/babel-parser/src/parser/expression.js

+    }
+    // Most identifiers are not reservedWord-like, they don't need special
+    // treatments afterward, which very likely ends up throwing errors
+    if (!canBeReservedWord(word)) {


Or maybe we can move this check after the if yield/await/arguments checks, and just use isStrictBindReservedWord? Intuitively it shouldn't be slower, but I didn't check.

packages/babel-parser/src/parser/expression.js

JLHwung added 4 commits May 27, 2021 22:16

perf: faster parser scope check

c6d17ff

perf: early return for identifier length > 10

e2ded8a

perf: early return for normal identifier names

dcbfd45

chore: add benchmark

6c4fc16

JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories labels May 28, 2021

JLHwung commented May 28, 2021

View reviewed changes

nicolo-ribaudo reviewed May 28, 2021

View reviewed changes

nicolo-ribaudo approved these changes May 31, 2021

View reviewed changes

existentialism reviewed Jun 1, 2021

View reviewed changes

packages/babel-parser/src/parser/expression.js Outdated Show resolved Hide resolved

existentialism approved these changes Jun 1, 2021

View reviewed changes

Update packages/babel-parser/src/parser/expression.js

84be70f

JLHwung merged commit cbad50a into babel:main Jun 1, 2021

JLHwung deleted the faster-check-reserved-words branch June 1, 2021 12:42

github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Sep 1, 2021

github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster checkReservedWord #13386

Faster checkReservedWord #13386

JLHwung commented May 28, 2021 •

edited

JLHwung May 28, 2021

babel-bot commented May 28, 2021 •

edited

codesandbox-ci bot commented May 28, 2021 •

edited

nicolo-ribaudo May 28, 2021

KFlash May 28, 2021 •

edited

JLHwung May 28, 2021

nicolo-ribaudo May 28, 2021

KFlash May 28, 2021

jridgewell May 29, 2021

KFlash May 29, 2021 •

edited

JLHwung May 31, 2021

nicolo-ribaudo May 28, 2021

Faster checkReservedWord #13386

Faster checkReservedWord #13386

Conversation

JLHwung commented May 28, 2021 • edited

Benchmark results

Choose a reason for hiding this comment

babel-bot commented May 28, 2021 • edited

codesandbox-ci bot commented May 28, 2021 • edited

Choose a reason for hiding this comment

KFlash May 28, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KFlash May 29, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JLHwung commented May 28, 2021 •

edited

babel-bot commented May 28, 2021 •

edited

codesandbox-ci bot commented May 28, 2021 •

edited

KFlash May 28, 2021 •

edited

KFlash May 29, 2021 •

edited