New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster checkReservedWord #13386
Faster checkReservedWord #13386
Conversation
@@ -2389,12 +2391,22 @@ export default class ExpressionParser extends LValParser { | |||
checkKeywords: boolean, | |||
isBinding: boolean, | |||
): void { | |||
if (this.prodParam.hasYield && word === "yield") { | |||
this.raise(startLoc, Errors.YieldBindingIdentifier); | |||
// All JavaScript reserved words are not longer than 10 characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The longest reserved word is instanceof
and implements
.
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/46602/ |
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. Latest deployment of this branch, based on commit 84be70f:
|
]); | ||
|
||
export function canBeReservedWord(word: string): boolean { | ||
return reservedWordLikeSet.has(word); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid maintaining another reserved words list, could this be word === "await" || word === "enum" || isStrictBindReservedWord(word)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of that long list, why not do something simple as x & ReservedWord
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Set#has
hashes the input string and checks if the hash value is in the table (source), since the the input string is scanned only during hash key is computed and the collision is rare by definition of hash, checking presence takes 1 scan of input and O(1) of the set size.
The StringEquals
first checks the length, then the memory address and then fallbacks to byte-by-byte comparison. (source) If we check word === "await"
and then isStrictBindReservedWord(word)
, the word
is read for 1.1 times (average on the universe distribution of string length), 1 for the hash computation and 1/10 for comparison with "await"
. word === "await"
also introduces extra branch conditions.
To avoid maintaining another reserved words list
I agree it is not ideal to duplicate keywords, WDYT if I add canBeReservedWord
to helper-validator-identifier
in Babel 7.15? Since that will be a feature, I can draft a subseqent PR that moves canBeReservedWord
to the validator package based on this one so we don't have to wait until 7.15.
why not do something simple as x & ReservedWord ?
@KFlash Can you elaborate? How can &
achieves searching here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for deferring it to 7.15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JLHwung It's possible if you make things complex and add some bitwise masks behind each reserved word token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify whether using a prototye-less object is faster? My intuition is that Sets are algorithmically fast, but the actual implementation in V8 is slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jridgewell It's faster because I "pack" all reserved words into a set of bitwise masks. It also consumes less memory. You are only dealing with series of number. see Kataw for an example. Same you can do with tokens, keywords, and everything you need. No need for a lookup.
Witch one is fastest? Set and lookup or this one
(token & ReserwedWord) === Reservedword // return true if reserved word
or a switch statement with 60 cases and 100lines of code to get all expression nodes or this one?
(token & IsExpressionNode) === IsExpressionNode // return true if expression node
It's very obvious what's fastest, but I think it's out of scope for Babel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jridgewell I come up with a benchmark file comparing these three approaches: Set#has, Object brand checks and (switch length + string equals).
The benchmark result on Node.js 16.1.0:
object brand checks different string: 2059273 ops/sec ±0.42% (0ms)
object brand checks constant string: 31712533 ops/sec ±0.22% (0ms)
set#has different string: 17756450 ops/sec ±1.26% (0ms)
set#has constant string: 38050433 ops/sec ±1.54% (0ms)
string#equals different string: 13789367 ops/sec ±0.44% (0ms)
string#equals constant string: 709703804 ops/sec ±0.42% (0ms)
On different string cases, where each input is in different memory address, Set#has
is 700% faster than prototype-less object brand check. On constant string cases, Set#has
is 20% faster than prototype-less object brand check.
On different string cases, Set#has
is 20% faster than (switch length + string equals). The latter significantly outperform Set#has
when the input is a constant string, which is expected because V8 optimizes out the per-character equality branch of StringEquals
. However in Babel parser, each identifier name have different address.
@KFlash Yes we can do a single bitwise check if we replace the data structure of token storage and create different token for different keywords. However Babel currently exposes its token storage via options.tokens
so we have to maintain compatibility here.
} | ||
// Most identifiers are not reservedWord-like, they don't need special | ||
// treatments afterward, which very likely ends up throwing errors | ||
if (!canBeReservedWord(word)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we can move this check after the if yield/await/arguments
checks, and just use isStrictBindReservedWord
? Intuitively it shouldn't be slower, but I didn't check.
This PR aims for improving the performance of
checkReservedWord
in common scenarios. It is based on the assumption that even words likeenum
,protected
could be a valid identifier name, such usage is rare. So in order to improve performance, we offer fast path for common names, and fallback to the slow path when it will be likely to throw an error.This PR also improves the parse scope utility functions: They are computationally equivalent and reflect my early work on tackling this issue.
Benchmark results
The baseline is c6bebfe, current is this PR.
Length-11 identifiers
Length-2 identifiers
Length-1 identifiers
await as identifier
As is expected, current is slower than the baseline when parsing valid
await
as identifier, because of the extra code path.