Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parse() performance #278

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Improve parse() performance #278

wants to merge 9 commits into from

Conversation

romgrk
Copy link

@romgrk romgrk commented Jun 14, 2022

Changes

I was reading the parser code and saw a few things that could improve the performance, they are summarized below.

Use integers constants for states

I saw that the state constants where strings. I thought that it could be a quick gain to replace them with named integer constants. The benchmark results I get are that there is a ~20% improvement to parse() by doing so.

Avoid allocation

While I was looking for other improvements, I also noticed that the token variable was assigned a new object everytime, I therefore created an object tokenRef, which is updated every time token needs to be assigned. I couldn't get a statistically significant improvement here, results were either 1% better or equal. But I've included it anyway because removing allocations alleviates the job of the garbage collector, which is something that's hard to benchmark but always nice to have.

Use codepoints instead of chars

Similarly to the first point, I thought it could be faster to use (integer) codepoints instead of chars. I ran the code through some code to get new code:

newCode = fs.readFileSync('./lib/parse.js').toString()
  .replaceAll(/'(\\\w|.|\\u....)'/g, (m) => `${eval(m).codePointAt(0)} /* ${m} */`)

And made sure functions were adapted to receive codepoints instead of chars. This last change had a bigger impact, I would say about 70-80% improvement over the previous one.

Summary

The changes above end up giving an improvement of around 100% compared to the original version. All three of them are independant of each other, so you could choose to pick some of them only. If this PR seems too dauting to read in one go, each commit is complete and represents one change.

The benchmark code: https://gist.github.com/romgrk/eb4a2a16422e50bc37e811c934a66f8f

Small file:

$ node benchmark.js ./package.json5
parseOriginal x 4,210 ops/sec ±0.93% (95 runs sampled)
parseImproved x 8,285 ops/sec ±0.92% (94 runs sampled)
Fastest is parseImproved

Large file:

$ node benchmark.js ./data.json5
parseOriginal x 255 ops/sec ±0.31% (92 runs sampled)
parseImproved x 544 ops/sec ±0.78% (96 runs sampled)
Fastest is parseImproved

# `data.json5` is `package.json5` pasted a bunch of times in an array, with a few 
# added values to hit the numbers & booleans codepaths

@jordanbtucker
Copy link
Member

jordanbtucker commented Jun 14, 2022

Thank you for these performance suggestions! Would you mind rebasing as we've made some CI changes and I'd like to see the results of this PR in CI. Thanks!

@romgrk
Copy link
Author

romgrk commented Jun 14, 2022

Done, I've rebased on main.

edit: And it's failing due to coverage. How can I see which lines aren't covered?

Copy link
Member

@jordanbtucker jordanbtucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, except for a few minor changes! I'll take a closer look at everything when I have a bit more time.

lib/parse.js Outdated Show resolved Hide resolved
lib/parse.js Show resolved Hide resolved
lib/util.js Outdated Show resolved Hide resolved
lib/util.js Show resolved Hide resolved
@jordanbtucker
Copy link
Member

jordanbtucker commented Jun 14, 2022

To view coverage, run npm run coverage. This will create a file at coverage/lcov-report/index.html where you can review coverage in a browser.

lib/util.js Outdated Show resolved Hide resolved
@romgrk
Copy link
Author

romgrk commented Jun 15, 2022

Build passing & all good on my side, this can be merged if you're satisfied with it.

Copy link
Member

@jordanbtucker jordanbtucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is looking good. We just need to run npm run lint and fix one issue and a nitpick.

lib/parse.js Outdated Show resolved Hide resolved
lib/parse.js Outdated Show resolved Hide resolved
@romgrk
Copy link
Author

romgrk commented Jun 15, 2022

Linting complete.

edit: Oh wait no missed the 4 digits.
edit: Ok now we're good.

@jordanbtucker
Copy link
Member

Awesome! I'm going to sit on this PR for a day just to make sure I'm not missing anything and vet the code a bit more. I really appreciate the code improvements and well-written description.

@jordanbtucker jordanbtucker added this to the v2.3.0 milestone Jun 18, 2022
@romgrk
Copy link
Author

romgrk commented Oct 14, 2022

Ping @jordanbtucker: is there still interest for this PR? If not feel free to close.

I should also note, for completeness' sake, that if someone is really after performance, then using a pure javascript implementation of JSON5 won't cut it. The native JSON implementation in node/V8 is 20-40x faster in the same benchmarks used above. But saving cycles is always nice.

Copy link
Member

@jordanbtucker jordanbtucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is good except for the hex numbers. It would be great if you could refactor them as constants instead of using block comments to identify them.

@jordanbtucker
Copy link
Member

If you're not able to make those changes, I don't mind merging this and making them myself.

@jordanbtucker
Copy link
Member

Also, I think I might rebase this against the v3 branch instead of main.

@romgrk
Copy link
Author

romgrk commented Oct 16, 2022

Everything is good except for the the hex numbers. It would be great if you could refactor them as constants instead of using block comments to identify them.

Something like this?

const C_0 = 0x30
const C_1 = 0x31
// ...

@jordanbtucker
Copy link
Member

Yeah, that works. Maybe use CHAR_0 instead though, just to make it more clear that these represent characters?

@romgrk
Copy link
Author

romgrk commented Nov 30, 2022

I've been trying to find time for this but I've been super busy with work. If you're ok with merging as-is I think it's going to be faster. I've rebased on master, do you also need it rebased on v3?

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

romgrk:perf-parse

@mbehr1
Copy link

mbehr1 commented Apr 15, 2023

any update here? Imho those improvements would be quite useful.

@romgrk
Copy link
Author

romgrk commented Jun 11, 2023

Ping @jordanbtucker, if you can list the minimum required changes to merge this, I might be able to complete it soon.

@niczak
Copy link

niczak commented Apr 20, 2024

Somewhat criminal this branch was never merged.

@romgrk
Copy link
Author

romgrk commented Apr 21, 2024

Open-source maintainers are free to merge or not, performance might not be a priority for this project. Merging this PR also means the maintainer(s) would need to support these changes, which do decrease readability a bit :|

Note that from what I remember while benchmarking this PR, even with these changes the native JSON.parse is something like 20x faster so anyone using this JSON5 parser should be using it because readability is more important for their use-case than performance. If you need to pass a lot of data around, use standard JSON. I would find it nice to add a note on the readme about the performance characteristics of the project though, to avoid people using it in cases where it wouldn't be a good fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants