Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: New CSS selector engine for jsdom #3644

Merged
merged 5 commits into from
Jan 7, 2024

Conversation

asamuzaK
Copy link
Contributor

nwsapi has been working great so far, and I appreciate the developer's hard work, but nwsapi is unable to support Selectors Level 4 properly at the moment.
You can find many selector-related bug reports in the jsdom issues.

Therefore, I recommend changing to a different CSS selector engine dom-selector.
It supports many features of the Selectors Level 4 and also the Shadow Hosts.
Ref Supported CSS selectors

Benchmark results

npm run benchmark
API nwsapi dom-selector
closest() jsdom x 126,133 ops/sec ±0.75% (73 runs sampled) jsdom x 7,449 ops/sec ±13.28% (64 runs sampled)
matches() jsdom x 120,369 ops/sec ±8.31% (65 runs sampled) jsdom x 8,507 ops/sec ±7.84% (60 runs sampled)
querySelector() jsdom x 3.12 ops/sec ±2.60% (12 runs sampled) jsdom x 14.72 ops/sec ±1.94% (35 runs sampled)
querySelectorAll() jsdom x 2.93 ops/sec ±1.52% (11 runs sampled) jsdom x 12.41 ops/sec ±2.47% (31 runs sampled)

For closest() and matches(), nwsapi is about 15 times faster, but in reality, dom-selector processed in 0.11 to 0.13 msec, so I don't think it's extremely slow.
Rather, it is querySelector() and querySelectorAll() that often have speed performance issues, and in this benchmark, dom-selector is about 4 times faster than nwsapi.

@asamuzaK asamuzaK marked this pull request as draft December 23, 2023 15:14
@asamuzaK asamuzaK marked this pull request as ready for review December 23, 2023 15:37
@domenic
Copy link
Member

domenic commented Dec 27, 2023

Hi! Thank you for all your work here. I think we should merge this, and I agree that querySelector and querySelectorAll are more important than matches and closest. Here are some things I'd appreciate from you before doing so, if you are able to make the time:

  • Can you add a large-scale benchmark? In particular, let's run against this page with these selectors. We should commit those files to the repo, and then in the benchmark setup load the page into jsdom, and load the selectors from a file and then into an in-memory array. Then we can test how long in total running all the selectors takes.

  • Do you have a plan for closing the remaining performance gap with closest and matches?

  • Would you be willing to maintain a changelog or releases page? https://github.com/asamuzaK/domSelector/releases is not informative, and https://github.com/asamuzaK/domSelector/commits/main/ has lots of "update x" commits. That makes it harder to do updates confidently in the future.

  • I see you've enabled lots of new web platform tests. This is great!! Can you share a comparison with those directories and tests enabled on jsdom's current branch? Are there any regressions, or only progressions?

@asamuzaK
Copy link
Contributor Author

* Can you add a large-scale benchmark? In particular, let's run against [this page](https://github.com/jquery/sizzle/blob/main/speed/data/selector.html) with [these selectors](https://github.com/jquery/sizzle/blob/main/speed/selectors.large.css). We should commit those files to the repo, and then in the benchmark setup load the page into jsdom, and load the selectors from a file and then into an in-memory array. Then we can test how long in total running all the selectors takes.

I'm planning to submit it as a separate PR.
And rebase it to this PR afterwards.
How's that sounds?

* Do you have a plan for closing the remaining performance gap with closest and matches?

With closest and matches, dom-selector is more than 4 times slower than happy-dom or linkeDom.
I would like to do something about it if possible, but I haven't found any clue to improve so far.

* Would you be willing to maintain a changelog or releases page?

I will create change logs.

* Can you share a comparison with those directories and tests enabled on jsdom's current branch? Are there any regressions, or only progressions?
  • ced0ecc
    This update is for preparing and not directly related to this PR.
  • 375fa4f
    53dc3bc
    Added DIR: css/selectors and DIR: css/css-scoping tests.
    Those failures are of the current main branch.
  • 47a2a74
    Test results with dom-selector.
    The deleted rows are the errors resolved by dom-selector.
    Note that some new failures have been added, but these are errors that should have been detected before but were not for some reason.
    They are basically not caused by dom-selector.

@domenic
Copy link
Member

domenic commented Dec 28, 2023

I'm planning to submit it as a separate PR.
And rebase it to this PR afterwards.
How's that sounds?

That sounds reasonable, although it'd be most helpful if we had some comparisons using the larger benchmark between nwsapi and dom-selector before merging. They don't need to be committed to the repository, but just something to check to make sure there's not a hidden significant performance issue with larger documents.

I realize that, as you discussed in #3647, the comparison is difficult due to the number of failing selectors being different between the two implementations. Maybe you could just test the subset that pass in both, and report the results?

  • 47a2a74
    Test results with dom-selector.

This is perfect. Thank you so much for doing this work.

@asamuzaK
Copy link
Contributor Author

Log of sizzle-speed with dom-selector:

# selectors/selector/querySelectorAll #
9/273 fails.
jsdom  x 0.37 ops/sec ±10.00% (5 runs sampled)

# selectors/selector/querySelectorAll only nwsapi supports #
jsdom  x 0.38 ops/sec ±5.31% (5 runs sampled)

Exception thrown by dom-selector:

[
  {
    selector: 'h1[id]:contains(Selectors)',
    message: 'Unknown pseudo-class :contains()'
  },
  {
    selector: 'div[class!=made_up]',
    message: 'Attribute selector (=, ~=, ^=, $=, *=, |=) is expected'
  },
  {
    selector: 'p:contains(selectors)',
    message: 'Unknown pseudo-class :contains()'
  },
  { selector: 'p   p, ',
    message: 'Invalid selector p   p, '
  },
  { selector: 'p   .5cm ',
    message: 'Identifier is expected'
  },
  { selector: 'p   [*=test] ',
    message: 'Vertical line is expected'
  },
  { selector: 'p  .13 ',
    message: 'Percent sign is expected'
  },
  {
    selector: 'div  p::first-child ',
    message: 'Unknown pseudo-element ::first-child'
  },
  {
    selector: 'p ..test .foo..quux .bar. ',
    message: 'Identifier is expected'
  }
]

So, nwsapi is about 3 times faster.

@domenic
Copy link
Member

domenic commented Dec 29, 2023

3x slowdown is not great... I think we should probably still proceed, since jsdom in general favors correctness over speed and the selector engine problems have been significant recently.

But, if there's anything we can do to improve performance before the next release, that would help a lot.

I made a flamegraph using 0x and uploaded it here. I have to go out to dinner now but maybe it could be helpful for you...

One thing I noticed is that sometimes you call expensive jsdom methods like the indexed or named getters on HTMLCollection. There might be faster ways to do that. (E.g. just using item() might be faster than the indexed getter, since it avoids proxy machinery? Or iterating manually might be faster than using the named getter? Or avoiding HTMLCollection instances entirely, instead operating on the original tree? Or just caching things whenever possible, to avoid touching jsdom more often than necessary?)

Like, I found this line [].slice.call(parentNode.children) in https://github.com/asamuzaK/domSelector/blob/802febbac20f1024b57bf73b66d7ce7a91f0fc74/src/js/matcher.js#L381 which seems potentially quite expensive.

@asamuzaK
Copy link
Contributor Author

asamuzaK commented Jan 3, 2024

dom-selector now uses TreeWalker instead of creating arrays from node.children, node.childNodes.
Thank you for the very useful advice.

Comparing v1.2.2...v1.2.6 · asamuzaK/domSelector

Latest benchmark results:

benchmark nwsapi dom-selector
# dom/css-selector/closest # jsdom x 126,133 ops/sec ±0.75% (73 runs sampled) jsdom x 8,078 ops/sec ±15.55% (59 runs sampled)
# dom/css-selector/matches # jsdom x 120,369 ops/sec ±8.31% (65 runs sampled) jsdom x 9,898 ops/sec ±8.28% (60 runs sampled)
# dom/css-selector/querySelector # jsdom x 3.12 ops/sec ±2.60% (12 runs sampled) jsdom x 2,527 ops/sec ±3.31% (55 runs sampled)
# dom/css-selector/querySelectorAll # jsdom x 2.93 ops/sec ±1.52% (11 runs sampled) jsdom x 11.39 ops/sec ±1.65% (29 runs sampled)
# selectors/selector/querySelectorAll # jsdom x 1.20 ops/sec ±10.21% (7 runs sampled) 18/273 fails. jsdom x 0.41 ops/sec ±5.93% (6 runs sampled) 9/273 fails.
# selectors/selector/querySelectorAll only nwsapi supports # jsdom x 1.24 ops/sec ±8.02% (7 runs sampled) jsdom x 0.42 ops/sec ±1.54% (5 runs sampled)

The performance of querySelector() is much better than before (was 14.72 ops/sec).
On the other hand, sizzle benchmark is about 10% better, but still 3x slower than nwsapi.

See domSelector#performance for other comparisons with nwsapi.

@asamuzaK asamuzaK marked this pull request as ready for review January 3, 2024 15:15
@asamuzaK asamuzaK marked this pull request as draft January 4, 2024 04:14
@asamuzaK asamuzaK marked this pull request as ready for review January 4, 2024 14:13
@asamuzaK
Copy link
Contributor Author

asamuzaK commented Jan 6, 2024

@asamuzaK asamuzaK requested a review from domenic January 6, 2024 05:49
This update simplifies the integration with jsdom's exceptions, and fixes some of the WPTs introduced in this roll.
@domenic domenic merged commit 908f27d into jsdom:main Jan 7, 2024
10 checks passed
@domenic
Copy link
Member

domenic commented Jan 7, 2024

Thank you for all your work! Although the slowdown is unfortunate, I think it is great to have a maintained selector engine with modern features.

I have tagged almost all open selectors issues as "maybe fixed (add tests to confirm)". It would be a nice bonus if you were willing to spend some time checking if there are existing WPTs covering the issue that are now passing, and leaving a comment on the issue so I can close them. Or, if there are no existing WPTs, we can work on adding them to the to-upstream WPTs. (Check out the existing files under test/web-platform-tests/tests/to-upstream/dom/nodes/ParentNode-querySelector* for examples.)

This is not necessary, but it would be helpful to me and to all the people who have previously reported various issues!

@asamuzaK
Copy link
Contributor Author

asamuzaK commented Jan 7, 2024

Although it does not follow the WPT format, issues reported on jsdom are tested on domSelector/test/jsdom-issues.test.js.
Is it not possible to closing them, for example, marking them as tested upstream?
I don't feel like rewriting all of them just for testing again...

@asamuzaK asamuzaK deleted the css-selector branch January 7, 2024 12:57
@domenic
Copy link
Member

domenic commented Jan 7, 2024

Ah, thank you for pointing to that file! It's best to have them in WPT format so that the coverage can be applied to all browsers, and so that we have tests in the jsdom repo to more easily catch regressions. But if you'd rather not do the rewriting yourself, that's not a problem. I can work on it!

@domenic
Copy link
Member

domenic commented Jan 7, 2024

In fact I think I can automate it :) https://chat.openai.com/share/3892a3b2-1c1c-4b9c-9a0f-fe3df1c317ae

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants