Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Optimize proxify and stringDistance #1098

Merged
merged 1 commit into from Jan 7, 2018

Conversation

sophiebits
Copy link
Contributor

  • Fill 2D array with ints upfront to reduce property access cost
  • Change from recursive to simpler iterative (DP) solution
  • Add cap parameter to stringDistanceCapped to limit computation
  • Make candidate generation use a simple loop to avoid allocating unnecessary arrays and to call stringDistance only once on each pair of strings instead of every time in the sort callback

This improves chai perf by about 13% on @bmeurer's https://github.com/v8/web-tooling-benchmark.

- Fill 2D array with ints upfront to reduce property access cost
- Change from recursive to simpler iterative (DP) solution
- Add cap parameter to stringDistanceCapped to limit computation
- Make candidate generation use a simple loop to avoid allocating unnecessary arrays and to call stringDistance only once on each pair of strings instead of every time in the sort callback

This improves chai perf by about 13% on @bmeurer's https://github.com/v8/web-tooling-benchmark.
@sophiebits sophiebits requested a review from a team as a code owner December 2, 2017 22:52
@codecov
Copy link

codecov bot commented Dec 2, 2017

Codecov Report

Merging #1098 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1098      +/-   ##
==========================================
+ Coverage   93.61%   93.64%   +0.03%     
==========================================
  Files          32       32              
  Lines        1628     1637       +9     
  Branches      393      393              
==========================================
+ Hits         1524     1533       +9     
  Misses        104      104
Impacted Files Coverage Δ
lib/chai/utils/proxify.js 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eae99d1...5467f07. Read the comment docs.

Copy link
Contributor

@bmeurer bmeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising! I was actually thinking of using this algorithm instead which requires fewer temporary arrays. Have you considered/tested that as well?

@sophiebits
Copy link
Contributor Author

I didn't try that but it makes sense. I did try hoisting the 2D array as a static 41x41 array (and bailing out on names longer than 40 chars) and it didn't seem to help much. Not sure where the bottleneck is now.

// memo[i][j] is the distance between strA.slice(0, i) and
// strB.slice(0, j).
for (var i = 0; i <= strA.length; i++) {
memo[i] = Array(strB.length + 1).fill(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using new Array(…) instead of Array(…). The [[Call]] syntax lacks elements kind tracking in V8 and is therefore less efficient.

@bmeurer
Copy link
Contributor

bmeurer commented Dec 3, 2017

In this case it doesn't matter, since the arrays contain only small integers.

memo[i] = [];
}
function stringDistanceCapped(strA, strB, cap) {
if (Math.abs(strA.length - strB.length) >= cap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good catch 👍

@lucasfcosta
Copy link
Member

lucasfcosta commented Jan 3, 2018

Whoah! I'm really glad that we have so many awesome contributors that are also performance specialists.

I've taken a look through the code and the logic LGTM but since I don't feel like I could give any other valuable insights when it comes to performance I'll be happy to review this again when you are happy with it.

Thanks everyone for your collaboration 😄

@sophiebits
Copy link
Contributor Author

I don't think I'll get a chance to look back at this so feel free to merge! Making it 2 x n and replacing 1 - i for i - 1 should work though if you want to test that yourself.

Copy link
Contributor

@meeber meeber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sophiebits Thanks for contributing! This LGTM. Let's merge. Any future perf improvements can come in follow-up PRs.

@keithamus keithamus merged commit c02f64b into chaijs:master Jan 7, 2018
@bmeurer
Copy link
Contributor

bmeurer commented Jan 8, 2018

Thanks a lot @sophiebits and @keithamus! Excited to see this happening. Anyone interested to explore the memory efficient variant of the Levenshtein algorithm? That should give another nice boost. 👍

@not-an-aardvark
Copy link
Contributor

This improvement looks great, although I'm confused about why it's affecting performance so much -- ostensibly the stringDistance function only gets called when an invalid property has been accessed, which would usually be a result of programmer error. Is there a reason that stringDistance ends up getting called in the benchmark?

@bmeurer
Copy link
Contributor

bmeurer commented Jan 8, 2018

The chai test in the web-tooling-benchmark was derived from the chai test suite itself. So it tests the chai library using it's own assert/expect machinery. See the chai-benchmark.js file for the exact test cases.

@meeber
Copy link
Contributor

meeber commented Jan 8, 2018

stringDistance is being called in the benchmark due to the "proxify" tests. Search "pizza" in the benchmark file for all occurrences of this. As definitely-an-aardvark suggested, those tests are centered around programmer error, most likely hunger-induced. Although delicious, the pizza tests have the side-effect of causing the benchmarks to overstate the impact of perf improvements in the stringDistance function on a typical Chai workload. I don't normally advocate for the removal of pizza, but it may be warranted in the case of benchmarks.

@bmeurer
Copy link
Contributor

bmeurer commented Jan 8, 2018

@meeber But those are only 7 tests?

@meeber
Copy link
Contributor

meeber commented Jan 9, 2018

@bmeurer Yup, just verified those are the only tests calling stringDistance. Based on an admittedly small sample size, removing those tests seems to have a noticeable impact on the benchmark in Chai 4.1.2, but not very noticeable in the Chai master branch (which suggests that these perf improvements did their job!) Either way, those tests involve a code path that's only hit when a developer writes a non-existent assertion (e.g., expect(true).to.be.ture), so they're not really meaningful in regard to benchmarking actual Chai workloads. On the other hand, faster is better in every code path, and it's valuable to know if some change to the JavaScript engine negatively impacts performance in even a rarely hit code path.

@bmeurer
Copy link
Contributor

bmeurer commented Jan 9, 2018

@meeber Thanks for the explanation. Interesting to see that it made such a big difference. I'll think we still keep the 7 tests following your reasoning, especially since in the near future when we roll a new version of Chai, the impact will be less noticeable.

@keithamus
Copy link
Member

Yes I suppose it is worth noting - on a typical users test suite, passing tests won't hit these code paths. This likely makes most of a difference to a users development machine, where they'll see this paths hit from typos.

Recently I've been considering how we can improve test times for passing tests, so that tests in typical workloads (e.g. CI) run as quick as possible. One example of slowdown that is instric to the architecture of chai at current - is that we format error messages even for passing tests.

@bmeurer
Copy link
Contributor

bmeurer commented Jan 9, 2018

@keithamus This is a good point. We've observed that error message formatting is indeed quite significant for chai.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants