Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance greatly #337

Merged
merged 12 commits into from Jul 12, 2019
Merged

Improve performance greatly #337

merged 12 commits into from Jul 12, 2019

Conversation

stroncium
Copy link
Contributor

@stroncium stroncium commented Mar 27, 2019

Fixes #333

Benchmark:

  • updated to reflect essential use cases and performance eaters
  • NOTE: benchmark framework reports incorrect high number for first bench on some runs, reason unknown

Based on my own usage of chalk, I select KPM:

  • cached: average of cached: 1 style and cached: 2 styles metrics
  • non-cached: average of 1 style and 2 styles metrics
  • total: average of all metrics

KPM increase:

  • cached: x23
  • non-cached: x129
  • total: x24

Changes(incrementally increasing preformance):

  • replaced regexp replaces by indexOf-slice implementation
  • streamlined builder instantiation
  • moved argument handling out of hot zone and optimized
  • removed all regexp stuff and escape-string-regexp dependency
  • replaced arrays by linked lists
  • precomputed chains of open/close codes
  • close code replace not applied to codes inserted around CRLF, it can change output in some cases, but doesnt change effective styles
  • added fastpath for do no close code replaces if there is no escape sequences in source string
  • cached simple getters results(in getters this)

Original(with new benchmark) metrics:

         269,472 op/s » 1 style
         130,959 op/s » 2 styles
          86,590 op/s » 3 styles
       1,407,114 op/s » cached: 1 style
         749,141 op/s » cached: 2 styles
         502,693 op/s » cached: 3 styles
       1,245,477 op/s » cached: 1 style with newline
       1,386,202 op/s » cached: 1 style nested intersecting
       1,368,144 op/s » cached: 1 style nested non-intersecting

Original KPM:

  • cached: 1078 kop/s
  • non-cached: 200 kop/s
  • total: 794 kop/s

Result metrics:

      29,685,124 op/s » 1 style
      21,955,435 op/s » 2 styles
      18,687,591 op/s » 3 styles
      26,847,185 op/s » cached: 1 style
      23,004,420 op/s » cached: 2 styles
      18,772,011 op/s » cached: 3 styles
      12,552,691 op/s » cached: 1 style with newline
       5,585,706 op/s » cached: 1 style nested intersecting
      15,220,157 op/s » cached: 1 style nested non-intersecting

Result KPM:

  • cached: 24925 kop/s
  • non-cached: 25820 kop/s
  • total: 19145 kop/s

@stroncium
Copy link
Contributor Author

stroncium commented Mar 27, 2019

cached: 1 style nested intersecting might be a way to improve the speed further. It hits replaces of close codes with open codes, which thrashes strings a lot.

@stroncium
Copy link
Contributor Author

Originally, api allowing multiple arguments was killing performance quite a lot, but with moving it out of hot path and providing fast-path for single argument it doesn't seem to be an issue anymore, I didn't look at JIT info, but my guess would be with root function being so small it just gets inlined most times.

@stroncium
Copy link
Contributor Author

stroncium commented Mar 27, 2019

Another insight is that for full benchmark, time is spent:

  • 6.9% + sum of some small numbers = probably around 15% on benchmark own code
  • 8.3% + 6.5% = 14.8% on operations related to setting prototypes
  • 10.4% on unoptimized indexOf(which currently happens when replacing close codes)
  • 7.2% + 2.7% = 9.9% on traversing prototypes
  • 5.4% on optimized indexOf, 6.5% on string concatenation

Which shows us that:

  • using better benchmark framework might help with obtaining clearer results
  • caching might not work entirely as planned(which is what I want to check next)
  • replacing close codes might need more work, small state machine for that could probably speed stuff up a lot, hugely increasing performance in 1 style nested non-intersecting and especially in 1 style nested intersecting
  • complex prototypes might deoptimize cache retrieval from objects, so it might be effective to precompute low nesting level builders and deploy them without using complex prototype, also proxy stuff might be solution too(but I didn't test it yet)
  • at least 12% of time is spent on operations we MUST perform (Which is in line with my reference tests which show that performing checks and generating resulting strings for simple 1 style application is roughly 8 times faster than what is done by chalk. For the record, it means that simple case handling was more than 1000 times slower than close to ideal implementation for original code.), which is quite good result, though some of this time might be spent on repetitive operations.

@stroncium
Copy link
Contributor Author

Plus, another important thing is that not about benchmark framework. First of all, it doesn't perform dry-run, so on your normal PC it first benches show weird results in any case. Second, it seem to make this mistake I mentioned from the beginning with just plain incorrect calculation of first bench sometimes.

But having played with manual dry runs, I can say that performance difference between cached and non-cached versions for simple styling(complex styles like RGB aren't cached at all at the moment), is usually <2%, so for most use cases caching won't be needed at all anymore.

@stroncium
Copy link
Contributor Author

stroncium commented Mar 27, 2019

Providing shorthands like redBold(and using them) is supposed to increase performance by ~20%, and it's also easier to type, so I'd happily use them.

@sindresorhus
Copy link
Member

Providing shorthands like redBold(and using them) is supposed to increase performance by ~20%, and it's also easier to type, so I'd happily use them.

Not sure whether you mean internally or externally. My intention was to just optimize for common use-cases internally, not to expose properties like redBold.

@sindresorhus
Copy link
Member

complex prototypes might deoptimize cache retrieval from objects, so it might be effective to precompute low nesting level builders and deploy them without using complex prototype, also proxy stuff might be solution too(but I didn't test it yet)

Yes, from experience, V8 is not very good at optimizing complicated prototypes.

index.js Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
@sindresorhus
Copy link
Member

This is really awesome! 🙌

Are there any more optimization opportunities you can think of? Or are you done?

@sindresorhus
Copy link
Member

Can you give the PR a proper descriptive title?

@Qix-
Copy link
Member

Qix- commented May 12, 2019

Really wish I had @RemindMe running. I want to devote some time to this PR.

@stroncium
Copy link
Contributor Author

stroncium commented May 19, 2019

@sindresorhus

I also want to incorporate require speed improvements in this patch, which will should also allow to get some runtime speed improvements.

It was sitting in the back of my mind, and I don't recall all my iterations and if I ruled it out, but enabled getters should still be in hot part and there is a chance it can be eliminated.

I'm already considering writing a github parser to extract common use cases, but I'm not sure I will get to it. There is a lot of questions, but one really important thing is if people frequently use nested styles as in running chalk on chalk output. If that's the case, one more interesting optimization can be applied, which is caching the last rendered line(or few) as an array of tokens to not require parsing it again(parsing/scanning takes a huge part of overall time) . But allocation of array is a really slow operation, so I have to run more benchmarks on that and think about the ways it can be done cheapest, to then decide if it can be done with such a small overhead it will be worth the speedup(we only get in a fraction of cases).

Also, I've tried to add things incrementally with checks at every step to not ruin anything, but I don't completely understand how enabled is supposed to work in non-trivial cases and I'm not sure tests cover it all. Caching might have broken something. So, I have to recheck that part and will be happy to hear some opinions/insights on that.

@sindresorhus
Copy link
Member

I also want to incorporate require speed improvements in this patch, which will should also allow to get some runtime speed improvements.

Can you do that in a follow-up PR? This PR is already large and has been open for a long time. I would like to be able to merge this soon.

@sindresorhus
Copy link
Member

So what's realistically left to do here?

@stroncium
Copy link
Contributor Author

@sindresorhus Yeah, I will leave other changes to separate PR, the core changes seem to be there already, so should be easy enough to extend on them, plus it's a huge step as it is already. Rebased, fixed style, moved utils to separate file.

@sindresorhus sindresorhus changed the title Speed Buff Lvl. 129 Improve performance greatly Jul 12, 2019
@sindresorhus sindresorhus merged commit c08417e into chalk:master Jul 12, 2019
@sindresorhus
Copy link
Member

@stroncium This looks great. I really appreciate your work on this 🙌

Regarding the require speed improvements, do you think you'll be able to get to that within a couple of weeks? I'm asking as I would like to get out Chalk 3 soon and wondering whether or not I should wait for the require speed improvements.

@stroncium
Copy link
Contributor Author

@sindresorhus I reckon I should be able to. Just got a lot of work in last weeks plus procfs lib documentation writing proved to be a bit bigger task than anticipated(at 20-40 pages of documentation atm depending on how you look at it lol), but the latter is almost finished and I actually hope to make major release today or tomorrow. After that, I'm getting back to my backlog

@sindresorhus
Copy link
Member

Ok. Hehe. No worries.

farnabaz pushed a commit to farnabaz/chalk that referenced this pull request Jul 12, 2019
@sindresorhus
Copy link
Member

@stroncium FYI, it's finally out: https://twitter.com/sindresorhus/status/1193065505780727808 Thanks again for all your work on the performance improvements. 👌

@aminya
Copy link

aminya commented Jan 8, 2021

@stroncium Looking at this a year later, something took my attention. 😄

You have written replaced arrays by linked lists as a performance improvement change. FYI, a linked list is horrible in terms of performance when compared to dense data structures. I could not find the change you are referring to, but I just wanted to mention this. Sorry, if I am too late to the party.

This video goes into the details:
CppCon 2014_ Chandler Carruth Efficiency with Algorithms, Performance with Data Structures 35-2 screenshot

@Qix-
Copy link
Member

Qix- commented Jan 8, 2021

That's a blanket statement and not very constructive. Linked lists are useful and can very much be performant depending on how they're used. Otherwise, nobody would use them had they no use.

If you have a criticism of a specific piece of code, please go into detail. Pointing someone to an absolutist video about data structures when clearly the OP knows a bit of what they're doing can be seen as condescending. Further, it doesn't provoke conversation.

@stroncium
Copy link
Contributor Author

@aminya Apart from what have been said, I'm really curios why did you place this critics here seemingly out of random when there are whole languages where there is no arrays of any kind at all traditionally(hey Python, hey Lisp!). Anyway, going back to the very video you posted, I'd advise you to watch it first, because it alone describes half of why linked lists are appliable in this case in general. But speaking about programming in javascript specifically, linked list argument is also invalid in most other cases as long as you use any kind of objects(which are objects as in pointers to vtable headed structures and not structures itself).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve runtime performance
4 participants