Improve performance greatly #337

stroncium · 2019-03-27T06:54:33Z

Fixes #333

Benchmark:

updated to reflect essential use cases and performance eaters
NOTE: benchmark framework reports incorrect high number for first bench on some runs, reason unknown

Based on my own usage of chalk, I select KPM:

cached: average of cached: 1 style and cached: 2 styles metrics
non-cached: average of 1 style and 2 styles metrics
total: average of all metrics

KPM increase:

cached: x23
non-cached: x129
total: x24

Changes(incrementally increasing preformance):

replaced regexp replaces by indexOf-slice implementation
streamlined builder instantiation
moved argument handling out of hot zone and optimized
removed all regexp stuff and escape-string-regexp dependency
replaced arrays by linked lists
precomputed chains of open/close codes
close code replace not applied to codes inserted around CRLF, it can change output in some cases, but doesnt change effective styles
added fastpath for do no close code replaces if there is no escape sequences in source string
cached simple getters results(in getters this)

Original(with new benchmark) metrics:

         269,472 op/s » 1 style
         130,959 op/s » 2 styles
          86,590 op/s » 3 styles
       1,407,114 op/s » cached: 1 style
         749,141 op/s » cached: 2 styles
         502,693 op/s » cached: 3 styles
       1,245,477 op/s » cached: 1 style with newline
       1,386,202 op/s » cached: 1 style nested intersecting
       1,368,144 op/s » cached: 1 style nested non-intersecting

Original KPM:

cached: 1078 kop/s
non-cached: 200 kop/s
total: 794 kop/s

Result metrics:

      29,685,124 op/s » 1 style
      21,955,435 op/s » 2 styles
      18,687,591 op/s » 3 styles
      26,847,185 op/s » cached: 1 style
      23,004,420 op/s » cached: 2 styles
      18,772,011 op/s » cached: 3 styles
      12,552,691 op/s » cached: 1 style with newline
       5,585,706 op/s » cached: 1 style nested intersecting
      15,220,157 op/s » cached: 1 style nested non-intersecting

Result KPM:

cached: 24925 kop/s
non-cached: 25820 kop/s
total: 19145 kop/s

stroncium · 2019-03-27T06:57:02Z

cached: 1 style nested intersecting might be a way to improve the speed further. It hits replaces of close codes with open codes, which thrashes strings a lot.

stroncium · 2019-03-27T07:07:21Z

Originally, api allowing multiple arguments was killing performance quite a lot, but with moving it out of hot path and providing fast-path for single argument it doesn't seem to be an issue anymore, I didn't look at JIT info, but my guess would be with root function being so small it just gets inlined most times.

stroncium · 2019-03-27T09:08:54Z

Another insight is that for full benchmark, time is spent:

6.9% + sum of some small numbers = probably around 15% on benchmark own code
8.3% + 6.5% = 14.8% on operations related to setting prototypes
10.4% on unoptimized indexOf(which currently happens when replacing close codes)
7.2% + 2.7% = 9.9% on traversing prototypes
5.4% on optimized indexOf, 6.5% on string concatenation

Which shows us that:

using better benchmark framework might help with obtaining clearer results
caching might not work entirely as planned(which is what I want to check next)
replacing close codes might need more work, small state machine for that could probably speed stuff up a lot, hugely increasing performance in 1 style nested non-intersecting and especially in 1 style nested intersecting
complex prototypes might deoptimize cache retrieval from objects, so it might be effective to precompute low nesting level builders and deploy them without using complex prototype, also proxy stuff might be solution too(but I didn't test it yet)
at least 12% of time is spent on operations we MUST perform (Which is in line with my reference tests which show that performing checks and generating resulting strings for simple 1 style application is roughly 8 times faster than what is done by chalk. For the record, it means that simple case handling was more than 1000 times slower than close to ideal implementation for original code.), which is quite good result, though some of this time might be spent on repetitive operations.

stroncium · 2019-03-27T09:19:54Z

Plus, another important thing is that not about benchmark framework. First of all, it doesn't perform dry-run, so on your normal PC it first benches show weird results in any case. Second, it seem to make this mistake I mentioned from the beginning with just plain incorrect calculation of first bench sometimes.

But having played with manual dry runs, I can say that performance difference between cached and non-cached versions for simple styling(complex styles like RGB aren't cached at all at the moment), is usually <2%, so for most use cases caching won't be needed at all anymore.

stroncium · 2019-03-27T09:42:34Z

Providing shorthands like redBold(and using them) is supposed to increase performance by ~20%, and it's also easier to type, so I'd happily use them.

sindresorhus · 2019-05-12T08:36:35Z

Providing shorthands like redBold(and using them) is supposed to increase performance by ~20%, and it's also easier to type, so I'd happily use them.

Not sure whether you mean internally or externally. My intention was to just optimize for common use-cases internally, not to expose properties like redBold.

sindresorhus · 2019-05-12T08:37:28Z

complex prototypes might deoptimize cache retrieval from objects, so it might be effective to precompute low nesting level builders and deploy them without using complex prototype, also proxy stuff might be solution too(but I didn't test it yet)

Yes, from experience, V8 is not very good at optimizing complicated prototypes.

index.js

sindresorhus · 2019-05-12T08:38:27Z

This is really awesome! 🙌

Are there any more optimization opportunities you can think of? Or are you done?

sindresorhus · 2019-05-12T08:38:36Z

Can you give the PR a proper descriptive title?

Qix- · 2019-05-12T12:43:49Z

Really wish I had @RemindMe running. I want to devote some time to this PR.

stroncium · 2019-05-19T00:43:45Z

@sindresorhus

I also want to incorporate require speed improvements in this patch, which will should also allow to get some runtime speed improvements.

It was sitting in the back of my mind, and I don't recall all my iterations and if I ruled it out, but enabled getters should still be in hot part and there is a chance it can be eliminated.

I'm already considering writing a github parser to extract common use cases, but I'm not sure I will get to it. There is a lot of questions, but one really important thing is if people frequently use nested styles as in running chalk on chalk output. If that's the case, one more interesting optimization can be applied, which is caching the last rendered line(or few) as an array of tokens to not require parsing it again(parsing/scanning takes a huge part of overall time) . But allocation of array is a really slow operation, so I have to run more benchmarks on that and think about the ways it can be done cheapest, to then decide if it can be done with such a small overhead it will be worth the speedup(we only get in a fraction of cases).

Also, I've tried to add things incrementally with checks at every step to not ruin anything, but I don't completely understand how enabled is supposed to work in non-trivial cases and I'm not sure tests cover it all. Caching might have broken something. So, I have to recheck that part and will be happy to hear some opinions/insights on that.

sindresorhus · 2019-06-22T16:43:04Z

I also want to incorporate require speed improvements in this patch, which will should also allow to get some runtime speed improvements.

Can you do that in a follow-up PR? This PR is already large and has been open for a long time. I would like to be able to merge this soon.

sindresorhus · 2019-06-22T16:43:26Z

So what's realistically left to do here?

stroncium · 2019-06-29T10:47:28Z

@sindresorhus Yeah, I will leave other changes to separate PR, the core changes seem to be there already, so should be easy enough to extend on them, plus it's a huge step as it is already. Rebased, fixed style, moved utils to separate file.

sindresorhus · 2019-07-12T06:41:22Z

@stroncium This looks great. I really appreciate your work on this 🙌

Regarding the require speed improvements, do you think you'll be able to get to that within a couple of weeks? I'm asking as I would like to get out Chalk 3 soon and wondering whether or not I should wait for the require speed improvements.

stroncium · 2019-07-12T13:28:39Z

@sindresorhus I reckon I should be able to. Just got a lot of work in last weeks plus procfs lib documentation writing proved to be a bit bigger task than anticipated(at 20-40 pages of documentation atm depending on how you look at it lol), but the latter is almost finished and I actually hope to make major release today or tomorrow. After that, I'm getting back to my backlog

sindresorhus · 2019-07-12T16:24:15Z

Ok. Hehe. No worries.

sindresorhus · 2019-11-09T07:22:13Z

@stroncium FYI, it's finally out: https://twitter.com/sindresorhus/status/1193065505780727808 Thanks again for all your work on the performance improvements. 👌

aminya · 2021-01-08T21:53:26Z

@stroncium Looking at this a year later, something took my attention. 😄

You have written replaced arrays by linked lists as a performance improvement change. FYI, a linked list is horrible in terms of performance when compared to dense data structures. I could not find the change you are referring to, but I just wanted to mention this. Sorry, if I am too late to the party.

This video goes into the details:

Qix- · 2021-01-08T23:35:56Z

That's a blanket statement and not very constructive. Linked lists are useful and can very much be performant depending on how they're used. Otherwise, nobody would use them had they no use.

If you have a criticism of a specific piece of code, please go into detail. Pointing someone to an absolutist video about data structures when clearly the OP knows a bit of what they're doing can be seen as condescending. Further, it doesn't provoke conversation.

stroncium · 2021-01-09T16:50:17Z

@aminya Apart from what have been said, I'm really curios why did you place this critics here seemingly out of random when there are whole languages where there is no arrays of any kind at all traditionally(hey Python, hey Lisp!). Anyway, going back to the very video you posted, I'd advise you to watch it first, because it alone describes half of why linked lists are appliable in this case in general. But speaking about programming in javascript specifically, linked list argument is also invalid in most other cases as long as you use any kind of objects(which are objects as in pointers to vtable headed structures and not structures itself).

sindresorhus requested changes May 12, 2019

View reviewed changes

index.js Show resolved Hide resolved

index.js Outdated Show resolved Hide resolved

index.js Outdated Show resolved Hide resolved

index.js Outdated Show resolved Hide resolved

stroncium added 12 commits June 29, 2019 13:42

wider benchmark

1fb6daf

improved replaces, builder instantiation

b251c00

optimized arguments handling, moved out of hot zone

091e8ed

removed all regexp stuff

42d45ba

replaced arrays by linked lists

efab07c

relaxed loops

fb5da89

precomputed codes, fastpath for no-lf/no-escape strings

2003737

caching simple getters

cbc9f16

added test for crlf

5f0835a

comment

f58b05c

style

e63d61d

fixed examples/screenshot.js style

0c671f5

stroncium force-pushed the speedbuff branch from 3ba882b to 0c671f5 Compare June 29, 2019 10:45

sindresorhus changed the title ~~Speed Buff Lvl. 129~~ Improve performance greatly Jul 12, 2019

sindresorhus merged commit c08417e into chalk:master Jul 12, 2019

farnabaz pushed a commit to farnabaz/chalk that referenced this pull request Jul 12, 2019

Improve performance greatly (chalk#337)

5fc53df

snyk-bot mentioned this pull request Nov 20, 2019

[Snyk] Upgrade chalk from 2.4.2 to 3.0.0 omrilotan/finjan#2

Merged

This was referenced Dec 7, 2019

[Snyk] Upgrade chalk from 1.1.3 to 3.0.0 lirantal/operations-orchestration-crontab#16

Merged

[Snyk] Upgrade chalk from 1.1.3 to 3.0.0 lirantal/operations-orchestration-backup#9

Merged

Leko mentioned this pull request Dec 31, 2019

Update chalk to the latest version Leko/hothouse#517

Open

This was referenced Jan 7, 2020

Update chalk to the latest version Leko/hothouse#554

Open

Update chalk to the latest version Leko/hothouse#567

Open

This was referenced Jan 21, 2020

Update chalk to the latest version Leko/hothouse#601

Open

Update chalk to the latest version Leko/hothouse#626

Open

This was referenced Feb 4, 2020

Update chalk to the latest version Leko/hothouse#663

Open

Update chalk to the latest version Leko/hothouse#686

Open

Leko mentioned this pull request Feb 18, 2020

Update chalk to the latest version Leko/hothouse#695

Open

Leko mentioned this pull request Feb 25, 2020

Update chalk to the latest version Leko/hothouse#739

Open

snyk-bot mentioned this pull request Mar 8, 2020

[Snyk] Upgrade chalk from 0.4.0 to 3.0.0 tjenkinson/generator-clappr-plugin#3

Open

aminya mentioned this pull request Jan 8, 2021

Bump parcel cli dependencies parcel-bundler/parcel#5615

Merged

3 tasks

This was referenced Mar 11, 2021

Bump chalk from 2.4.2 to 4.1.0 Xenfo/maintainer#151

Merged

Bump chalk from 2.4.2 to 4.1.0 Kerron-Hutton/lambda-manager#17

Closed

atjn mentioned this pull request Jun 30, 2021

Update Chalk anc95/inquirer-file-tree-selection#32

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance greatly #337

Improve performance greatly #337

stroncium commented Mar 27, 2019 •

edited by sindresorhus

stroncium commented Mar 27, 2019 •

edited

stroncium commented Mar 27, 2019

stroncium commented Mar 27, 2019 •

edited

stroncium commented Mar 27, 2019

stroncium commented Mar 27, 2019 •

edited

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

Qix- commented May 12, 2019

stroncium commented May 19, 2019 •

edited

sindresorhus commented Jun 22, 2019

sindresorhus commented Jun 22, 2019

stroncium commented Jun 29, 2019

sindresorhus commented Jul 12, 2019

stroncium commented Jul 12, 2019

sindresorhus commented Jul 12, 2019

sindresorhus commented Nov 9, 2019

aminya commented Jan 8, 2021 •

edited

Qix- commented Jan 8, 2021 •

edited

stroncium commented Jan 9, 2021

Improve performance greatly #337

Improve performance greatly #337

Conversation

stroncium commented Mar 27, 2019 • edited by sindresorhus

stroncium commented Mar 27, 2019 • edited

stroncium commented Mar 27, 2019

stroncium commented Mar 27, 2019 • edited

stroncium commented Mar 27, 2019

stroncium commented Mar 27, 2019 • edited

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

sindresorhus commented May 12, 2019

Qix- commented May 12, 2019

stroncium commented May 19, 2019 • edited

sindresorhus commented Jun 22, 2019

sindresorhus commented Jun 22, 2019

stroncium commented Jun 29, 2019

sindresorhus commented Jul 12, 2019

stroncium commented Jul 12, 2019

sindresorhus commented Jul 12, 2019

sindresorhus commented Nov 9, 2019

aminya commented Jan 8, 2021 • edited

Qix- commented Jan 8, 2021 • edited

stroncium commented Jan 9, 2021

stroncium commented Mar 27, 2019 •

edited by sindresorhus

stroncium commented Mar 27, 2019 •

edited

stroncium commented Mar 27, 2019 •

edited

stroncium commented Mar 27, 2019 •

edited

stroncium commented May 19, 2019 •

edited

aminya commented Jan 8, 2021 •

edited

Qix- commented Jan 8, 2021 •

edited