Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

activity.setStatus causes a massive performance hit during build (1s slower per 1.000 queries) #17452

Closed
georgiee opened this issue Sep 6, 2019 · 3 comments · Fixed by #19866

Comments

@georgiee
Copy link
Contributor

georgiee commented Sep 6, 2019

Summary

activity.setStatus which is used by gatsby during the build to report the page query and html page building progress introduces a performance hit in the dimension of 1s for every 1000 queries. It's like the status call costs exactly 1ms per call. This heavily builds up in environments with 10k's of queries.

Demonstration:
https://github.com/georgiee/gatsby/tree/performance-test-activity/benchmarks/query

(run on my machine, 2,7 GHz Intel Core i7 13 macbook 2018)

Description

I observed a rather dramatic performance difference while using gatsby's activity tool in our internal source plugin where we process 10-20k pages at the moment. First we thought it's caused by the code block but soon we identified the status update as the cause.


exports.sourceNodes = ({ reporter }) => {
  const activity = reporter.activityTimer(`foo`);
  activity.start();

  const count = 10000;
  for(let i = 0; i < count; i++) {
    activity.setStatus('bar' + i);
  }

  activity.end();
}

This will take 8.5s on my machine during the build and is doing nothing. From this point I checked on the page query and static html output as those two things also rely on the status update.

Here the results for 10.000 pages/queries. The pages are almost empty, there are actually no page queries to run but still being processed. .cache is always cleared.

1. With activity updates

Queries Static HTML
12.476 s 4.170 s
13.538 s 3.981 s
13.190 s 4.802 s
12.780 s 4.976 s

here how it looks like (excerpt). You can see the status being updated. Each update costs around 1ms.
on

2. Without activity updates
(disabled in gatsby core, behind a new flag)

Queries Static HTML
4.181 s 4.236 s
4.289 s 4.518 s
4.262 s 4.327 s
4.158 s 4.051 s

Here how it looks like (excerpt). No more status updates.
off

You can see how much faster the queries are running now. The page process is also a little bit faster but their update frequency is larger so the impact is smaller (0.5-1s faster, comapred to 8-9s faster queries).

This scales almost linear. 20.000 queries causes almost an overhead of 20 seconds.

Example/Reproduction

Here is the modified Gatsby Project. I used the benchmark project and synced it with gatsby-dev-cli.

https://github.com/georgiee/gatsby/tree/performance-test-activity/benchmarks/query

Solution/Workarounds

  1. Let me silence the gatsby build output entirely (I couldn't find any option https://www.gatsbyjs.org/docs/gatsby-cli/#options-1)
  2. Let me choose to display progress updates or not
  3. Like with the pages update in larger batches not per query.

Edit/Appendix

This has also an impact on gatsby develop (startup, refresh if enabled) of course as the output is the same for the page queries runner.

@georgiee georgiee changed the title activity.setStatus causes a massive performance hit during build (1s slower per 10.000 queries) activity.setStatus causes a massive performance hit during build (1s slower per 1.000 queries) Sep 6, 2019
@sidharthachatterjee
Copy link
Contributor

Thanks for the detailed issue! This is known issue and is being tracked upstream in vadimdemedes/ink#212

Also in #15505

@georgiee
Copy link
Contributor Author

Hello thanks for those useful references and the upstream issue. My initial idea to turn off status reports entirely with an option would get rid of the Ink dependency and clean up the logs.

Is this even an option as I don't find any signs of different log output levels in the sources nor do people request that feature. I found the issue with the number 3 from Kyle #3 which was open for quite some time but never implemented.

Should we leave this issue open or close it as a duplicate of #15505 ? Thanks for taking your time, very much appreciated!

@wardpeet
Copy link
Contributor

thanks @georgiee i'll close this as a duplicate of #15505, let's move the discussion to that thread. This issue will still be linked so we don't lose any information.

Thanks a ton you for the awesome write-down. This is really great.

pvdz added a commit that referenced this issue Nov 28, 2019
…his way

This change explicitly throttles the progress bar updates (`tick` and `total`) to be clamped to 10 fps max.

This drops the govbook benchmark from 210 seconds down to 140 seconds.

Fixes #15505
Fixes #17452
Fixes #17966
Fixes #18801

Relates to #17873
Relates to vadimdemedes/ink#212
gatsbybot pushed a commit that referenced this issue Nov 28, 2019
…19866)

* perf(gatsby-cli): throttle progress bar, build significantly faster this way

This change explicitly throttles the progress bar updates (`tick` and `total`) to be clamped to 10 fps max.

This drops the govbook benchmark from 210 seconds down to 140 seconds.

Fixes #15505
Fixes #17452
Fixes #17966
Fixes #18801

Relates to #17873
Relates to vadimdemedes/ink#212

* Force flush on done() and use local var for total
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants