Parallel Bootstrap #21308

aminya · 2020-09-14T12:32:48Z

Description of the change

Run the bootstrap script on different threads in parallel:
- apm installation runs in parallel to installing script/ dependencies
- node-gyp builds are parallelized via the JOBS env variable
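For illustration, the two points above can be sketched roughly as follows. This is not the code from this PR: `runInThread`, `withJobs`, and `bootstrapInParallel` are hypothetical names, and the worker source is inlined only to keep the sketch in one file. The `'max'` default and the "don't set JOBS if already defined" rule follow the commit message in this PR's history.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Worker source kept inline so the sketch fits in one file.
// Each worker runs one blocking command and posts back its trimmed stdout.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { execFileSync } = require('child_process');
  const output = execFileSync(workerData.command, workerData.args, {
    env: workerData.env,
    encoding: 'utf8'
  });
  parentPort.postMessage(output.trim());
`;

// Hypothetical helper: run one synchronous command on its own thread.
function runInThread(command, args, env) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { command, args, env }
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', code => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Keep a JOBS value the caller already set; otherwise let node-gyp use all cores.
function withJobs(env) {
  return { ...env, JOBS: env.JOBS || 'max' };
}

// Start every task at once and wait for all of them to finish.
function bootstrapInParallel(tasks, env = process.env) {
  const childEnv = withJobs(env);
  return Promise.all(
    tasks.map(task => runInThread(task.command, task.args, childEnv))
  );
}
```

In the real bootstrap the two tasks would be the apm installation and the script/ dependency installation; their exact commands are not shown in this thread, so they are left out here.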

Benefits

Running this on a computer with multiple cores gives a good speedup, even in the CI, where the processor has only 2 cores.

For example, on Linux, the bootstrap time is reduced from 7:30 to 3:41, which is about twice as fast (a 2.04x speedup, or roughly 51% less time).
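The arithmetic behind that figure can be checked with a few lines. The helper names `toSeconds` and `speedup` are made up for this sketch; the two times are the ones quoted above.

```javascript
'use strict';

// Convert an "m:ss" string into a number of seconds.
function toSeconds(time) {
  const [minutes, seconds] = time.split(':').map(Number);
  return minutes * 60 + seconds;
}

// How many times faster the new run is compared with the old one.
function speedup(before, after) {
  return toSeconds(before) / toSeconds(after);
}

const factor = speedup('7:30', '3:41'); // 450 s / 221 s
console.log(`${factor.toFixed(2)}x as fast`); // prints "2.04x as fast"
console.log(`${((1 - 1 / factor) * 100).toFixed(0)}% less time`); // prints "51% less time"
```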

Verification

The CI passes. This is tested locally as well.

Release Notes

Faster bootstrapping by using parallelization

Closes #21315

Aerijo

Overall, I disagree with the need for threading in the first place. Theoretically, there is no significant difference between this and just running the installer child processes in async. Both approaches are limited by the installers themselves; the overhead of spawning them (either way) is insignificant.
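As a rough sketch of what "running the installer child processes in async" could look like (this is not the code from #21303, and `runInstallers` is a hypothetical name), every child process is started immediately and awaited together, so the installers overlap without any worker threads:

```javascript
'use strict';
const { execFile } = require('child_process');
const { promisify } = require('util');

// Promise-returning variant of execFile; resolves to { stdout, stderr }.
const execFileAsync = promisify(execFile);

// Hypothetical helper: start all installers at once, then wait for all of them.
// The child processes run concurrently; the main thread only waits on I/O.
async function runInstallers(installers) {
  const results = await Promise.all(
    installers.map(({ command, args }) => execFileAsync(command, args))
  );
  return results.map(result => result.stdout.trim());
}
```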

And because I wanted to back up my claims with data, I implemented the async version (to be clear: the async refers to how it is spawned; the installers are executing in parallel just as much as in this PR). The following bootstrap times are what I have gathered from the last couple of runs of both PRs:

Async bootstrap (#21303):

W64     W86     Mac     Linux
9:38    11:34   16:03   4:23
11:29   11:21   4:54    4:37

Parallel Bootstrap (#21308):

W64     W86     Mac     Linux
9:54    11:26   5:56    4:30
15:22   12:42   5:13    4:33

Besides the anomaly with the mac bootstrap time on the first line, which cannot reasonably be attributed to the PR itself, there is no significant difference. In fact, given how variable the CI is, the consistency is almost impressive. I did see the faster time quoted for the atomcommunity build, but I have not seen anything close to that in the CI results here.
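To make the comparison explicit, the times listed above can be averaged per platform. This is only a sketch with hypothetical helper names, and two runs per PR are far too few for any statistical claim; the 16:03 Mac outlier is included as reported.

```javascript
'use strict';

// Convert an "m:ss" string into a number of seconds.
function parseTime(time) {
  const [minutes, seconds] = time.split(':').map(Number);
  return minutes * 60 + seconds;
}

function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Column order as given in the comment: W64, W86, Mac, Linux.
const platforms = ['W64', 'W86', 'Mac', 'Linux'];
const runs = {
  asyncBootstrap: [
    ['9:38', '11:34', '16:03', '4:23'],
    ['11:29', '11:21', '4:54', '4:37']
  ],
  parallelBootstrap: [
    ['9:54', '11:26', '5:56', '4:30'],
    ['15:22', '12:42', '5:13', '4:33']
  ]
};

// Mean bootstrap time in seconds for each platform column.
function meanSecondsByPlatform(rows) {
  return platforms.map((_, column) =>
    mean(rows.map(row => parseTime(row[column])))
  );
}

for (const [name, rows] of Object.entries(runs)) {
  console.log(name, meanSecondsByPlatform(rows));
}
```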

Perhaps this approach could be shown to be appropriate for the build function (I would need to see measurements of worker threads vs async child processes first), but I conclude this is unnecessary for bootstrapping and so not worth the complexity at this time.

Edit: Just to clear up some confusion I've noticed: the async bootstrap PR is not in a state I would merge, it was a quick sketch to demonstrate the alternative approach. Appealing to diff sizes is nonsensical.

script/config.js

script/build

script/lib/install-apm.js

script/lib/install-script-runner-dependencies.js

script/bootstrap

script/script-runner/package.json

aminya · 2020-09-24T06:37:36Z

> Overall, I disagree with the need for threading in the first place. Theoretically, there is no significant difference between this and just running the installer child processes in async.

Here the work runs outside of Node (child_process), so you should not expect a performance difference between multi-threading and async. The benefit of using threads is cleaner code at the same top performance.

Comparing this to your async PR:

The code diffs below make this visible. You have copy-pasted the code to support both Sync and Async, which doubles the amount of code that needs to be maintained, and we would constantly have to keep the two copies in sync with each other. This issue does not exist in the multi-threaded code, since the actual functions are not changed at all.
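A minimal sketch of that idea, assuming each task is a module that exports one synchronous function: the function itself stays untouched and only the caller changes. `runModuleInThread` is a hypothetical helper, not the API of the script-runner package added in this PR.

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Inline worker source: load an existing module and call its (synchronous)
// exported function, then send the return value back to the main thread.
const runnerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const task = require(workerData.modulePath);
  parentPort.postMessage(task(...workerData.args));
`;

// Hypothetical helper: run an unchanged synchronous module on its own thread.
// modulePath should be absolute, since it is resolved inside the worker.
function runModuleInThread(modulePath, args = []) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(runnerSource, {
      eval: true,
      workerData: { modulePath, args }
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Two such calls passed to `Promise.all` would run side by side while each module keeps its blocking, synchronous body.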

Multi-threaded bootstrap code diff (ignoring the 70 lines that belong to package-lock.json):

+85 -22

Async bootstrap code diff:

If, in the future, we use a direct API from npm or apm to install the dependencies or run things (instead of spawning an external process), we can get more performance benefits from using threads. Without multi-threading, that work would have to run on the main thread.

> Perhaps this approach could be shown to be appropriate for the build function

That PR is waiting for this one to be merged. This PR includes script-runner, which is used directly in the build.

P.S.: I am not against using async and child_processes, but there are suitable places for that. For example, in atom-community#155 I use async functions to parallelize copying assets, which is an I/O operation handled by the operating system.
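As an illustration of that kind of I/O-bound case (a sketch only, not the code from atom-community#155; `copyAssets` and its parameters are hypothetical), async file copies can all be started at once and awaited together:

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copy the named files from sourceDir to targetDir.
// All copies are started together; the operating system does the actual I/O.
async function copyAssets(fileNames, sourceDir, targetDir) {
  await fs.promises.mkdir(targetDir, { recursive: true });
  await Promise.all(
    fileNames.map(name =>
      fs.promises.copyFile(path.join(sourceDir, name), path.join(targetDir, name))
    )
  );
}
```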


sadick254 · 2021-06-03T18:32:46Z

Thanks for the PR!

That would definitely be a useful addition to our build process. However, it is not something that is within the scope that we have envisioned for Atom. We are not planning to make any major improvements to our build process for the time being. We are focused on fixing user-facing bugs. For that reason, I'm going to close this.

Thanks again for contributing to Atom 👍

This was referenced Sep 14, 2020

Async bootstrap #21303

Closed

Define npm_config_jobs env variable in build scripts #21315

Merged

sadick254 requested a review from Aerijo September 23, 2020 14:06

Aerijo suggested changes Sep 24, 2020


aminya force-pushed the parallel-bootstrap-4upstream branch 2 times, most recently from 1b50c61 to 117fdf9 Compare September 24, 2020 07:57

This was referenced Sep 28, 2020

Parallel transpiling (build) #21401

Closed

Parallel transpiling (build) atom-community/atom#195

Merged

aminya force-pushed the parallel-bootstrap-4upstream branch from 117fdf9 to 373c084 Compare October 14, 2020 08:46

aminya mentioned this pull request Oct 19, 2020

Parallel Clean #21431

Closed

aminya force-pushed the parallel-bootstrap-4upstream branch from 373c084 to 1fa40ca Compare October 21, 2020 04:22

aminya mentioned this pull request Nov 27, 2020

the bulid have long time #21760

Closed

aminya added 7 commits January 29, 2021 14:21

script runner (threads)

c22f653

add script-runner/package json files to cache tracker

8a26352

script runner package-lock.json

87e18b6

run installScriptDependencies in a thread

670937d

run installApm in a thread

570801f

parallelize bootstrap

dd9cdb3

execFileSync bootstrap in build script

a08324d

aminya force-pushed the parallel-bootstrap-4upstream branch from 483f829 to 7a16aba Compare January 29, 2021 20:23

aminya added 6 commits January 29, 2021 14:24

Cache script/script-runner/node_modules

3925f71

remove unnecessary env variable

682caa9

It is not used in script-runner

add option for showing version

01ad60b

print the Atom version currently being bootstrapped

8aa4c33

removes redundant config double require

set JOBS to enabled parallel builds in node-gyp

de47280

use 'max' for JOBs to use all available cores nodejs/node-gyp#1770 don't set JOBS if already defined.

Revert "set JOBS to enabled parallel builds in node-gyp"

f0d28a2

This reverts commit de47280.

aminya force-pushed the parallel-bootstrap-4upstream branch from 7a16aba to f0d28a2 Compare January 29, 2021 20:25

sadick254 closed this Jun 3, 2021
