Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lint files concurrently #15508

Closed
wants to merge 1 commit into from

Conversation

paulbrimicombe
Copy link

@paulbrimicombe paulbrimicombe commented Jan 11, 2022

added a conncurency option that sets the number of threads to be used when linting files.

fixes #3565

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[ ] Add something to the core
[x] Other, please explain:

What changes did you make? (Give an overview)

I have added a concurrency option that sets the number of worker threads used when linting files. This defaults to 1.

How the concurrency works:

  • When concurrency > 1, all of the files that are to be linted are fetched up-front.
  • A pool of concurrency worker threads is created, each containing an ESLint instance with concurrency = 1.
  • Chunks of files to lint are passed to these worker threads until all files have been linted.
  • When the workers finish, they pass the result array back to the main thread where the results are combined.

This is a fairly simple solution to check that the maintainers are happy with the general approach. I'm sure that there are lots of things that I've missed — please point these out!

Performance:

Below is a table indicating how the linting time for the eslint repository varies with concurrency. Results will probably vary considerably according to the number of files to be linted, the machine spec, the plugins used, etc.).

Concurrency Time to run eslint **/*.js (s)
1 65
2 34
3 26
4 28
5 27
6 29

This is all based on running on my local machine (8 cores). Clearly there is great benefit in concurrency > 1 but there are diminishing returns above concurrency = 3, presumably because of resource contention.

Caching:

One thing I do know is that caching doesn't quite work correctly. With the current implementation, workers read from the cache when they start linting a batch of files and then write back when the batch is finished. Any updates to the cache in the meantime (e.g. from another thread) are lost. This means that to populate the cache fully you have to run eslint multiple times. With concurrency = 5 the cache is still only ~85% filled after 10 runs.

Some options to fix this include:

  • Read the cache file contents again immediately before writing, merge the current results and then write. This decreases the window of time during which changes will be lost.
  • Say that concurrency and caching cannot be used together.
  • Refactor the cache code so that we read the cache once before we start, accumulate all of the changes in all of the different threads and then write the result at the end.

Some guidance on the best approach to take would be appreciated.

Testing:

I have added tests for the new CLI argument but I haven't added any tests specifically for concurrent linting. Ideally I'd like to run pretty much all of the tests with and without concurrency but some guidance on how that might be achieved would be welcome. I have manually adjusted the code to force a concurrency of 2 for a test run and all but three tests pass — two of these are because they spy on function interactions that are now in workers. Running the tests with concurrency is also slower than without because of the overhead of continually spawning and terminating workers.

Notes:

  • NodeJS worker threads seem to be supported in all of the NodeJS versions listed in the package.json engines field.
  • You can only pass data that can be serialised using the Structured Clone algorithm to worker threads. There are some properties of ESLintOptions that can be functions (e.g. the plugins and fix properties). When the options cannot be cloned the current implementation falls back to linting in a single thread.

Is there anything you'd like reviewers to focus on?

  • Advice on how to deal with the caching issues described above.
  • Advice on how / where to test this.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 11, 2022

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ Paul Douglas Brimicombe (6cbb7ab)

@eslint-github-bot eslint-github-bot bot added triage An ESLint team member will look at this issue soon feature This change adds a new feature to ESLint labels Jan 11, 2022
added a conncurency option that sets the number of threads to be used when linting files.

fixes eslint#3565
@nzakas
Copy link
Member

nzakas commented Jan 12, 2022

Oh wow, thanks for digging into this. For changes this large, we do require an RFC be created before we consider merging. So while we can take a look at this as inspiration, we can't move forward at this point.

As it is, we are in the process of removing CLIEngine, so we don't want to be adding a lot of new functionality into it at this point. We are also working on a new config system that will change how the core works, so we really need to wait for these pieces to be finished before working on parallel linting.

@paulbrimicombe
Copy link
Author

paulbrimicombe commented Jan 13, 2022

Oh wow, thanks for digging into this. For changes this large, we do require an RFC be created before we consider merging. So while we can take a look at this as inspiration, we can't move forward at this point.

Ah. Thanks for getting in touch @nzakas . I hadn't spotted that there was an RFC in progress. Is it worth me linking this PR to the existing RFC?

As it is, we are in the process of removing CLIEngine, so we don't want to be adding a lot of new functionality into it at this point. We are also working on a new config system that will change how the core works, so we really need to wait for these pieces to be finished before working on parallel linting.

Hmm. OK. The approach taken in this PR should work with whatever changes you make provided that:

  • A list of the files to be linted can be generated (could be a generator or just an array as it is in this PR).
  • The constructor arguments for ESLint remain serialisable by Structure Clone for most users.

Given that the ESLintConcurrent class just uses ESLint but forces concurrency = 1 that part of the implementation would remain largely unchanged.

Is there any reason to think that changing the configuration system is going to cause any problems with this pattern given that I'm not actually interacting with the configuration values at all, just passing them verbatim into ESLint?

@nzakas
Copy link
Member

nzakas commented Jan 15, 2022

You can definitely review the RFC and see if your approach aligns with what’s in there. It’s been a while since I read it over so I can’t tell you off the top of my head.

In the new config system, the configs aren’t serializable, so that will definitely have an effect on the design. It’s worth reviewing that RFC as well: https://github.com/eslint/rfcs/tree/main/designs/2019-config-simplification

@nzakas
Copy link
Member

nzakas commented Mar 9, 2022

Closing because we need to wait for the new config system.

@nzakas nzakas closed this Mar 9, 2022
@eslint-github-bot eslint-github-bot bot locked and limited conversation to collaborators Sep 6, 2022
@eslint-github-bot eslint-github-bot bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Sep 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
archived due to age This issue has been archived; please open a new issue for any further discussion feature This change adds a new feature to ESLint triage An ESLint team member will look at this issue soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lint multiple files in parallel [$500]
2 participants