Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Parallel/Clustered Prettier #4980

Open
connor4312 opened this issue Aug 14, 2018 · 36 comments
Open

Feature Request: Parallel/Clustered Prettier #4980

connor4312 opened this issue Aug 14, 2018 · 36 comments
Labels
area:cli Issues with Prettier's Command Line Interface type:perf Issue with performance of Prettier

Comments

@connor4312
Copy link
Contributor

connor4312 commented Aug 14, 2018

Hi prettier 👋

We have a couple larger codebases which we use prettier on. Soon we're going to merge many of these large codebases into an even larger monorepo. Prettier is pretty fast, but over a couple hundred thousand lines of code, it could benefit from running on multiple cores.

Edit from 2021: Later on in this thread I published the wrapper cli, which I continue to use daily 🙂 https://www.npmjs.com/package/@mixer/parallel-prettier

In some of our projects we created a small wrapper that uses the programmatic API and Node's cluster to run prettier on all available CPU cores. This makes it quite a bit faster:

➜  cloc src
    1900 text files.
    1889 unique files.
      72 files ignored.

github.com/AlDanial/cloc v 1.76  T=2.02 s (906.3 files/s, 63545.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
// ...
-------------------------------------------------------------------------------
SUM:                          1829          15143           8514         104580
-------------------------------------------------------------------------------

➜  time prettier -l "src/**/*.{ts,scss}"
prettier -l "src/**/*.{ts,scss}"  21.77s user 1.40s system 128% cpu 18.070 total
➜  time node build/tools/prettier.js --assert
node build/tools/prettier.js --assert  52.26s user 2.02s system 678% cpu 6.997 total

There's some further optimizations we could make, but generally we've found this functionality pretty useful :)

I was wondering whether there's interest in bringing this functionality into prettier, either as a default run mode and/or under a --parallel flag. --parallel alone could run a subprocess for each CPU core, or you could provide the concurrency afterwards e.g. --parallel 4.

@kachkaev
Copy link
Member

Another performance optimization request: #4459

(not quite relevant, but might be still worth linking)

@j-f1 j-f1 added the type:perf Issue with performance of Prettier label Aug 14, 2018
@jlisee
Copy link

jlisee commented Oct 18, 2018

@connor4312 do you have a gist of that tool you can share util a more formal pull request gets made?

@SimenB
Copy link
Contributor

SimenB commented Oct 18, 2018

As a workaround, it might be interesting to test out https://github.com/keplersj/jest-runner-prettier? Then you should be able to take advantage of Jest's parallelisation.

EDIT: Seems like it doesn't have --write, should probably add that the same way the eslint runner supports fix

@MarkTiedemann
Copy link

MarkTiedemann commented Oct 19, 2018

I agree that a --parallel flag is a good idea for speeding up the initial formatting of a codebase. But for consecutive formatting, I think using pre-commit hooks or other sorts of incremental formatting, meaning that only files that were modified will be re-formatted, is probably more efficient in most cases.

@azz
Copy link
Member

azz commented Oct 19, 2018

Yeah I personally think the combination of pre-commit hooks and pretty-quick in dev, and using eslint-plugin-prettier in CI to be the preferable combo. I find it pretty rare that I use the -l feature of Prettier nowadays.

Similar to what @SimenB mentioned, you could then use potentially use jest-runner-eslint for concurrency (I don't think ESLint has concurrent linting built in? eslint/eslint#3565)

Regardless of personal workflow opinions, I'd like to see this built as a standalone module first, it should be pretty straight forward to expose your build/tools/prettier.js script as a standalone library.

@connor4312
Copy link
Contributor Author

I threw together a quick implementation of a parallel prettier runner here. It's a tiny bit slower on small projects due to the overhead of starting worker processes, but is faster on larger ones. On a project that's about 1.2k files:

prettier --write "src/**/*.{ts,tsx,json,scss}"  27.09s user 3.26s system 123% cpu 24.496 total
pprettier --write "src/**/*.{ts,tsx,json,scss}"  1.21s user 0.27s system 14% cpu 10.580 total

@kevgo
Copy link

kevgo commented Aug 2, 2019

I think it would still be useful to get parallel execution built into Prettier in some way. Pre-commit hooks for all contributors are unrealistic, at least on the projects I work on. Not every project uses Jest. I still see a need to test that all files in a branch conform to Prettier style, both on CI and on local dev machines. Prettier parallelizes relatively well. Using @connor4312's pprettier tool I was able to bring down the test time from 5s to 0.6s on an 8-core machine.

@SimenB
Copy link
Contributor

SimenB commented Aug 3, 2019

FWIW Jest's parallelization is implemented using https://yarnpkg.com/en/package/jest-worker. It supports both spawning processes and the new worker threads api (if available) and has a promise interface. Might be a good fit for an implementation within prettier itself if the maintainers wants to go that route

@liesislukas
Copy link

any progress on this? Would love to use multi cores without 3rd parties

@fisker

This comment has been minimized.

@fisker
Copy link
Sponsor Member

fisker commented Jan 27, 2021

@connor4312 Do you think we can add your implementation to our CLI?

@connor4312
Copy link
Contributor Author

connor4312 commented Jan 27, 2021

Sure. Probably the easiest way would be to make the formatFiles loop dispatch to workers processes or threads, though there may be some additional refactoring needed. If you're interested I could try to submit a PR.

@thorn0
Copy link
Member

thorn0 commented Jan 27, 2021

@connor4312 Certainly interested. Please go ahead.

@connor4312
Copy link
Contributor Author

connor4312 commented Feb 6, 2021

I started some work in connor4312@f5f8168. I didn't realize that the prettier CLI, and tests, were fully synchronous. This does not work with parallelism, since all surfaces we have available are async. As a result expect to see a ton of churn in tests, converting runPrettier to be async and moving describe blocks to test (describe blocks are not made to run tests and don't support async functions).

This also made me realize why aws saw such a large improvement running my package over Promise.all'ing prettier for each of their subpackages: both more efficient delegation of work but also async io to enable parallel reading/writing/formatting.

@fisker
Copy link
Sponsor Member

fisker commented Feb 6, 2021

Actually we plan to make api async , see #4459, so we need make CLI async anyway.

@connor4312
Copy link
Contributor Author

Okay, cool. I'll keep going forward in that direction then 🙂

@fisker
Copy link
Sponsor Member

fisker commented Feb 7, 2021

@connor4312 I start to make CLI async few days ago, happened the test part didn't get started, cherry-picked your commit 😄

fisker#1091

@connor4312
Copy link
Contributor Author

thanks for the heads up! I'll work off your branch tomorrow.

@fisker
Copy link
Sponsor Member

fisker commented Feb 7, 2021

I found it's hard to make tests pass, describe can't use async, you already changed to test, but test can't be nested..

@thorn0 thorn0 added the area:cli Issues with Prettier's Command Line Interface label Mar 12, 2021
@fisker fisker mentioned this issue Apr 30, 2021
4 tasks
@fisker
Copy link
Sponsor Member

fisker commented May 11, 2021

@connor4312

#10804
#10841
#10852 (Not merged yet)

Do you need something else before you start work on this?

@cspotcode
Copy link

cspotcode commented May 23, 2021

Have worker_threads been considered? They incur less overhead than spawning external processes, and you can use something like comlink for simple communication between worker and leader thread.

It is technically possible to run workloads across multiple CPU cores totally synchronously using worker_threads. Work is dispatched to the threads, it executes synchronously, and the main thread blocks waiting for the results.

@connor4312
Copy link
Contributor Author

@cspotcode I investigated worker threads for parallel-prettier before, as did trivikr here, and found them to be marginally slower.

@liesislukas
Copy link

@cspotcode I investigated worker threads for parallel-prettier before, as did trivikr here, and found them to be marginally slower.

how many CPU cores?

@connor4312
Copy link
Contributor Author

@liesislukas 24 logical processors on my desktop, 16 on my laptop. I think I might have tested on my old 8-thread 2016 macbook but I don't recall.

@liesislukas
Copy link

ok, thanks. btw, interesting laptop with 16 cores :)

@connor4312
Copy link
Contributor Author

16 logical processors / 8 physical cores

@simkessy
Copy link

@connor4312 any updates on this work. I tried your package but am getting errors about path cannot be empty also getting this error when I specify the path:

/.nvm/versions/node/v16.14.2/lib/node_modules/@mixer/parallel-prettier/node_modules/rxjs/internal/util/hostReportError.js:4
    setTimeout(function () { throw err; }, 0);
                             ^

[Error: EISDIR: illegal operation on a directory, read] {
  errno: -21,
  code: 'EISDIR',
  syscall: 'read'
}
me/.nvm/versions/node/v16.14.2/lib/node_modules/@mixer/parallel-prettier/node_modules/rxjs/internal/util/hostReportError.js:4
    setTimeout(function () { throw err; }, 0);
                             ^

WorkerExitedError: Worker exited with unexpected 1 code
    at Worker.<anonymous> (me/.nvm/versions/node/v16.14.2/lib/node_modules/@mixer/parallel-prettier/dist/worker-pool.js:57:59)
    at Worker.emit (node:events:526:28)

@gunta
Copy link

gunta commented Apr 13, 2022

@connor4312 same here

@liesislukas
Copy link

liesislukas commented May 9, 2022

@cspotcode I investigated worker threads for parallel-prettier before, as did trivikr here, and found them to be marginally slower.

can it be that disk was the bottleneck? i run prettier in "parallel" by just running 14 times exec. With 1 prettier my task took ±50s with, with 14 prettier tasks down to ±15sec. The point is, running several instances can be very useful and speed may depend on actual machine. I run on apple's m1.

update: took same test with 14 instances which took 15s on m1's SSD and ran it on ram disk. Mounted some Volume into RAM to check how much SSD is lagging and from 15s it went down to 5s. now. So with correct disk prettier is already running 10x faster with threading. I'm pretty sure your test was failing because of disk.

If prettier would scan the target directory & would equally divide files per instance, that would drastically increase on top of these results, while in my case distribution of files count per instance is very far from perfect.

@skrtheboss
Copy link

If someone is interested, I have implemented pretty-parallel which runs prettier on multiple workers/threads. You can give it a try, and let me know if you find some issues.

@larsilus
Copy link

larsilus commented Aug 4, 2023

@connor4312 @fisker @cspotcode @liesislukas Any updates on parallelization - i would also spend a beer or more, if you can save time in my life.

I tried both https://github.com/microsoft/parallel-prettier and https://www.npmjs.com/package/pretty-parallel.
Results:
https://github.com/microsoft/parallel-prettier
=> is not really faster as the prettier with option "--cache" (on my 790 files it took 30s, whereas prettier --cache just needed 11s)
=> and it fails on .xml

https://www.npmjs.com/package/pretty-parallel
=> seems to analyse/parse files on his own, fails on json files including formatting (vscode does this for their extensions.json e.g.) and currently also fails on ,js, because it interprets it with the yaml parser. so probably the architecture seems to be not relying on prettier code.

As those options seem to be not integrated in prettier owned code, and so have their issues (content detection/parsing and also not profiting from cache options), we probably would still need an option in prettier itself.

So inside prettier we would need to setup some threads for parallel file detection (cache check itself could be parallezied as well) and feeding a queue of tasks todo, that get queried by multiple processes/threads.

Anything i can support, if reachable by amazon, beer or chips or chocolate would be an option for me 8) Maybe if you can hint me to any source code analysis/todo's i probably can do as well.

@cspotcode
Copy link

You can use dprint as inspiration. It has a prettier plugin which performs faster than vanilla prettier when caching is enabled. This is because dprint decides what needs to be formatted and delegates only changed files to prettier in an external process.

@connor4312
Copy link
Contributor Author

I looked at getting this in Prettier a long time ago but it involved a lot of changes as basically all prettier logic was synchronous and fairly coupled at the time. I haven't looked at --cache and that did not exist at the time I made parallel-prettier. If that's faster for you, you should use that instead 👍

@cspotcode
Copy link

It is possible to make synchronous/blocking calls that delegate work to another thread. This means work can be dispatched to threads without changing sync logic to async. We can do any combination of:

sync dispatcher & async workers
async dispatcher & sync workers
both async
both sync

@connor4312
Copy link
Contributor Author

It sure is, if you're feeling ambitious you should try putting in a PR!

@bradzacher
Copy link

For reference - the reason that worker_threads are slower than child_process et al is because of how node handles its "native async workloads" (fs, crypto, etc).

https://nodejs.org/en/docs/guides/dont-block-the-event-loop

The TL;DR is that NodeJS spawns with a number of threads that it uses to do its async workloads such as FS, DNS, zlib, and crypto. Currently this number is 4, but it is also controllable via the UV_THREADPOOL_SIZE env var.

This means that when you're using worker_threads you have n threads within that process all contending for the same (by default) 4 async worker threads to access the disk. ESP if you're using more than 4 threads this can quickly lead to blocking cases which slows things down.

OTOH if you use child_process then you physically spawn a new NodeJS instance for your new "thread" - this means you get 4 dedicated "async worker threads" for each of your threads. This essentially helps push the bottleneck down to the OS/disk layer instead - improving performance.


By using worker_threads you do get the lower cost of IPC, but the (likely) higher cost of FS access. From what I've found - as long as you don't have an "overly chatty" IPC protocol you end up with more performance and throughput from using child_process.
Workloads like prettier only really need to push paths across as IPC for the CLI (i.e. IPC is far from "chatty") - so you end up with more wins from using child_process


For context: a big part of my last 6 months has been optimising NodeJS at scale at Canva so I've had a lot of time to learn and experiment with parallelisation in different contexts (webpack, eslint and prettier).

I have a working implementation of a CLI using node:cluster and I'd love to contribute it upstream. I've used it internally and Canva (it was how I was able to migrate our monorepo from dprint to prettier without going insane or destroying CI times!). It formats ~60k files in ~60s with "just" 4 worker processes on my m1 macbook pro.

For the maintainers - what would be the expected design for a 1st-party parallel API for prettier? I'd love to figure out how we can improve the state here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:cli Issues with Prettier's Command Line Interface type:perf Issue with performance of Prettier
Projects
None yet
Development

No branches or pull requests