Feature Request: Parallel/Clustered Prettier #4980
Comments
Another performance optimization request: #4459 (not quite relevant, but might be still worth linking) |
@connor4312 do you have a gist of that tool you can share until a more formal pull request gets made? |
As a workaround, it might be interesting to test out https://github.com/keplersj/jest-runner-prettier? Then you should be able to take advantage of Jest's parallelisation. EDIT: Seems like it doesn't have |
I agree that a |
Yeah I personally think the combination of pre-commit hooks and Similar to what @SimenB mentioned, you could then potentially use Regardless of personal workflow opinions, I'd like to see this built as a standalone module first; it should be pretty straightforward to expose your |
I threw together a quick implementation of a parallel prettier runner here. It's a tiny bit slower on small projects due to the overhead of starting worker processes, but is faster on larger ones. On a project that's about 1.2k files:
|
I think it would still be useful to get parallel execution built into Prettier in some way. Pre-commit hooks for all contributors are unrealistic, at least on the projects I work on. Not every project uses Jest. I still see a need to test that all files in a branch conform to Prettier style, both on CI and on local dev machines. Prettier parallelizes relatively well. Using @connor4312's |
FWIW Jest's parallelization is implemented using https://yarnpkg.com/en/package/jest-worker. It supports both spawning processes and the new worker threads API (if available) and has a promise interface. Might be a good fit for an implementation within prettier itself if the maintainers want to go that route |
Any progress on this? Would love to use multiple cores without third-party tools |
@connor4312 Do you think we can add your implementation to our CLI? |
Sure. Probably the easiest way would be to make the formatFiles loop dispatch to worker processes or threads, though there may be some additional refactoring needed. If you're interested I could try to submit a PR. |
@connor4312 Certainly interested. Please go ahead. |
I started some work in connor4312@f5f8168. I didn't realize that the prettier CLI, and tests, were fully synchronous. This does not work with parallelism, since all surfaces we have available are async. As a result expect to see a ton of churn in tests, converting This also made me realize why aws saw such a large improvement running my package over |
Actually, we plan to make the API async (see #4459), so we need to make the CLI async anyway. |
Okay, cool. I'll keep going forward in that direction then 🙂 |
@connor4312 I started making the CLI async a few days ago. As it happens, I hadn't started on the tests yet, so I cherry-picked your commit 😄 |
thanks for the heads up! I'll work off your branch tomorrow. |
I found it's hard to make tests pass, |
Have It is technically possible to run workloads across multiple CPU cores totally synchronously using worker_threads. Work is dispatched to the threads, it executes synchronously, and the main thread blocks waiting for the results. |
@cspotcode I investigated worker threads for parallel-prettier before, as did trivikr here, and found them to be marginally slower. |
how many CPU cores? |
@liesislukas 24 logical processors on my desktop, 16 on my laptop. I think I might have tested on my old 8-thread 2016 macbook but I don't recall. |
ok, thanks. btw, interesting laptop with 16 cores :) |
16 logical processors / 8 physical cores |
@connor4312 any updates on this work? I tried your package but am getting errors about
|
@connor4312 same here |
Could it be that the disk was the bottleneck? I run prettier in "parallel" by just running exec 14 times. With 1 prettier instance my task took ±50s; with 14 prettier instances it went down to ±15s. The point is, running several instances can be very useful, and the speedup may depend on the actual machine. I run on Apple's M1. Update: I took the same test with 14 instances, which took 15s on the M1's SSD, and ran it on a RAM disk. I mounted a volume in RAM to check how much the SSD was holding things back, and the time went from 15s down to 5s. So with a fast enough disk, prettier already runs 10x faster in parallel. I'm pretty sure your test was held back by the disk. If prettier scanned the target directory and divided the files equally per instance, that would improve considerably on top of these results, since in my case the distribution of file counts per instance is very far from even. |
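The "divide files equally per instance" idea from the comment above can be sketched as follows. This is only an illustration: `partitionFiles` is a hypothetical helper, not part of Prettier, and it balances by file count only (not by file size).

```javascript
// Hypothetical helper: split a file list into `instances` groups of
// near-equal length by dealing files out round-robin.
function partitionFiles(files, instances) {
  const groupCount = Math.max(1, instances);
  const groups = Array.from({ length: groupCount }, () => []);
  files.forEach((file, index) => {
    groups[index % groupCount].push(file);
  });
  // Drop empty groups so no instance is started with nothing to do.
  return groups.filter((group) => group.length > 0);
}

const groups = partitionFiles(['a.js', 'b.js', 'c.js', 'd.js', 'e.js'], 2);
console.log(groups); // [ [ 'a.js', 'c.js', 'e.js' ], [ 'b.js', 'd.js' ] ]
```

Each group could then be handed to one prettier process. Balancing by total bytes instead of file count would probably be fairer when file sizes vary a lot.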
If someone is interested, I have implemented pretty-parallel which runs prettier on multiple workers/threads. You can give it a try, and let me know if you find some issues. |
@connor4312 @fisker @cspotcode @liesislukas Any updates on parallelization? I would also buy a beer or more if you can save me some time. I tried both https://github.com/microsoft/parallel-prettier and https://www.npmjs.com/package/pretty-parallel. As those options are not integrated into Prettier's own code, and so have their issues (content detection/parsing, and not profiting from the cache options), we would probably still need an option in prettier itself. So inside prettier we would need to set up some threads for parallel file detection (the cache check itself could be parallelized as well), feeding a queue of tasks that is consumed by multiple processes/threads. If there is anything I can support, and it is reachable via Amazon, then beer, chips, or chocolate would be an option for me 8) If you can point me to any source code analysis or to-dos, I can probably take those on as well. |
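The "queue of tasks consumed by multiple workers" described above could look roughly like this. It is a sketch, not Prettier's design: `runQueue` is a hypothetical name, and `handler` stands in for whatever would actually dispatch a file to a worker process or thread.

```javascript
// Hypothetical sketch: a shared queue drained by a fixed number of consumers.
// Each consumer pulls the next item as soon as it finishes its current one,
// so slow files do not hold up the rest of the queue.
async function runQueue(items, concurrency, handler) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function consume() {
    while (next < items.length) {
      const index = next++;
      results[index] = await handler(items[index]);
    }
  }

  const consumerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: consumerCount }, () => consume()));
  return results; // same order as `items`
}

// Usage with a stand-in handler in place of real formatting work.
runQueue(['a.js', 'b.js', 'c.js'], 2, async (file) => file.toUpperCase())
  .then((formatted) => console.log(formatted));
```

Unlike a fixed split of files per instance, a shared queue balances itself when some files take much longer than others.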
You can use dprint as inspiration. It has a prettier plugin which performs faster than vanilla prettier when caching is enabled. This is because dprint decides what needs to be formatted and delegates only changed files to prettier in an external process. |
I looked at getting this in Prettier a long time ago but it involved a lot of changes as basically all prettier logic was synchronous and fairly coupled at the time. I haven't looked at |
It is possible to make synchronous/blocking calls that delegate work to another thread. This means work can be dispatched to threads without changing sync logic to async. We can do any combination of: sync dispatcher & async workers |
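A minimal sketch of the blocking dispatch described above, using only Node built-ins (`worker_threads`, `SharedArrayBuffer`, `Atomics.wait`, `receiveMessageOnPort`). The `createSyncFormatter` name and the trim-based "formatting" are placeholders for illustration and are not Prettier code.

```javascript
const {
  Worker,
  MessageChannel,
  receiveMessageOnPort,
} = require('node:worker_threads');

// Worker source, inlined via `eval: true` so the sketch fits in one file.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', ({ text, signal, port }) => {
    // Placeholder for real formatting work.
    port.postMessage(text.trim() + '\\n');
    // Wake the main thread, which is blocked in Atomics.wait.
    const flag = new Int32Array(signal);
    Atomics.store(flag, 0, 1);
    Atomics.notify(flag, 0);
  });
`;

function createSyncFormatter() {
  const worker = new Worker(workerSource, { eval: true });
  return {
    formatSync(text) {
      const signal = new SharedArrayBuffer(4);
      const flag = new Int32Array(signal);
      const { port1, port2 } = new MessageChannel();
      worker.postMessage({ text, signal, port: port2 }, [port2]);
      // Block this thread until the worker sets the flag to 1.
      Atomics.wait(flag, 0, 0);
      // The reply was posted before the flag was set, so it can be read synchronously.
      const reply = receiveMessageOnPort(port1);
      port1.close();
      return reply.message;
    },
    close: () => worker.terminate(),
  };
}

const formatter = createSyncFormatter();
console.log(JSON.stringify(formatter.formatSync('  const a = 1;  ')));
formatter.close();
```

With a single worker, the caller gains no parallelism. To use several cores, the dispatcher would post to many workers before waiting on any of them.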
It sure is, if you're feeling ambitious you should try putting in a PR! |
For reference - the reason that https://nodejs.org/en/docs/guides/dont-block-the-event-loop The TL;DR is that NodeJS spawns with a number of threads that it uses to do its async workloads such as FS, DNS, zlib, and crypto. Currently this number is 4, but it is also controllable via the This means that when you're using OTOH if you use By using For context: a big part of my last 6 months has been optimising NodeJS at scale at Canva so I've had a lot of time to learn and experiment with parallelisation in different contexts (webpack, eslint and prettier). I have a working implementation of a CLI using For the maintainers - what would be the expected design for a 1st-party parallel API for prettier? I'd love to figure out how we can improve the state here! |
Hi prettier 👋
We have a couple larger codebases which we use prettier on. Soon we're going to merge many of these large codebases into an even larger monorepo. Prettier is pretty fast, but over a couple hundred thousand lines of code, it could benefit from running on multiple cores.
In some of our projects we created a small wrapper that uses the programmatic API and Node's `cluster` to run prettier on all available CPU cores. This makes it quite a bit faster:
There's some further optimizations we could make, but generally we've found this functionality pretty useful :)
I was wondering whether there's interest in bringing this functionality into prettier, either as a default run mode and/or under a `--parallel` flag. `--parallel` alone could run a subprocess for each CPU core, or you could provide the concurrency afterwards, e.g. `--parallel 4`.
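As a rough illustration of the proposed flag semantics (one subprocess per CPU core by default, or an explicit count), the argument handling might look like this. `--parallel` is only the flag proposed in this issue, and `parseParallel` is a hypothetical helper, not an existing Prettier option or function.

```javascript
const os = require('node:os');

// Hypothetical parser for the proposed flag:
//   no flag        -> 1 (current single-process behaviour)
//   --parallel     -> one worker per CPU core
//   --parallel 4   -> exactly 4 workers
function parseParallel(argv, cpuCount = os.cpus().length) {
  const flagIndex = argv.indexOf('--parallel');
  if (flagIndex === -1) return 1;
  const value = argv[flagIndex + 1];
  // Only treat the next argument as a count if it is a positive integer;
  // otherwise it is probably a file pattern.
  if (value !== undefined && /^[1-9]\d*$/.test(value)) {
    return Number(value);
  }
  return cpuCount;
}

console.log(parseParallel(['--parallel', '4', 'src/**/*.js'])); // 4
```

One consequence of this design is that a file or directory literally named with digits (e.g. `4`) right after the flag would be read as the count, so a `--parallel=4` form might be less ambiguous.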