feat: Ability to run Fixer with parallel runner 🎉 #7777
base: master
Conversation
OK, let me just merge as-is
Congratulations to 7777! 😎
Force-pushed 19f4115 to 6b5ff67
Force-pushed 6b5ff67 to 47acbaf
some early feedback while screening the proposal
Force-pushed d72fb36 to 96354b4
Most of the projects I use php-cs-fixer on do not have more than 500 files. Wouldn't 250 per process be a bit of a waste of all the available cores then? (But maybe because of the max 500 files, I've never had speed issues with php-cs-fixer, to be honest :)).
On Tue, 30 Jan 2024 at 23:00, Dariusz Rumiński wrote (commenting on this pull request, in src/Runner/Parallel/ParallelConfig.php):
> + * This source file is subject to the MIT license that is bundled
+ * with this source code in the file LICENSE.
+ */
+
+namespace PhpCsFixer\Runner\Parallel;
+
+use Fidry\CpuCoreCounter\CpuCoreCounter;
+use Fidry\CpuCoreCounter\Finder\DummyCpuCoreFinder;
+use Fidry\CpuCoreCounter\Finder\FinderRegistry;
+
+/**
+ * @author Greg Korba ***@***.***>
+ */
+final class ParallelConfig
+{
+ private const DEFAULT_FILES_PER_PROCESS = 10;
That's why I was wondering not about using 1000, but 250 - still having a chance to fairly distribute problematic files and not have one hanging job while everyone else finished, but still avoiding the overhead of initialising the full app for only 10 files.
Ah, so the timeout is not to pass any msg from worker to supervisor, but for the whole worker to finish the job? If we had workers reporting each file's progress, would it prevent having big timeouts? (i.e. is it a timeout till we hear anything back from the worker, or a timeout for the worker to finish everything?)
[not judging the direction of next steps, simply in need to understand how this works]
@jorismak default chunk size is 10, so in your case it's ~50 chunks distributed across available workers. But it's configurable (number of cores, chunk size, timeout) 😉.
Cool. I replied on a comment to set it to 250 and was thinking 'oh, but...' :). Configurable is of course the best answer.
Force-pushed 6deb7e0 to d660808
Looks awesome!
Do you think it would be possible to implement high level tests?
@julienfalque thank you very much for the review ❤️. I'll address your comments soon. In terms of tests, I thought about running a separate job in the CI workflow that would run Fixer in Docker (using …
Force-pushed 58a51a6 to 12e893d
- always expect CPU(s) usage info in the output
- always use dist config for PHAR parallel test (because local config can use sequential analysis)
@keradus I've addressed and resolved all your comments where applicable; the remaining open discussions require your review (accepting my answer or not). From my perspective it's pretty much ready, considering:
Thanks in advance for taking action as soon as possible (with no pressure, of course).
I picked up the unfinished-threads topic again and tried to debug it via a simple approach: via sed I stripped out all declares and re-ran php-cs-fixer in order to determine whether everything was touched and this is a reporting issue.

$ sed -i 's/declare(strict_types=1);//' src/**/*.php

At least all those changes get reverted by running the fixer, so it seems to be "just" a reporting issue. Sorry for the late reply; as for benchmarks: multicore vs. singlecore (benchmark screenshots omitted).

btw, I just noticed that running single-threaded also yields the same report bug 🤔
@verfriemelt-dot-org so basically there's some issue with reporting progress, but not related to the parallel runner - that's good information from this PR's perspective 🙂. Still interesting, though - did you run with … ? In terms of parallelisation - a good improvement in your case 😊.
10 files per process sounds like a small batch size. Does a worker die after processing a batch, or will it work through another batch afterwards? The perf benefit of parallel workers (and also the memory consumption) will grow with a bigger batch size.
@staabm in this case it does not matter that much. The main process spawns N workers and each worker gets a file batch; when the worker finishes processing, it reports to the main process and gets another batch. In the meantime, the analysis result is reported per-file to keep real-time progress output. Increasing the batch size will only decrease the number of server-worker requests (batches) and worker-server confirmations; analysis reports will be exactly the same. To utilise more CPUs a smaller batch size is better, at least for small projects where a bigger chunk would hit the total file count sooner (e.g. for a project with 500 files and a batch size of 100 there would be only 5 chunks, so at most 5 busy workers - if there are more CPUs available it would not use the resources optimally). It's configurable though, so everyone can set it as they want 😊.
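The batch arithmetic described in the comments above can be sketched with a tiny helper (hypothetical, not part of the PR - just illustrating why small chunks keep more workers busy):

```php
<?php

// Hypothetical helper illustrating the chunk math discussed above:
// files are split into fixed-size batches that idle workers pull
// one at a time from the main process.
function chunkCount(int $totalFiles, int $filesPerProcess): int
{
    return (int) ceil($totalFiles / $filesPerProcess);
}

// 500 files with the default chunk size of 10 gives 50 chunks to spread
// across workers; a chunk size of 100 leaves only 5 chunks, so on an
// 8-core machine 3 workers would sit idle.
var_dump(chunkCount(500, 10));  // int(50)
var_dump(chunkCount(500, 100)); // int(5)
```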
btw, git-rebase breaks links for comments like the one in: #7777 (comment)
// Actions handled by the runner
public const RUNNER_ERROR_REPORT = 'errorReport';
public const RUNNER_HELLO = 'hello';
public const RUNNER_RESULT = 'result';
public const RUNNER_GET_FILE_CHUNK = 'getFileChunk';

// Actions handled by the worker
public const WORKER_RUN = 'run';
public const WORKER_THANK_YOU = 'thankYou';
Suggested change:

-// Actions handled by the runner
+// Actions handled by the runner and created by worker
 public const RUNNER_ERROR_REPORT = 'errorReport';
 public const RUNNER_HELLO = 'hello';
 public const RUNNER_RESULT = 'result';
 public const RUNNER_GET_FILE_CHUNK = 'getFileChunk';

-// Actions handled by the worker
+// Actions handled by the worker and created by runner
 public const WORKER_RUN = 'run';
 public const WORKER_THANK_YOU = 'thankYou';
For me it's superfluous - if an action is handled by the worker, it means the action was requested by the runner, and vice versa. I don't feel the need to add it explicitly here.
My habit in projects was to name things after the producer and not the consumer, as there could be many consumers but usually a dedicated, single producer. I know here it's a 1-1 relation, but it took me a bit to switch the habit.
If there's no harm in accepting the suggestion, please consider it. Or maybe switch to "Actions consumed by...".
// Worker requests for another file chunk when all files were processed
foreach ($workerResponse['errors'] ?? [] as $workerError) {
    $this->errorsManager->report(new Error(
I don't understand the comment here - how is it connected to iterating over errors?
It's not, I don't know why it's here. Probably added here by mistake.
awesome, then we can drop it :)
});
},
static function (\Throwable $error) use ($errorOutput): void {
    $errorOutput->writeln($error->getMessage());
I tried to provoke this to run (throwing throw new \LogicException("aaa") a few lines above) and I didn't notice any effect of this callback. When does this happen? Should we redirect it to $out->on(error...)?
It's related to React connectivity errors (onRejected) and is printed to the worker's stderr output, see WorkerCommandTest::testWorkerCantConnectToServerWhenExecutedDirectly(). But most probably it should throw an exception that would be printed by Application::doRenderThrowable() in a format that could be parsed by the main runner and displayed to the user 🤔.
'using-cache' => $input->getOption('using-cache'),
'cache-file' => $input->getOption('cache-file'),
'diff' => $input->getOption('diff'),
'stop-on-violation' => false, // @TODO Pass this option to the runner
memo ;)
 *
 * @covers \PhpCsFixer\Runner\Parallel\ParallelConfig
 *
 * @TODO Test `detect()` method, but first discuss the best way to do it.
Well, if CpuCoreCounter provided an interface, we would easily be able to mock it... (one of the reasons why I like to have interfaces all around, and not only implementations)
non-actionable
As I think of it, maybe we can do something like this (following your changes from Wirone#5):
final class ParallelConfigFactory
{
    private static ?CpuCoreCounter $cpuDetector = null;

    private function __construct() {}

    public static function sequential(): ParallelConfig
    {
        return new ParallelConfig(1);
    }

    /**
     * @param null|positive-int $filesPerProcess
     * @param null|positive-int $processTimeout
     */
    public static function detect(
        ?int $filesPerProcess = null,
        ?int $processTimeout = null
    ): ParallelConfig {
        if (null === self::$cpuDetector) {
            self::$cpuDetector = new CpuCoreCounter([
                ...FinderRegistry::getDefaultLogicalFinders(),
                new DummyCpuCoreFinder(1),
            ]);
        }

        return new ParallelConfig(
            ...array_filter(
                [self::$cpuDetector->getCount(), $filesPerProcess, $processTimeout],
                static fn ($value): bool => null !== $value
            )
        );
    }
}
and then in tests we could do:
$parallelConfigFactoryReflection = new \ReflectionClass(ParallelConfigFactory::class);
$cpuDetector = $parallelConfigFactoryReflection->getProperty('cpuDetector');
$cpuDetector->setAccessible(true);
$cpuDetector->setValue(null, new CpuCoreCounter([ // static property => no object instance
    new DummyCpuCoreFinder(7),
]));

$config = ParallelConfigFactory::detect(1, 100);

self::assertSame(7, $config->getMaxProcesses());
That brings 2 advantages: the CPU detector is initialised only once + we can "mock" it. I've tested it locally and it works. Feels a little dirty, but does what's needed 😅.
What do you think?
I'm OK with your suggestion 👍🏻
(agree about the "dirty" part of using reflection to access a private property - but again, hey, there's no interface to mock)
}

// Worker requests for another file chunk when all files were processed
foreach ($workerResponse['errors'] ?? [] as $workerError) {
If I understand right, on the RUNNER_RESULT action the errors payload is the result of ErrorsManager->forPath(): list<Error>; the variable called $workerError is misleading given the WorkerError class. Can we rename it?
-foreach ($workerResponse['errors'] ?? [] as $workerError) {
+foreach ($workerResponse['errors'] ?? [] as $error) {
}

/**
 * @requires OS Linux|Darwin
potentially blocking
How will parallel execution behave on Windows? If it would not work, let's have a check when enabling it under Windows to crash and complain (or auto-switch to non-parallel execution).
The cause of this lies in the react/child-process component - PhpCsFixer\Tests\Console\Command\WorkerCommandTest::testWorkerCommunicatesWithTheServer fails with:

Process pipes are not supported on Windows due to their blocking nature on Windows

But at the same time running Fixer in parallel works on Windows, so it's rather not about the parallel runner itself, but about how the tests are created. Maybe I did something wrong and just couldn't work around it. @WyriHaximus do you know why I got that error, since React processes are created with explicit $fds:

$this->process = new ReactProcess($this->command, null, null, [
    1 => $this->stdOut,
    2 => $this->stdErr,
]);

(stdOut and stdErr are created with tmpfile()).
If you can put it as a comment in the test, that would be awesome. Then it's non-actionable for me (if we enable this test one day, awesome; if not, OK-ish).
…dicate it shall not be called from outside of this class, ref PHP-CS-Fixer#7777 (comment)
note: I feel really shitty that we need this whole own implementation and to kind-of simulate parallel execution by spawning a TCP process. PHP really needs proper, native, syntax-supported async execution. Yet, I acknowledge that it's not actionable till we have proper support in PHP. (sorry for venting)
I reviewed the whole thing for yet another round. Overall, quite a nice job 🎉, and I have a few requests.
Actionable:
- I left few comments (above and below this msg). [For lot of things, i went directly with PRs and avoided replying in long threads.]
- Parallel execution - minor changes Wirone/PHP-CS-Fixer#5
- Parallel execution - minor cleanup to move more strict Wirone/PHP-CS-Fixer#6
- Parallel execution - use parallel config by default for future mode Wirone/PHP-CS-Fixer#7
- Parallel execution - reuse existing cache hash to avoid IO read operation Wirone/PHP-CS-Fixer#8
 *
 * @internal
 */
final class WorkerError
I still do not enjoy this. I wanted Errors to be recoverable issues related to wrong input or a single fixer failure (e.g. producing invalid code). Here, this is an Exception-like error on the parallelization itself.
Please:
- call it exception and not error
- move it away from ErrorsManager
- move it to the src/Parallel namespace
- when encountered in Runner (via cross-process message), crash the execution (same as if the exception happened in Runner and not in a side process)
Parallel Runner
Fixer is a great and widely used tool, but compared to other modern PHP tools it lacked one crucial thing: the ability to utilise more CPU cores. Until now 🥳!
I've managed to hook into the current runner and provide parallel analysis, heavily inspired by PHPStan's implementation. By default Fixer still uses sequential analysis, but the parallel runner can be easily enabled through config with ->setParallelConfig(\PhpCsFixer\Runner\Parallel\ParallelConfig::detect()) (with core auto-detection) or ->setParallelConfig(new \PhpCsFixer\Runner\Parallel\ParallelConfig(5, 20)) (explicit config).
Fixes #2803
ℹ️ If you like this change, consider following me and/or sponsoring my OSS work 😎.
Test it on your code!
Just add this to your config (assuming you're using default config builder):
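A minimal sketch of such a config, assuming the default config builder and the ParallelConfig API described in this PR (the finder path is illustrative):

```php
<?php
// .php-cs-fixer.dist.php - enable the parallel runner (sketch; adjust the
// finder to your project)
use PhpCsFixer\Config;
use PhpCsFixer\Finder;
use PhpCsFixer\Runner\Parallel\ParallelConfig;

return (new Config())
    // auto-detect available CPU cores; or be explicit: new ParallelConfig(5, 20)
    ->setParallelConfig(ParallelConfig::detect())
    ->setFinder((new Finder())->in(__DIR__.'/src'));
```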
and then you have 2 options:
Docker image
docker run --rm -it -v $(pwd):/code wirone/php-cs-fixer:parallel check
The image is multi-arch, so it should work on any kind of hardware/software. Let me know if you have any problems.
Override Composer package with a fork
Modify your composer.json, and then run composer update friendsofphp/php-cs-fixer -w.
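A sketch of such an override (the fork URL matches the Wirone/PHP-CS-Fixer references elsewhere in this PR; the branch name is an assumption, use whichever branch the fork ships this PR on):

```json
{
    "repositories": [
        { "type": "vcs", "url": "https://github.com/Wirone/PHP-CS-Fixer" }
    ],
    "require-dev": {
        "friendsofphp/php-cs-fixer": "dev-parallel-runner"
    }
}
```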
Concern: more dependencies
Everything comes at some cost; we can't achieve parallel analysis with our internal code only. I mean, we could, but it does not make sense 😅. I managed to lower the Composer constraints for the ReactPHP packages, so these should be compatible everywhere (or at least almost). For example react/promise v2.6 or react/socket v1.0 (installed on PHP 7.4 with --prefer-lowest) are from 2018. All these packages support PHP >=5.3, so I believe they should not cause any issues when it comes to compatibility with people's runtimes and apps.
CPU core auto-detection
Auto-detection works properly, at least for cases I could test locally (native execution on the host, execution in Docker with limited CPU cores):
As you can see in the CI, it also works properly in GitHub Actions, where it detects 4 CPUs, which also speeds up all the Fixer jobs 🙂.
TODO
I wanted to provide this change as a draft to collect the feedback - both technical from the review, but also from users' perspective (UX, performance, potential problems).
PR is marked as draft to prevent merge, but review can be done, having this in mind:
- xargs-based references

Real world impact
I made some workbench tests, and below you can find the numbers for sequential and parallel runs for several projects. Analysis of the external projects was done with a locally built Docker image containing code from this branch, with parallel auto-detection (effectively 7 cores on a MacBook Pro M1, because I have a limit set on the OrbStack level), and code mounted as a volume:
friendsofphp/php-cs-fixer
(180.956 seconds and 152.279 seconds respectively before the iterator fix)
symfony/symfony
CuyZ/Valinor
(info)