feat: Ability to run Fixer with parallel runner 🎉 #7777

Open
wants to merge 72 commits into
base: master

Wirone
Member

@Wirone Wirone commented Jan 24, 2024

Parallel Runner

Fixer is a great and widely used tool, but compared to other modern PHP tools it lacked one crucial thing: the ability to utilise more CPU cores. Until now 🥳!

I've managed to hook into the current runner and provide parallel analysis, heavily inspired by PHPStan's implementation. By default Fixer still uses sequential analysis, but the parallel runner can be easily enabled through config with ->setParallelConfig(\PhpCsFixer\Runner\Parallel\ParallelConfig::detect()) (with core auto-detection) or ->setParallelConfig(new \PhpCsFixer\Runner\Parallel\ParallelConfig(5, 20)) (explicit config).
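For context, a minimal sketch of a complete config file with the parallel runner enabled. It assumes this PR's branch is installed (so `ParallelConfig::detect()` exists), and the `src` directory and `@PSR12` rule set are placeholders only. The two constructor arguments in the commented-out line are presumably the number of processes and the files per process.

```php
<?php

declare(strict_types=1);

// .php-cs-fixer.dist.php (sketch; paths and rules are placeholders)

use PhpCsFixer\Config;
use PhpCsFixer\Finder;
use PhpCsFixer\Runner\Parallel\ParallelConfig;

$finder = Finder::create()->in(__DIR__.'/src');

return (new Config())
    // Auto-detect the number of available CPU cores.
    ->setParallelConfig(ParallelConfig::detect())
    // Or configure it explicitly instead (presumably 5 processes, 20 files each):
    // ->setParallelConfig(new ParallelConfig(5, 20))
    ->setRules(['@PSR12' => true])
    ->setFinder($finder);
```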

Fixes #2803

ℹ️ If you like this change, consider following me and/or sponsoring my OSS work 😎.

Test it on your code!

Just add this to your config (assuming you're using default config builder):

->setParallelConfig(\PhpCsFixer\Runner\Parallel\ParallelConfig::detect())

and then you have 2 options:

Docker image

docker run --rm -it -v $(pwd):/code wirone/php-cs-fixer:parallel check

Image is multi-arch, so it should work on any kind of hardware/software. Let me know if you have any problems.

Override Composer package with a fork

Modify your composer.json:

{
    "repositories": [
        {
            "type": "vcs",
            "url": "https://github.com/Wirone/PHP-CS-Fixer"
        }
    ],
    "require": {
        "friendsofphp/php-cs-fixer": "dev-codito/bombazo as 3.7777"
    }
}

and run composer update friendsofphp/php-cs-fixer -w

Concern: more dependencies

Everything comes at some cost: we can't achieve parallel analysis with our internal code only. I mean, we could, but it does not make sense 😅. I managed to lower the Composer constraints for the ReactPHP packages so they should be compatible (almost) everywhere. For example, react/promise v2.6 and react/socket v1.0 (installed on PHP 7.4 with --prefer-lowest) are from 2018. All these packages support PHP >=5.3, so I believe they should not cause any compatibility issues with people's runtimes and apps.

CPU core auto-detection

Auto-detection works properly, at least for cases I could test locally (native execution on the host, execution in Docker with limited CPU cores):

[image: screenshot of core auto-detection, natively on the host and in Docker with limited CPU cores]

As you can see in the CI, it also works properly in GitHub Actions, where it detects 4 CPUs, which also speeds up all the Fixer jobs 🙂.

TODO

I wanted to provide this change as a draft to collect feedback, both technical (from the review) and from the users' perspective (UX, performance, potential problems). The PR is marked as draft to prevent a merge, but it can be reviewed with the following in mind:

  • Remove BC break before continuing
  • Tests. I did not write them because I wasn't sure what the final contract will look like.
  • Handling errors in worker (displaying them on the main process' side)
  • Proper cache support
  • Check required ReactPHP versions, maybe we can lower the constraints and make installation more inclusive (for projects that already use Fixer and ReactPHP with lower version)
  • Usage docs
  • Remove quasi-parallel script from the Composer scripts and any xargs-based references

Real world impact

I ran some benchmarks and below you can find the numbers for sequential and parallel runs for several projects. Analysis of external projects was done with a locally built Docker image containing the code from this branch, with parallel auto-detection (effectively 7 cores on a MacBook Pro M1, because I have a limit set at the OrbStack level), and the code mounted as a volume:

docker build --target dist -t fixer:local .
cd /path/to/project/for/analyse
docker run --rm -it -v $(pwd):/code fixer:local check -vvv
| Repository | Files count | Sequential | Parallel |
| --- | --- | --- | --- |
| friendsofphp/php-cs-fixer | 1080 | 65.319 seconds | 8.088 seconds (11.623 before iterator fix) |
| GetResponse (with `@PER-CS2.0` ruleset) | 31376 | 518.740 seconds | 75.879 seconds (7 cores in Docker), 76.262 (10 cores natively on host); 180.956 seconds and 152.279 respectively before iterator fix |
| symfony/symfony | 6220 | 183.977 seconds | 35.440 seconds (64.352 before iterator fix) |
| CuyZ/Valinor (info) | 605 | 6.369 seconds | 1.959 seconds (4.785 before iterator fix) |

@Wirone Wirone self-assigned this Jan 24, 2024
@coveralls

coveralls commented Jan 24, 2024

Coverage Status

coverage: 95.699% (-0.4%) from 96.121%
when pulling dc15acf on Wirone:codito/bombazo
into c3af946 on PHP-CS-Fixer:master.

@keradus
Member

keradus commented Jan 24, 2024

OK, let me just merge as-is

@mvorisek
Contributor

Congratulations on 7777! 😎

@Wirone Wirone changed the title feat: The one you're waiting for 😎 feat: Ability to run Fixer with parallel runner 🎉 Jan 28, 2024
@Wirone Wirone added topic/I/O topic/core Core features of Fixer's engine labels Jan 28, 2024
Member

@keradus keradus left a comment

some early feedback while screening the proposal

@jorismak

jorismak commented Jan 31, 2024 via email

@Wirone
Member Author

Wirone commented Jan 31, 2024

@jorismak the default chunk size is 10, so in your case it's ~50 chunks distributed across the available workers. But it's configurable (number of cores, chunk size, timeout) 😉.
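The arithmetic behind that estimate can be sketched as follows. `countChunks()` is a hypothetical helper for illustration, not part of Fixer, and the 500-file figure is only what "~50 chunks" at 10 files per chunk implies.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not Fixer API): number of chunks a run is split into.
 */
function countChunks(int $fileCount, int $filesPerProcess): int
{
    if ($fileCount < 0 || $filesPerProcess < 1) {
        throw new \InvalidArgumentException('Expected a non-negative file count and a positive chunk size.');
    }

    // Integer ceiling division: a partially filled last chunk still counts.
    return intdiv($fileCount + $filesPerProcess - 1, $filesPerProcess);
}

echo countChunks(500, 10), PHP_EOL; // 50
echo countChunks(505, 10), PHP_EOL; // 51
```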

@jorismak

jorismak commented Jan 31, 2024 via email

@Wirone Wirone force-pushed the codito/bombazo branch 2 times, most recently from 6deb7e0 to d660808 Compare February 1, 2024 07:18
Member

@julienfalque julienfalque left a comment

Looks awesome!

Do you think it would be possible to implement high level tests?

@Wirone
Member Author

Wirone commented Feb 2, 2024

@julienfalque thank you very much for the review ❤️. I'll address your comments soon.

In terms of tests, I thought about adding a separate job to the CI workflow that would run Fixer in Docker (using docker run), because we can use --cpus 2 to ensure a stable CPU count, and assert that "Running analysis on 2 cores with X files per process" is in the output. I have not looked at how it's tested in PHPStan yet; I wanted to settle the final internal API for this first and then figure out how to test it 😅. Any suggestions are highly welcome, though.
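The assertion part of that idea could look roughly like the sketch below. `reportsParallelRun()` is a hypothetical helper, and how the output gets captured from the Docker run is left out; the sample lines are the banners shown in this conversation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: checks whether captured Fixer output contains the
 * parallel-runner banner for the expected number of cores.
 */
function reportsParallelRun(string $output, int $expectedCores): bool
{
    $pattern = sprintf(
        '/^Running analysis on %d cores with \d+ files per process\.$/m',
        $expectedCores
    );

    return 1 === preg_match($pattern, $output);
}

$parallel = "PHP runtime: 8.3.4\nRunning analysis on 2 cores with 10 files per process.\n";
$sequential = "PHP runtime: 8.3.4\nRunning analysis on 1 core sequentially.\n";

var_dump(reportsParallelRun($parallel, 2));   // bool(true)
var_dump(reportsParallelRun($sequential, 2)); // bool(false)
```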

@Wirone
Member Author

Wirone commented Mar 26, 2024

@keradus I've addressed and resolved all your comments where applicable; the remaining open discussions require your review (accepting my answer or not). From my perspective it's pretty much ready, considering:

  • Decreased coverage (-0.4%) is acceptable given the amount and specificity of the changes
  • Infection reports many escaped mutants, but this particular PR is not a good place for a mutation playground 😅. Many of the reported mutants are false positives (a general characteristic of Infection)
  • there are still @Slamdunk's comments to resolve, to summarise:
    • do we want to enable parallel by default? I would say: it's OK to enable 2 CPUs by default.
    • do we want to print the "experimental feature" warning? I prefer to keep it, however it does not quite fit with parallel by default.

Thanks in advance for taking action as soon as possible (with no pressure of course).

@verfriemelt-dot-org
Contributor

> @romm @verfriemelt-dot-org @mfn would you be so kind as to re-verify the parallel runner on your codebases? The memory usage fix apparently also improves analysis time, by up to 2.5x compared to what you tested before 🙂. I can add your projects to the table in the PR's description when I know the data.

i picked up the not finished threads topic again, and tried to debug that via a simple approach:

via sed i stripped out all declares, and rerun phpcsfixer in order to determine if everything was touched and this is a reporting issue.

$ sed -i 's/declare(strict_types=1);//' src/**/*.php

at least all those changes get reverted by running csfixer, so it seems to be "just" a reporting issue.

sorry for the late reply; as for benchmarks:

multicore

$  php8.3 /home/easteregg/src/PHP-CS-Fixer/php-cs-fixer fix
PHP CS Fixer 3.52.2-DEV 15 Keys by Fabien Potencier, Dariusz Ruminski and contributors.
PHP runtime: 8.3.4
Running analysis on 16 cores with 10 files per process.
Parallel runner is an experimental feature and may be unstable, use it at your own risk. Feedback highly appreciated!
Loaded config default from "/home/easteregg/src/eos/tickeos/core/.php-cs-fixer.dist.php".
 5268/5340 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░]  98%
Fixed 0 of 5340 files in 13.373 seconds, 14.000 MB memory used

vs singlecore

 $  php8.3 /home/easteregg/src/PHP-CS-Fixer/php-cs-fixer fix
PHP CS Fixer 3.52.2-DEV 15 Keys by Fabien Potencier, Dariusz Ruminski and contributors.
PHP runtime: 8.3.4
Running analysis on 1 core sequentially.
Loaded config default from "/home/easteregg/src/eos/tickeos/core/.php-cs-fixer.dist.php".
 5276/5340 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░]  98%
Fixed 0 of 5340 files in 86.950 seconds, 24.000 MB memory used

btw, i just noticed now, that also running single-threaded yields the same report bug 🤔

@Wirone
Member Author

Wirone commented Apr 2, 2024

@verfriemelt-dot-org so basically there's some issue with reporting progress, but not related to parallel runner - it's good information from this PR's perspective 🙂. Still interesting, did you run with -vvv --verbose? Please open a separate issue if you find a way to reproduce this behaviour.

In terms of parallelisation - good improvement in your case 😊.

@staabm
Contributor

staabm commented Apr 2, 2024

> Running analysis on 16 cores with 10 files per process.

10 files per process sounds like small batch size. does a worker die after processing a batch or will it work thru another batch afterwards?

the perf benefit of parallel workers (and also the memory consumption) will grow with a bigger batch size

@Wirone
Member Author

Wirone commented Apr 2, 2024

@staabm in this case it does not matter that much. The main process spawns N workers and each worker gets a file batch; when a worker finishes processing its batch, it reports that to the main process and gets another one. In the meantime, the analysis result is reported per file to keep real-time progress output. Increasing the batch size only decreases the number of server-worker requests (batches) and worker-server confirmations; the analysis reports stay exactly the same. To utilise more CPUs, a smaller batch size is better, at least for small projects where bigger chunks would exhaust the file list sooner (e.g. for a project with 500 files and batch size set to 100 there would be only 5 workers, so if more CPUs are available the resources would not be used optimally). It's configurable though, so everyone can set it as they want 😊.
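To make the 500-file example concrete, here is a small simulation of on-demand chunk dispatch. It is a simplified model under stated assumptions, not the runner's actual code: `simulateDispatch()` is a hypothetical function, workers are served round-robin, and all chunks are assumed to take equally long.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical model of the runner handing out file chunks on request.
 *
 * @param list<string> $files
 *
 * @return array<int, int> worker index => number of chunks it processed
 */
function simulateDispatch(array $files, int $workers, int $filesPerProcess): array
{
    if ($workers < 1 || $filesPerProcess < 1) {
        throw new \InvalidArgumentException('Expected at least one worker and a positive chunk size.');
    }

    $queue = [] === $files ? [] : array_chunk($files, $filesPerProcess);
    $processed = array_fill(0, $workers, 0);

    // One outer iteration models one round: each worker asks for a chunk
    // and receives one, as long as the queue is not empty.
    while ([] !== $queue) {
        for ($worker = 0; $worker < $workers && [] !== $queue; ++$worker) {
            array_shift($queue);
            ++$processed[$worker];
        }
    }

    return $processed;
}

$files = array_map(static fn (int $i): string => "src/File{$i}.php", range(1, 500));

// Chunk size 100 on 8 workers: only 5 chunks exist, so 3 workers stay idle.
print_r(simulateDispatch($files, 8, 100));

// Chunk size 10 on 8 workers: 50 chunks, every worker gets 6 or 7 of them.
print_r(simulateDispatch($files, 8, 10));
```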

@keradus
Member

keradus commented May 5, 2024

----- marker

btw, git-rebase breaks links for comments like one in: #7777 (comment)

Comment on lines +24 to +32
// Actions handled by the runner
public const RUNNER_ERROR_REPORT = 'errorReport';
public const RUNNER_HELLO = 'hello';
public const RUNNER_RESULT = 'result';
public const RUNNER_GET_FILE_CHUNK = 'getFileChunk';

// Actions handled by the worker
public const WORKER_RUN = 'run';
public const WORKER_THANK_YOU = 'thankYou';
Member

Suggested change
- // Actions handled by the runner
- public const RUNNER_ERROR_REPORT = 'errorReport';
- public const RUNNER_HELLO = 'hello';
- public const RUNNER_RESULT = 'result';
- public const RUNNER_GET_FILE_CHUNK = 'getFileChunk';
- // Actions handled by the worker
- public const WORKER_RUN = 'run';
- public const WORKER_THANK_YOU = 'thankYou';
+ // Actions handled by the runner and created by worker
+ public const RUNNER_ERROR_REPORT = 'errorReport';
+ public const RUNNER_HELLO = 'hello';
+ public const RUNNER_RESULT = 'result';
+ public const RUNNER_GET_FILE_CHUNK = 'getFileChunk';
+ // Actions handled by the worker and created by runner
+ public const WORKER_RUN = 'run';
+ public const WORKER_THANK_YOU = 'thankYou';

Member Author

For me it's superfluous - if action is handled by the worker it means the action was requested by the runner and vice versa. I don't feel the need to add it explicitly here.

Member

my habit in projects was to name things after the producer and not the consumer, as there could be many consumers but usually a single, dedicated producer.

I know it's a 1-1 relation here, but it took me a bit to switch the habit.

If there is no harm in accepting the suggestion, please consider it. Or maybe switch to "Actions consumed by..."

Comment on lines +267 to +269
// Worker requests for another file chunk when all files were processed
foreach ($workerResponse['errors'] ?? [] as $workerError) {
$this->errorsManager->report(new Error(
Member

I don't understand the comment here - how is it connected to iterating over errors?

Member Author

It's not, I don't know why it's here. Probably added here by mistake.

Member

awesome, then we can drop it :)

});
},
static function (\Throwable $error) use ($errorOutput): void {
$errorOutput->writeln($error->getMessage());
Member

I tried to provoke this to run (by adding throw new \LogicException("aaa") a few lines above) and I didn't notice any result of this callback.
When does this happen?
Should we redirect it to $out->on(error...)?

Member Author

It's related to React connectivity errors (onRejected) and is printed to the output of worker's stdErr, see WorkerCommandTest::testWorkerCantConnectToServerWhenExecutedDirectly(). But most probably it should throw an exception that would be printed by Application::doRenderThrowable() in format that could be parsed by the main runner and displayed to the user 🤔.

'using-cache' => $input->getOption('using-cache'),
'cache-file' => $input->getOption('cache-file'),
'diff' => $input->getOption('diff'),
'stop-on-violation' => false, // @TODO Pass this option to the runner
Member

memo ;)

*
* @covers \PhpCsFixer\Runner\Parallel\ParallelConfig
*
* @TODO Test `detect()` method, but first discuss the best way to do it.
Member

@keradus keradus May 5, 2024

well, if CpuCoreCounter would provide interface, we would be easily able to mock it...
(one of reasons why I like to have interfaces all around, and not only implementations)

non-actionable

Member Author

As I think of it, maybe we can do something like this (following your changes from Wirone#5):

final class ParallelConfigFactory
{
    private static ?CpuCoreCounter $cpuDetector = null;

    private function __construct() {}

    public static function sequential(): ParallelConfig
    {
        return new ParallelConfig(1);
    }

    /**
     * @param null|positive-int $filesPerProcess
     * @param null|positive-int $processTimeout
     */
    public static function detect(
        ?int $filesPerProcess = null,
        ?int $processTimeout = null,
    ): ParallelConfig {
        if (null === self::$cpuDetector) {
            self::$cpuDetector = new CpuCoreCounter([
                ...FinderRegistry::getDefaultLogicalFinders(),
                new DummyCpuCoreFinder(1),
            ]);
        }

        return new ParallelConfig(
            ...array_filter(
                [self::$cpuDetector->getCount(), $filesPerProcess, $processTimeout],
                static fn ($value): bool => null !== $value
            )
        );
    }
}

and then in tests we could do:

$parallelConfigFactoryReflection = new \ReflectionClass(ParallelConfigFactory::class);
$cpuDetector = $parallelConfigFactoryReflection->getProperty('cpuDetector');
$cpuDetector->setAccessible(true);
$cpuDetector->setValue($parallelConfigFactoryReflection, new CpuCoreCounter([
    new DummyCpuCoreFinder(7),
]));

$config = ParallelConfigFactory::detect(1, 100);

self::assertSame(7, $config->getMaxProcesses());

That brings 2 advantages: the CPU detector is initialised only once, and we can "mock" it. I've tested it locally and it works. Feels a little dirty, but does what's needed 😅.

What do you think?

Member

I'm OK with your suggestion 👍🏻

(agree about the "dirty" part of using reflection to access a private property - but again hey, no interface to mock)

}

// Worker requests for another file chunk when all files were processed
foreach ($workerResponse['errors'] ?? [] as $workerError) {
Member

if I understand correctly, on the RUNNER_RESULT action the errors payload is the result of ErrorsManager->forPath(): list<Error>, so the variable name $workerError is easily confused with the WorkerError class. Can we rename it?

Suggested change
- foreach ($workerResponse['errors'] ?? [] as $workerError) {
+ foreach ($workerResponse['errors'] ?? [] as $error) {

}

/**
* @requires OS Linux|Darwin
Member

@keradus keradus May 5, 2024

potentially blocking

how will parallel execution behave on Windows?
if it would not work, let's have a check when enabling it under windows to crash and complain (or auto-switch to non-parallel execution)

Member Author

The cause of this lies in the react/child-process component:

  1. PhpCsFixer\Tests\Console\Command\WorkerCommandTest::testWorkerCommunicatesWithTheServer
    Process pipes are not supported on Windows due to their blocking nature on Windows

But at the same time running Fixer in parallel works on Windows, so it's rather not about parallel runner itself, but about how tests are created. Maybe I did something wrong and just couldn't work around it. @WyriHaximus do you know why I got that error, since React processes are created with explicit $fds:

$this->process = new ReactProcess($this->command, null, null, [
    1 => $this->stdOut,
    2 => $this->stdErr,
]);

(stdOut and stdErr are created with tmpfile()).

Member

if you can put it as a comment to test, would be awesome.

then, non-actionable for me (if we enable this test one day, awesome. if not, OKish)

keradus added a commit to keradus/PHP-CS-Fixer that referenced this pull request May 5, 2024
…dicate it shall not be called from outside of this class, ref PHP-CS-Fixer#7777 (comment)
Member

@keradus keradus left a comment

note: I feel really shitty that we need this whole custom implementation and kind of simulate parallel execution by spawning a TCP process.
PHP really needs proper, native, syntax-supported async execution.
Yet, I ack that it's not actionable until PHP has proper support. (sorry for venting)

I did yet another full review round. Overall, quite a nice job 🎉, and I have a few requests.

Actionable:

*
* @internal
*/
final class WorkerError
Member

@keradus keradus May 5, 2024

I still do not enjoy this.
I wanted Errors to be recoverable issues related to wrong input or single fixer failure (eg producing invalid code).

Here, this is Exception like error on parallelization itself.
Please:

  • call it exception and not error
  • move it away from ErrorsManager
  • move it to src/Parallel namespace
  • when encountered in Runner (via cross-process message), crash the execution (same as if exception would happen in Runner and not in side-process)
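A rough sketch of what the four points above could amount to. Everything here is an assumption for illustration: the `ParallelisationException` name, the `handleWorkerMessage()` function, and the payload keys (`class`, `message`, `file`, `line`) are hypothetical and not taken from this PR's code; only the 'errorReport' action name comes from the constants discussed earlier.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical exception for failures of the parallelisation itself,
 * kept separate from the recoverable, per-file Error objects.
 */
final class ParallelisationException extends \RuntimeException
{
    /**
     * @param array{class: string, message: string, file: string, line: int} $error assumed payload shape
     */
    public static function forWorkerError(array $error): self
    {
        return new self(sprintf(
            '[%s] %s in %s on line %d',
            $error['class'],
            $error['message'],
            $error['file'],
            $error['line']
        ));
    }
}

/**
 * Hypothetical runner-side handler for a decoded cross-process message.
 *
 * @param array<string, mixed> $message
 */
function handleWorkerMessage(array $message): void
{
    if ('errorReport' === ($message['action'] ?? null)) {
        // Crash the run, exactly as if the exception was thrown in the runner.
        throw ParallelisationException::forWorkerError($message);
    }

    // Other actions ('hello', 'result', 'getFileChunk') would be handled here.
}

try {
    handleWorkerMessage([
        'action' => 'errorReport',
        'class' => \LogicException::class,
        'message' => 'aaa',
        'file' => 'src/Example.php',
        'line' => 42,
    ]);
} catch (ParallelisationException $exception) {
    echo $exception->getMessage(), PHP_EOL; // [LogicException] aaa in src/Example.php on line 42
}
```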

@Wirone Wirone mentioned this pull request May 6, 2024
Labels
topic/core Core features of Fixer's engine topic/I/O
Development

Successfully merging this pull request may close these issues.

Support for parallelisation of analysis (utilise several CPUs)