Async fixing #4024
Comments
It's not possible to run fixers for one file in parallel, as there are dependencies between fixers; they have to run in the proper order. It is possible to process files in parallel. |
That's what I mean: run all files in parallel, but the fixers per file in the same process. Would such a thing be welcome? |
in general - yes. |
For a first prototype I would use a fixed number of subprocesses and run php-cs-fixer per file, while managing the remaining work and the result printing from a master process. More or less parallelizing this loop. |
👎 for that. Also, during fixing of one file, we are already linting the next one in the background. Finally, if there are 4 cores, it's pointless to create 1000 processes. What we need is a WorkerPool / JobPool that is fed successive jobs (files) and assigns each file to one of a fixed number of workers in the pool (== number of cores). |
I totally agree, and that's also how I imagine it working in the final version. For the first shot, I want to get a feeling for how much speedup we can get without a big investment in code changes. |
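That prototype idea can be sketched at the shell level, with `xargs -P` playing the master process that keeps a fixed number of subprocesses busy. This is only an illustration: the file names are made up, and `echo` stands in for a real per-file `php-cs-fixer fix` invocation.

```shell
#!/bin/sh
# Pretend work list; in reality this would be the project's PHP files.
printf '%s\n' src/A.php src/B.php src/C.php src/D.php > work.txt

# One file per subprocess (-n 1), at most 2 workers alive at a time (-P 2).
# A real run would replace `echo fixing` with something like:
#   vendor/bin/php-cs-fixer fix
xargs -n 1 -P 2 echo fixing < work.txt

rm work.txt
```

Note that with `-P 2` the completion order is not guaranteed, which is one reason a master process would also have to collect and order the result output.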
then, just use |
Some initial numbers:
With the following setup:
on a 4-CPU Ubuntu 16 VMware VM. Takeaway:
=> A) So an initial, obvious optimization would be to try to speed up php-cs-fixer for the special case of fixing only one file. |
so, as I said, WorkerPool / JobPool |
After taking another, more in-depth look into the codebase, I guess we can speed things up by using async IO for some components. Step 1: |
Please find some initial code for how an async-IO-based linter could look: |
Yes, we need to read the file to lint in TokenizerLinter, but what we just loaded into the internal buffer and are about to lint is also what we will fix in the next step, so we need the file read from IO anyway. |
After thinking a bit more about the problem and possible solutions, I came to another idea: I will try to use pcntl fork, so we don't have to pay the bootstrap cost for each worker. |
Don't waste your time:
|
I am fine with a feature-detected parallelization feature (which can work on Windows via the Linux subsystem), in case it doesn't require ugly code and leads to a decent perf improvement. |
I'm picking up on the comment from @staabm #4024 (comment)
The issue here is the massive overhead of spawning a new process for each file 😅, as was pointed out already.

First, for this experiment, php-cs-fixer needs a one-line change, because currently it does not accept multiple paths provided on the command line:

```diff
diff --git a/src/Console/ConfigurationResolver.php b/src/Console/ConfigurationResolver.php
index 72352c350..29041bf5e 100644
--- a/src/Console/ConfigurationResolver.php
+++ b/src/Console/ConfigurationResolver.php
@@ -599,7 +599,7 @@ private function computeConfigFiles()
         if ($this->isStdIn() || 0 === \count($path)) {
             $configDir = $this->cwd;
         } elseif (1 < \count($path)) {
-            throw new InvalidConfigurationException('For multiple paths config parameter is required.');
+            $configDir = $this->cwd;
         } elseif (!is_file($path[0])) {
             $configDir = $path[0];
         } else {
```

This now allows calling it with multiple paths. My sample set (some private project):
Running a single invocation:
Now, using
Or going more extreme:
or
The improvements don't increase linearly with the number of processes, but they are still very measurable; this is due to the diversity of the files to scan, something which was also pointed out already. Note: tests were conducted on a MacBook Pro (16-inch, 2019), 2.4 GHz 8-core i9.
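The kind of multi-path invocation measured above can be reproduced with `xargs -n` once multiple paths are accepted: each subprocess receives a chunk of files instead of a single one, amortizing the bootstrap cost over the chunk. A minimal sketch, with a made-up file list and `echo` standing in for the patched `php-cs-fixer fix`:

```shell
#!/bin/sh
# Made-up file list; a real one would come from the repository.
printf '%s\n' a.php b.php c.php d.php e.php > list.txt

# 2 paths per invocation (-n 2), up to 4 parallel subprocesses (-P 4).
# With the one-line patch applied, the real call would look like:
#   xargs -n 2 -P 4 vendor/bin/php-cs-fixer fix < list.txt
xargs -n 2 -P 4 echo fix < list.txt

rm list.txt
```

The `-n` value is a trade-off: a larger chunk size means fewer process starts, but coarser load balancing across the workers.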
However, I do not agree with the conclusion
But the answer was given via the follow-up comment #4024 (comment)
The Further:
I guess some challenges are the flow of data back:
ps: any idea about the "For multiple paths config parameter is required" limitation? |
|
Nice reading, @mfn ;) It was never a big prio for core project maintainers, as usually the performance problem happens only for the FIRST run of the tool. Then, when we have If we were to build in some solution, I would suggest using some 3rd-party solution, so we can use all the hard work from the open source community and not come up with something custom, e.g. Swoole PHP or another lib. |
Thanks for the follow-up. We see php-cs fixing taking several minutes in CI builds, which was the initial motivation for this change. I think the most straightforward (and lowest-maintenance) way forward would be to allow several file paths as CLI arguments and do the fork/parallel stuff at bash level with Do you guys agree? |
do you have the cache file in your CI? if not, consider adding it
Possible since the creation of this GitHub issue; just explicitly point at the config that should be used. |
🤦 Now it hit me; I didn't get this from the other issue. I can confirm this works with an official release: "problem solved" ;) then I guess |
The drawback here, btw, is that you can't use the "finder" as defined per the I initially tried to come up with some super smart separation of the finder config from the |
I tried doing so with several different attempts, but it did not work. Maybe the reason is that git does not properly set modification stamps of files when cloning, or similar... don't know what's going on.
Maybe it would make sense to add a command which prints a list of files (like e.g. find would), but based on the $finder used in the config, so we can use this result with xargs to parallelize stuff? |
phpunit supports for example |
I can't believe I bothered coming up with this hack, but this solves my need for faster fixing in CI (where, as pointed out, the cache does not work reliably; I have the same issues with phpstan and CI caching, FWIW) by re-using the defined "finder":
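The hack could look roughly like the following: a `php -r` one-liner loads the config file, iterates its finder, and prints one path per line, which is then piped into `xargs`. This is an untested sketch; the config path `.php_cs.dist` and the assumption that it returns a `PhpCsFixer\Config` whose `getFinder()` yields file objects are assumptions from this era of the project. The executed line below uses a `printf`/`echo` stand-in so the pipeline mechanics can be run anywhere.

```shell
#!/bin/sh
# Real version (untested sketch; assumes .php_cs.dist returns a Config object):
#   php -r '$c = require ".php_cs.dist";
#           foreach ($c->getFinder() as $f) { echo $f->getRealPath(), PHP_EOL; }' \
#     | xargs -n 10 -P 8 vendor/bin/php-cs-fixer fix --using-cache=no
#
# Stand-in so the piping itself is runnable as-is:
printf '%s\n' src/X.php src/Y.php src/Z.php \
  | xargs -n 10 -P 8 echo fix --using-cache=no
# prints one line: fix --using-cache=no src/X.php src/Y.php src/Z.php
```

Since only three paths are fed in and `-n 10` allows up to ten per chunk, a single subprocess handles them all here; with a real project-sized file list, up to 8 fixers would run side by side.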
=> 🤦 but works. |
just went ahead and added a new |
…keradus) This PR was merged into the 2.19-dev branch.

Discussion
----------

feature #4024: added a `list-files` command

As discussed in #4024, this PR proposes a `list-files` command which, from a user perspective, can be used together with `xargs` to get some basic parallelization into php-cs-fixer. This is especially interesting for CI build cases in which the php-cs-fixer built-in caching mechanism doesn't work (e.g. because git doesn't set correct file modification timestamps on clone, or the CI build is running within a read-only environment).

Example usage:

```
vendor/bin/php-cs-fixer list-files | xargs -n 10 -P 8 vendor/bin/php-cs-fixer fix --config=.php_cs.dist --using-cache=no
```

`-n` defines how many files a single subprocess processes; `-P` defines how many subprocesses the shell is allowed to spawn for parallel processing (usually similar to the number of CPUs your system has). See the xargs help page: https://wiki.ubuntuusers.de/xargs/

As can be seen in #4024 (comment), you might get a perf boost by a factor of 3-6x, depending on the number of files being fixed, the number of CPUs, etc.

This isn't the most elegant way to apply parallelization to the php-cs-fixer case, but it's a really simple one which works with very simple means and therefore has a rather low entry barrier. You don't need a fancy PHP extension or any complicated setup; simple bash means are enough to get a decent perf boost. It's also pretty simple from the php-cs-fixer perspective, keeping the maintenance burden as low as possible.

In case you would consider this change acceptable, I am willing to provide automated tests, docs, and everything else required and missing atm. Any opinions?
Commits
-------
ed65a9e Update doc/usage.rst
c95d430 Update doc/usage.rst
81753b4 try getRealPath()
a3ea03a fix test on windows
a32a8ff fix typo
add9542 to provide multiple files to fix command, you need to specify config file
d3f420d fis CS
d107a1f fix typo
47acf3a Update doc/usage.rst
172d577 fix deprecation
3f6a255 fix CS
d9dbec3 try to fix windows compat
06beee5 fix test
ac8773d .php_cs -> .php-cs-fixer.php
925f47c added in/out test
4e93714 moved ListFilesTest project into test/Fixtures/
725cc86 added separate path for SplFileInfo and sfSplFileInfo
444500c docs
7e92ca7 fix test-expectation
2f9cd68 fix test
5d2104d added ListFilesCommandTest
1f6f32b rm un-used imports
44bbe92 use relative paths instead of real-path to shorten output
22315d9 remove getHelp(), seems to be only used for the FixCommand
8c87ac9 Update src/Console/Command/ListFilesCommand.php
2a25c8c alphabetical order
149d757 removed un-used imports
302733d added a list-files command
if you have any further improvements, we are happy to welcome a PR! :) |
The project is already able to do linting in a parallel process.
Wouldn't it make sense to run the fixers in parallel, per file?
(Assuming this would speed up the process considerably.)
What's your opinion?