feat: Add FixerBlame class and one test #7865

connorhu · 2024-03-05T08:32:37Z

First step to fixing #7171 and similar issues

What are the known problems:

slow. Instead of constantly converting tokens to text, the diff should be done on the tokens
class names, class locations, array key names
slow
does not use the result even when producing the output
slow
low amount of test files
and slow

It will remain a draft until the known problems are solved.

Note: Since it works on the same files, the PR is rebased to the Wirone:codito/bombazo branch. Once #7777 is merged, it can be rebased back to the master.

Empty for now, because logic for running fixers must be moved here. Basically it's a PoC of command being hidden - works on PHP 7.4 and 8.3 with latest Composer packages on each version. Command's arguments/options are more or less what we need in the worker, but it may be changed later.

The same collection of files has to be determined for both sequential and parallel run.

It's working, but needs polishing here and there. It speeds up analysis significantly (like 5x for this codebase on my computer).

Parallel runner must be introduced slowly, and start as an opt-in. When it's stable enough, we can change it to be default one.

Using `ParallelConfig::detect()` can be used for utilising all available resources. Tested by running analysis natively on the host and inside Docker (with core limit set on the OrbStack level). In both scenarios it detects available cores correctly.

…o the error manager

Detected by Composer Require Checker

Cache check is done during file iteration - if file is in cache and wasn't changed, then it's skip by iterator and doesn't even go to the worker. That's why `ReadonlyCacheManager` used on worker's side is a bit superfluous, but I thought it's safer to use it instead of `NullCacheManager`. Anyway, each file is stored to cache when worker reports analysis result to the parallelisation operator, so it's basically similar to sequential analysis, because file write is done only in one process.

Co-authored-by: Julien Falque <julien.falque@gmail.com>

… PHPStan narrowed types

Use absolute file path for reporting analysis result and convert it to relative path when processing on parallelisation operator's side. It gives the same effect in terms of caching, but does not require `DirectoryInterface::getAbsolutePath()`.

It improves caching, because cache save can be triggered more frequently and errors or interrupting during processing large chunk won't cause analysis info being lost (which would lead to re-analysing file that was already processed).

- extract constants to dedicated class with more fitting name - use constants in every place where communication between runner and worker occurs

There were 2 problems: - when child processes were spawned, `ArgvInput` was created to pass the arguments/options to the worker, but when fix/check command was executed via command tester, `ArgvInput` was using PHPUnit's argv from top-level command (and was failing with "The "--configuration" option does not exist.") - Passing `InputInterface` instance from command to process (did an experiment with static property set in command to skip passing input through the Runner instance) was not enough because output was invalid and tests were failing. I hope we can figure out something better after discussion on code review 😉.

Some of the tests are little hacky at this point, maybe it's possible to refactor a bit to achieve better testability. Good enough for now, though.

coveralls · 2024-03-05T09:48:52Z

coverage: 95.03% (-1.0%) from 96.029%
when pulling 57bcf57 on connorhu:feature/real-line-numbers
into b68c8ba on PHP-CS-Fixer:master.

devnix · 2024-03-08T11:10:36Z

Would this solve the use-case for checkstyle reports? #3601

connorhu · 2024-03-08T11:17:03Z

@devnix Yes, that is my goal.

Wirone · 2024-03-08T11:48:33Z

I don't have currently capacity for thorough review here, but please consider #7777 in your diffing/blaming process, as parallel processing is ready and should be merged before any other crucial changes related to the fixing process are made. With that approach you would need to pass blame info back from worker to the main process.

Also I don't like the experimental namespace, it should rather be in some "normal" namespace, but marked with @internal / @experimental.

connorhu · 2024-03-08T12:18:11Z

@Wirone There's still time for a review after I've solved the known problems, so it's not current yet.

keradus · 2024-03-16T22:45:41Z

src/FixerBlame/CodeChange.php

@@ -0,0 +1,51 @@
+<?php


please avoid having all commits of Wirone in your PR, as they are other PR and making diff for your PR crazy

I believe it's an outcome of my comment, but I only asked for taking the parallel runner into consideration (make sure it will work), not to rebase on my work 😅. It could work if we had branches in the same repo, and a proper target branch was selected, but not like this. Hard to tell what is the best approach here, maybe a draft with a branch based on master (like before), but waiting until my PR is merged?

I wanted to align it with PR #7777 so I rebaseled it to see how much conflict there was (fortunately I only had to resolve 2-3 things). @Wirone's work is the more polished, more important and is expected to be in the master sooner. As mentioned in the PR description, once the parallel feature is merged this can be rebased back to the master and then the commits for the parallel run will be removed. There is no point in starting the review until there are implementation (and performance) issues with the PR anyway.

keradus · 2024-03-16T22:47:09Z

src/FixerBlame/CodeChange.php

+ * with this source code in the file LICENSE.
+ */
+
+namespace PhpCsFixer\FixerBlame;


please describe approach you want to take in RFC style before implementing it. it will help reviewers to recognise what and how you want to do, and help you to align on direction before spending time on implementing it.

right now, I'm happy with udiff and don't seeing blame-named class is very confusing.

One of the problems with the implementation at the moment is that, because of the way udiff works, I have to do a lot of manual work to track exactly which lines, which fixer, made what changes (which lines were removed and which lines were inserted). One sign of this is that I keep converting the token array back to text and running diff on it (and then trying to identify the separate patches in the diff output). The next logical step would be to see how much work it would be to modify udiff to be able to do its job on any data structure (in this case a Token[]). If this is a solvable problem, then 1) I would use less resources 2) I wouldn't have to deal with the problem of diffing the token array.

The theory works anyway, because it can accurately track multi-step transformations, but it's also obvious that in its current form it doesn't just do what it's actually supposed to do.

That's why I can't describe in any style what happens, because it doesn't happen the way I want to see it happen and 1000% I'll have to rewrite the blame class if udiff knows the comparison.

I failed to understand the comment.

Please describe the problem you want to solve and describe the concept of solution, before you go deep into blame class.
Without that I fail to understand why you even want to tech udiff to understand Token[]

What is the problem?
The problem is that for various outputs, the violations indicate incorrect line and character positions.

What is a possible solution to this problem?
a) remove the outputs or not report the line and character numbers
b) the fixers let the Tokens class know what they have done
c) keep track from fixer to fixer which fixer changes what

What is the problem with which approach?

a) while I'm all for something either working well or not existing at all, this is obviously not the answer
b) touching all fixers, modifying existing interfaces is a lot of work and an unwanted BC break
c) this might work if we can somehow track what happened between two fixer states, because then we only have to track the change within the Runner and don't have to go deeper, but it is inaccurate

What tools do we currently have to monitor changes?
Currently Sebastian's library Diff is available, which he basically created for texts or arrays of texts, and he told me that he didn't want to change that, which is acceptable.
With the implementation I have made, I currently only have the ability to track line level changes and in addition I always have to convert Tokens back to text to diff two states together.
This back-conversion currently means unwanted memory usage and unnecessary iterations, so it's not ideal yet. This is where I am at the moment:

Sebastian's tool is given and cannot be changed

It is not practical to import a new diff library just to use it in this one case

Implementing my own diff just to keep track of token changes is a lot of work and logic that is only present because of one function.

The idea works as long as you don't put two violations in a line:

if ($variable !== false && $variable2 !== false)

That's two yoda style violations, but due to the limitations of diff I see this as one. So far I am not happy with the solution for this. As long as I am not satisfied with what my solution can do, there is no point in talking about how to integrate it nicely.

Anyway, I'm now wondering if just changing the Tokens class is enough to keep track of the changes.

imagine the case:

<?php class A { public method zzz() {} ## line 2 public method aaa() { var_dump("a" == "b"); ## line 5 } }

1st fixer moves method zzz from top of file to bottom of file

2nd fixer change content of method (eg == => ===)

which lines should be reported as modified?
imho 1st fixer: line 2 and 2nd fixer: line 5.
But when 2nd fixer is applied, line 5 is actually line 3.

i do not want to touch Tokens

i do not want to have own logic for tracking modified lines (Eg PatchInfo)

I suggest a new command/BC break that is not applying one fixer on top of another (like we need for fix), but one after another , separately, always against original file variant - that way, you can determine modified line (with current udiff approach) for original code.
That would work only for check and not for fix (for when we actually want to have those reports) and will make check blind for changes that are based on top of changes done by other rule (so check may detect less things that fix would do, because fix will apply changes on modified file variant)

I achieved quite nice results on complex code with the solution I made ( https://github.com/connorhu/PHP-CS-Fixer/blob/57bcf57cde0cc5844070eb62ed38a5320085eb35/tests/Fixtures/Integration/misc/blame_test.test#L46-L56 ). The function will only work correctly if it finds the original position where the violation occurs, and now the code can do it (mostly). I'm between two extremes at the moment: in the Runner class I lose some details, and in the Tokens class I get too much detail about the changes. Anyway, the potential of the Tokens class is very promising, but the setCode method makes the changes untraceable (fortunately only 1 fixer uses it as far as I can see).

In the beginning I did as you wrote. I made a "complex" code, set several fixers and saved each step in a separate file, then diffed them. I wrote down what I wanted to see (what the logical end result would be) and iterated until I got what the logical end result is.

I suggest a new command...

I'll sleep on it and think about it.

pssst, this one already exists - check (just it's checking one after another and not independently vs original code - the line 5 vs line 3 topic I mentioned)

pssst, this one already exists - check (just it's checking one after another and not independently vs original code - the line 5 vs line 3 topic I mentioned)

I know, so I don't understand why the code you wrote is an issue. Fixing only fixes, checking only checks.
The "think about it" was to see what options are available if you don't want to modify the Tokens class and don't want to work on the diff result. For now it feels like a step backwards, but I have to think about it.

Your example code results in two violations:

method order at line 2 char 0

triple equal sign comparison operator at line 5 at char 0

In both cases, char 0 is still an incorrect result, but this is a side effect of the diff.

For me it's already an issue:

var_dump("a" == "b", "c" == "b");

and

var_dump("a" == "b"); var_dump("c" == "b");

My solution takes these as one violation. Which is not ok.

I can see that you don't want to deal with such things, which you have the possibility to do anyway, by removing the output formats (or outsourcing them from the core code to a plugin system).

Wirone and others added 30 commits March 1, 2024 02:34

Configuration flow for parallel run

9410b9f

Extract factory method for linting aware file iterator

9319a3e

The same collection of files has to be determined for both sequential and parallel run.

Initial implementation of parallel analysis 🎉

351f48c

It's working, but needs polishing here and there. It speeds up analysis significantly (like 5x for this codebase on my computer).

Use sequential runner by default

a2bbbc2

Parallel runner must be introduced slowly, and start as an opt-in. When it's stable enough, we can change it to be default one.

Display information about parallel runner being experimental feature

3c91819

Auto detection for parallel config

af14b86

Using `ParallelConfig::detect()` can be used for utilising all available resources. Tested by running analysis natively on the host and inside Docker (with core limit set on the OrbStack level). In both scenarios it detects available cores correctly.

Use parallel runner with config auto-detection for the project itself

2769ad3

Pass analysis errors from worker to the main runner and report them t…

622e675

…o the error manager

Use exact same PHP binary for worker process

cc98df0

Remove quasi-parallel Composer script

d20f3aa

Document usage of parallel runner

aa27b5d

Allow only static factory for creating Process

be5a196

Determine files count early from simple iterator

0e60e18

Revert BC break in Runner signature

847a7b1

Improve type in Error constructor

d873e38

Fix test for fallback ParallelConfig

396b5de

Fix CS

24c15b9

Explicitly require react/stream

7e7e596

Detected by Composer Require Checker

Introduce DirectoryInterface::getAbsolutePath()

c8b83a9

Save cache info in parallel analysis only when it makes sense

dc2b8c1

Apply Julien's suggestions from code review

7c8ecbe

Co-authored-by: Julien Falque <julien.falque@gmail.com>

Sync --using-cache description

c889d18

Use InvalidArgumentException for parallelisation misconfiguration +…

a4fb45a

… PHPStan narrowed types

Fix BC-break in DirectoryInterface

63e4dfc

Use absolute file path for reporting analysis result and convert it to relative path when processing on parallelisation operator's side. It gives the same effect in terms of caching, but does not require `DirectoryInterface::getAbsolutePath()`.

Clean up parallel actions handling

1f80310

- extract constants to dedicated class with more fitting name - use constants in every place where communication between runner and worker occurs

Add most of the tests

0cf69be

Some of the tests are little hacky at this point, maybe it's possible to refactor a bit to achieve better testability. Good enough for now, though.

connorhu force-pushed the feature/real-line-numbers branch 4 times, most recently from 0ae17bc to 6203087 Compare March 5, 2024 09:12

connorhu changed the title ~~Added FixerBlame tool and one test~~ feat: Add FixerBlame class and one test Mar 5, 2024

connorhu force-pushed the feature/real-line-numbers branch 5 times, most recently from 991cd90 to b66693b Compare March 5, 2024 09:42

connorhu force-pushed the feature/real-line-numbers branch 10 times, most recently from 01be641 to b5d1bc0 Compare March 9, 2024 06:04

feat: add FixerBlame tool and one test

57bcf57

connorhu force-pushed the feature/real-line-numbers branch from b5d1bc0 to 57bcf57 Compare March 9, 2024 06:07

keradus reviewed Mar 16, 2024

View reviewed changes

keradus force-pushed the master branch from a968b6e to 28209a2 Compare April 17, 2024 08:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add FixerBlame class and one test #7865

feat: Add FixerBlame class and one test #7865

connorhu commented Mar 5, 2024 •

edited

coveralls commented Mar 5, 2024 •

edited

devnix commented Mar 8, 2024

connorhu commented Mar 8, 2024

Wirone commented Mar 8, 2024

connorhu commented Mar 8, 2024

keradus Mar 16, 2024 •

edited

Wirone Mar 16, 2024

connorhu Mar 17, 2024 •

edited

keradus Mar 16, 2024

connorhu Mar 17, 2024

keradus Mar 17, 2024 •

edited

connorhu Mar 17, 2024 •

edited

connorhu Mar 17, 2024

keradus Mar 18, 2024

connorhu Mar 18, 2024 •

edited

keradus Mar 18, 2024

connorhu Mar 19, 2024

feat: Add FixerBlame class and one test #7865

Are you sure you want to change the base?

feat: Add FixerBlame class and one test #7865

Conversation

connorhu commented Mar 5, 2024 • edited

coveralls commented Mar 5, 2024 • edited

devnix commented Mar 8, 2024

connorhu commented Mar 8, 2024

Wirone commented Mar 8, 2024

connorhu commented Mar 8, 2024

keradus Mar 16, 2024 • edited

Choose a reason for hiding this comment

Wirone Mar 16, 2024

Choose a reason for hiding this comment

connorhu Mar 17, 2024 • edited

Choose a reason for hiding this comment

keradus Mar 16, 2024

Choose a reason for hiding this comment

connorhu Mar 17, 2024

Choose a reason for hiding this comment

keradus Mar 17, 2024 • edited

Choose a reason for hiding this comment

connorhu Mar 17, 2024 • edited

Choose a reason for hiding this comment

connorhu Mar 17, 2024

Choose a reason for hiding this comment

keradus Mar 18, 2024

Choose a reason for hiding this comment

connorhu Mar 18, 2024 • edited

Choose a reason for hiding this comment

keradus Mar 18, 2024

Choose a reason for hiding this comment

connorhu Mar 19, 2024

Choose a reason for hiding this comment

connorhu commented Mar 5, 2024 •

edited

coveralls commented Mar 5, 2024 •

edited

keradus Mar 16, 2024 •

edited

connorhu Mar 17, 2024 •

edited

keradus Mar 17, 2024 •

edited

connorhu Mar 17, 2024 •

edited

connorhu Mar 18, 2024 •

edited