New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add FixerBlame class and one test #7865
base: master
Are you sure you want to change the base?
Conversation
Empty for now, because logic for running fixers must be moved here. Basically it's a PoC of command being hidden - works on PHP 7.4 and 8.3 with latest Composer packages on each version. Command's arguments/options are more or less what we need in the worker, but it may be changed later.
The same collection of files has to be determined for both sequential and parallel run.
It's working, but needs polishing here and there. It speeds up analysis significantly (like 5x for this codebase on my computer).
Parallel runner must be introduced slowly, and start as an opt-in. When it's stable enough, we can change it to be default one.
Using `ParallelConfig::detect()` can be used for utilising all available resources. Tested by running analysis natively on the host and inside Docker (with core limit set on the OrbStack level). In both scenarios it detects available cores correctly.
…o the error manager
Detected by Composer Require Checker
Cache check is done during file iteration - if file is in cache and wasn't changed, then it's skip by iterator and doesn't even go to the worker. That's why `ReadonlyCacheManager` used on worker's side is a bit superfluous, but I thought it's safer to use it instead of `NullCacheManager`. Anyway, each file is stored to cache when worker reports analysis result to the parallelisation operator, so it's basically similar to sequential analysis, because file write is done only in one process.
Co-authored-by: Julien Falque <julien.falque@gmail.com>
… PHPStan narrowed types
Use absolute file path for reporting analysis result and convert it to relative path when processing on parallelisation operator's side. It gives the same effect in terms of caching, but does not require `DirectoryInterface::getAbsolutePath()`.
It improves caching, because cache save can be triggered more frequently and errors or interrupting during processing large chunk won't cause analysis info being lost (which would lead to re-analysing file that was already processed).
- extract constants to dedicated class with more fitting name - use constants in every place where communication between runner and worker occurs
There were 2 problems: - when child processes were spawned, `ArgvInput` was created to pass the arguments/options to the worker, but when fix/check command was executed via command tester, `ArgvInput` was using PHPUnit's argv from top-level command (and was failing with "The "--configuration" option does not exist.") - Passing `InputInterface` instance from command to process (did an experiment with static property set in command to skip passing input through the Runner instance) was not enough because output was invalid and tests were failing. I hope we can figure out something better after discussion on code review 😉.
Some of the tests are little hacky at this point, maybe it's possible to refactor a bit to achieve better testability. Good enough for now, though.
0ae17bc
to
6203087
Compare
991cd90
to
b66693b
Compare
Would this solve the use-case for checkstyle reports? #3601 |
@devnix Yes, that is my goal. |
I don't have currently capacity for thorough review here, but please consider #7777 in your diffing/blaming process, as parallel processing is ready and should be merged before any other crucial changes related to the fixing process are made. With that approach you would need to pass blame info back from worker to the main process. Also I don't like the experimental namespace, it should rather be in some "normal" namespace, but marked with |
@Wirone There's still time for a review after I've solved the known problems, so it's not current yet. |
01be641
to
b5d1bc0
Compare
b5d1bc0
to
57bcf57
Compare
@@ -0,0 +1,51 @@ | |||
<?php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid having all commits of Wirone in your PR, as they are other PR and making diff for your PR crazy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's an outcome of my comment, but I only asked for taking the parallel runner into consideration (make sure it will work), not to rebase on my work 😅. It could work if we had branches in the same repo, and a proper target branch was selected, but not like this. Hard to tell what is the best approach here, maybe a draft with a branch based on master (like before), but waiting until my PR is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to align it with PR #7777 so I rebaseled it to see how much conflict there was (fortunately I only had to resolve 2-3 things). @Wirone's work is the more polished, more important and is expected to be in the master sooner. As mentioned in the PR description, once the parallel feature is merged this can be rebased back to the master and then the commits for the parallel run will be removed. There is no point in starting the review until there are implementation (and performance) issues with the PR anyway.
* with this source code in the file LICENSE. | ||
*/ | ||
|
||
namespace PhpCsFixer\FixerBlame; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please describe approach you want to take in RFC style before implementing it. it will help reviewers to recognise what and how you want to do, and help you to align on direction before spending time on implementing it.
right now, I'm happy with udiff
and don't seeing blame
-named class is very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the problems with the implementation at the moment is that, because of the way udiff
works, I have to do a lot of manual work to track exactly which lines, which fixer, made what changes (which lines were removed and which lines were inserted). One sign of this is that I keep converting the token array back to text and running diff on it (and then trying to identify the separate patches in the diff output). The next logical step would be to see how much work it would be to modify udiff
to be able to do its job on any data structure (in this case a Token[]
). If this is a solvable problem, then 1) I would use less resources 2) I wouldn't have to deal with the problem of diffing the token array.
The theory works anyway, because it can accurately track multi-step transformations, but it's also obvious that in its current form it doesn't just do what it's actually supposed to do.
That's why I can't describe in any style what happens, because it doesn't happen the way I want to see it happen and 1000% I'll have to rewrite the blame
class if udiff
knows the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I failed to understand the comment.
Please describe the problem you want to solve and describe the concept of solution, before you go deep into blame
class.
Without that I fail to understand why you even want to tech udiff
to understand Token[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the problem?
The problem is that for various outputs, the violations indicate incorrect line and character positions.
What is a possible solution to this problem?
a) remove the outputs or not report the line and character numbers
b) the fixers let the Tokens class know what they have done
c) keep track from fixer to fixer which fixer changes what
What is the problem with which approach?
a) while I'm all for something either working well or not existing at all, this is obviously not the answer
b) touching all fixers, modifying existing interfaces is a lot of work and an unwanted BC break
c) this might work if we can somehow track what happened between two fixer states, because then we only have to track the change within the Runner and don't have to go deeper, but it is inaccurate
What tools do we currently have to monitor changes?
Currently Sebastian's library Diff is available, which he basically created for texts or arrays of texts, and he told me that he didn't want to change that, which is acceptable.
With the implementation I have made, I currently only have the ability to track line level changes and in addition I always have to convert Tokens back to text to diff two states together.
This back-conversion currently means unwanted memory usage and unnecessary iterations, so it's not ideal yet. This is where I am at the moment:
- Sebastian's tool is given and cannot be changed
- It is not practical to import a new diff library just to use it in this one case
- Implementing my own diff just to keep track of token changes is a lot of work and logic that is only present because of one function.
The idea works as long as you don't put two violations in a line:
if ($variable !== false && $variable2 !== false)
That's two yoda style violations, but due to the limitations of diff I see this as one. So far I am not happy with the solution for this. As long as I am not satisfied with what my solution can do, there is no point in talking about how to integrate it nicely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I'm now wondering if just changing the Tokens
class is enough to keep track of the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imagine the case:
<?php
class A {
public method zzz() {} ## line 2
public method aaa() {
var_dump("a" == "b"); ## line 5
}
}
- 1st fixer moves method
zzz
from top of file to bottom of file - 2nd fixer change content of method (eg
==
=>===
)
which lines should be reported as modified?
imho 1st fixer: line 2
and 2nd fixer: line 5
.
But when 2nd fixer is applied, line 5 is actually line 3.
- i do not want to touch Tokens
- i do not want to have own logic for tracking modified lines (Eg
PatchInfo
)
I suggest a new command/BC break that is not applying one fixer on top of another (like we need for fix
), but one after another , separately, always against original file variant - that way, you can determine modified line (with current udiff
approach) for original code.
That would work only for check
and not for fix
(for when we actually want to have those reports) and will make check
blind for changes that are based on top of changes done by other rule (so check
may detect less things that fix
would do, because fix
will apply changes on modified file variant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I achieved quite nice results on complex code with the solution I made ( https://github.com/connorhu/PHP-CS-Fixer/blob/57bcf57cde0cc5844070eb62ed38a5320085eb35/tests/Fixtures/Integration/misc/blame_test.test#L46-L56 ). The function will only work correctly if it finds the original position where the violation occurs, and now the code can do it (mostly). I'm between two extremes at the moment: in the Runner
class I lose some details, and in the Tokens
class I get too much detail about the changes. Anyway, the potential of the Tokens class is very promising, but the setCode method makes the changes untraceable (fortunately only 1 fixer uses it as far as I can see).
In the beginning I did as you wrote. I made a "complex" code, set several fixers and saved each step in a separate file, then diffed them. I wrote down what I wanted to see (what the logical end result would be) and iterated until I got what the logical end result is.
I suggest a new command...
I'll sleep on it and think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pssst, this one already exists -
check
(just it's checking one after another and not independently vs original code - the line 5 vs line 3 topic I mentioned)
I know, so I don't understand why the code you wrote is an issue. Fixing only fixes, checking only checks.
The "think about it" was to see what options are available if you don't want to modify the Tokens
class and don't want to work on the diff result. For now it feels like a step backwards, but I have to think about it.
Your example code results in two violations:
- method order at line 2 char 0
- triple equal sign comparison operator at line 5 at char 0
In both cases, char 0 is still an incorrect result, but this is a side effect of the diff.
For me it's already an issue:
var_dump("a" == "b", "c" == "b");
and
var_dump("a" == "b");
var_dump("c" == "b");
My solution takes these as one violation. Which is not ok.
I can see that you don't want to deal with such things, which you have the possibility to do anyway, by removing the output formats (or outsourcing them from the core code to a plugin system).
First step to fixing #7171 and similar issues
What are the known problems:
It will remain a draft until the known problems are solved.
Note: Since it works on the same files, the PR is rebased to the
Wirone:codito/bombazo
branch. Once #7777 is merged, it can be rebased back to the master.