Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate is it possible to check duplicates of generated mutators (physical and logical) #1838

Open
maks-rafalko opened this issue Apr 5, 2023 · 5 comments
Labels
DX Developer Experience Performance

Comments

@maks-rafalko
Copy link
Member

maks-rafalko commented Apr 5, 2023

Just out of curiosity as I don't know the full functionality behind.

Isn't there a hashmap containing a list of, let's say, md5 hashes of the content of each mutated file? So that there can be no duplicate mutation?

Or is it because that some mutations look different but produce the same logical outcome?

Originally posted by @icanhazstring in #1753 (comment)

Also, see #1753 (comment)

@maks-rafalko maks-rafalko added DX Developer Experience Performance labels Apr 5, 2023
@icanhazstring
Copy link
Contributor

icanhazstring commented Apr 5, 2023

One idea that I have right now, since the mutation is based on php-parser, is.
We could introduce some kind of LogicNormalizer which applies de morgan's law onto a logical node from php-parser
and produces the normalized version of it.

This normalized version can be used as the caching key.
So every logical expression will be checked against the same normalized version.

-- Edit:
Maybe I can write a small script as a PoC that it might work based on php-parser nodes :)

@icanhazstring
Copy link
Contributor

I think it should be placed here:

public function generate(Node $node): iterable
{
$this->currentNode = $node;
$this->testsMemoized = null;
$this->isOnFunctionSignatureMemoized = null;
$this->isInsideFunctionMemoized = null;
if (!$this->isOnFunctionSignature()
&& !$this->isInsideFunction()
) {
return;
}
if ($this->isForGitDiffLines && !$this->filesDiffChangedLines->contains($this->filePath, $node->getStartLine(), $node->getEndLine(), $this->gitDiffBase ?? GitDiffFileProvider::DEFAULT_BASE)) {
return;
}
foreach ($this->mutators as $mutator) {
yield from $this->generateForMutator($node, $mutator);
}
}

Before the $node is passed to the mutators, we should "replace" the node with normalized logical representation.
This way the hash functionality will stay the same.

@icanhazstring
Copy link
Contributor

icanhazstring commented Apr 5, 2023

Or does it make more sense to have this only for logical mutators?

Like here:

public function mutate(Node $node): iterable
{
yield $this->negateEverySubExpression($node);
}

So that each logical mutator has to call some LogicNormalizer on the passed in $node to normalize the structure.
As this kind of normalization is only needed for these logical mutators.

@maks-rafalko
Copy link
Member Author

maks-rafalko commented Apr 6, 2023

Spent a couple of hours investigating it, here are my thoughts (they also answer to your questions):

  1. Normalizing boolean expressions and finding duplicates is not an easy task. What I found and read about

    • It is suggested to convert any boolean expression to DNF or CNF (see Computational Complexity) and compare such normal forms, finding duplicates
    • I didn't find a good PHP lib that can take e.g. X & (Y || B) ... string and produce CNF/DNF
    • Other algorithms are yet to be investigated if they exist.
  2. But anyway, I have a strong feeling that we should not integrate such algorithm in the runtime. What we should do is to prevent Infection to generate such mutations in the first place. Just like we did here: Trying to remove false positive on logical or. #1801. So, instead of generating N mutations for logical operators (LogicalOr, LogicalAnd, etc.) and then finding duplicates in runtime, we should not generated them at all, teaching Mutators to do as few mutations as possible.

    And this task can be done by creating boolean matrices (truth tables) for each of provided Mutators here: Add negation mutators #1753 (even manually on a paper piece).

  3. I'm sure my small investigation is very high-level and I can be wrong, so any thoughts are welcome.


I also did a small PoC to throw an exception in the code if duplicated Mutant is created (any one, not just logical) by comparing hashes of generated mutated code to find duplicates, I will prepare a PR soon. Though It doesn't solve the root issue, just finds duplicates in generated changed code

@icanhazstring
Copy link
Contributor

icanhazstring commented Apr 6, 2023

Good finding. Generally, I would go for approach 2 first.
Finding it on paper is the simplest solution for now.

Writing a "truth table" test for those might be another addition to that.
We know the input, we know what is expected. We should not only test on a single logical Mutator but the full combination of them. So we make sure the outcome is as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DX Developer Experience Performance
Projects
None yet
Development

No branches or pull requests

2 participants