Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new metric: Test usefulness #1812

Open
Slamdunk opened this issue Feb 1, 2023 · 7 comments
Open

Add new metric: Test usefulness #1812

Slamdunk opened this issue Feb 1, 2023 · 7 comments

Comments

@Slamdunk
Copy link
Contributor

Slamdunk commented Feb 1, 2023

Given:

class Foo
{
    public function twice(int $from): int
    {
        return 2 * $from;
    }
}

/**
 * @covers \Foo
 */
final class FooTest extends TestCase
{
    public function testTwice(): void
    {
        self::assertSame(6, (new Foo())->twice(3));
    }

    public function testTwiceAgain(): void
    {
        self::assertSame(6, (new Foo())->twice(3));
    }
}

Currently Infection reports everything is OK for any of the @default mutators.
Fact is though that FooTest::testTwiceAgain is utterly useless.
What I'd like to have reported is:

 3 mutations were generated:
        3 mutants were killed
        [...]
 
 Metrics:
          Mutation Score Indicator (MSI): 100%
          Mutation Code Coverage: 100%
          Covered Code MSI: 100%
+         Test usefulness: 50%
 
+List of tests that don't fail any configured mutators:
+         - FooTest::testTwiceAgain

https://twitter.com/slamzoe/status/1620719073633107970

@Slamdunk Slamdunk changed the title Add new metric: Tests usefulness Add new metric: Test usefulness Feb 1, 2023
@sanmai
Copy link
Member

sanmai commented Feb 5, 2023

As Infection can infer which test covers which line, IIRC you should be able to figure out which test produces no coverage looking at the coverage report alone. Although this could be a more difficult problem.

Unless this problem is harder than it looks, I feel like it could be better to build a dedicated tool.

@theofidry
Copy link
Member

@sanmai I think it's slightly different though. You can have some code covered by tests, but mutating that code will not break that test (could break another, so would be fine from infection perspective) which indicates this test is likely flawed.

And in order to find out this, you indeed would have no choice but too run infection.

@sanmai
Copy link
Member

sanmai commented Feb 6, 2023

Then we need a better example. Because in case of testNothing() we have no use for Infection.

@Slamdunk
Copy link
Contributor Author

Slamdunk commented Feb 6, 2023

Indeed @sanmai I'm sorry I missed reporting my real use case scenario.

We have a bunch of classes that solve a specific domain goal for which we just need to apply the rules we receive from the client: our gov.
The rules are complex and outside our control: we write them down in the tests and get the tests pass, nothing more.
Some times they don't even bother to check the previous ones: they just pile them one onto another.
I found few useless rules because in fact removing the tests still gets us to 100% MSI score.

I've updated the sample in #1812 (comment): even though FooTest::testTwiceAgain generates alone 100% of CodeCoverage, Infection never runs it because any mutation gets killed by FooTest::testTwice and so it stops running other tests.

@sanmai
Copy link
Member

sanmai commented Feb 6, 2023

Thank you @Slamdunk, it all makes much more sense now.

There's a slight issue here: if Infection sorts the tests from shortest to longest, then longer tests could be seen as inherently less useful, which may not be the case. We might need to decide if we want to maintain the order of tests if a users requests the test usefulness report.

@Slamdunk
Copy link
Contributor Author

Slamdunk commented Feb 6, 2023

Tests order is indeed a factor to count in.
Take another example:

  • testA covers mutations 1
  • testB covers mutations 1 and 2

With [testA,testB] order, both are seen as useful, testA for 1 and testB for 2.
With [testB,testA] order, testA become useless.

@wickedOne
Copy link

We have a bunch of classes that solve a specific domain goal for which we just need to apply the rules we receive from the client: our gov. The rules are complex and outside our control: we write them down in the tests and get the tests pass, nothing more. Some times they don't even bother to check the previous ones: they just pile them one onto another. I found few useless rules because in fact removing the tests still gets us to 100% MSI score.

even though FooTest::testTwiceAgain generates alone 100% of CodeCoverage, Infection never runs it because any mutation gets killed by FooTest::testTwice and so it stops running other tests.

wouldn't these be integration tests rather than unit tests and you could decide not to run them with coverage?

while i really like the idea of this one, it might be a slippery slope given false positives?
so perhaps a different state for this might be unreliable, it surely would be interesting to list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants