Add new metric: Test usefulness #1812

Slamdunk · 2023-02-01T12:29:19Z

Given:

class Foo
{
    public function twice(int $from): int
    {
        return 2 * $from;
    }
}

/**
 * @covers \Foo
 */
final class FooTest extends TestCase
{
    public function testTwice(): void
    {
        self::assertSame(6, (new Foo())->twice(3));
    }

    public function testTwiceAgain(): void
    {
        self::assertSame(6, (new Foo())->twice(3));
    }
}

Currently Infection reports everything is OK for any of the @default mutators.
Fact is though that FooTest::testTwiceAgain is utterly useless.
What I'd like to have reported is:

 3 mutations were generated:
        3 mutants were killed
        [...]
 
 Metrics:
          Mutation Score Indicator (MSI): 100%
          Mutation Code Coverage: 100%
          Covered Code MSI: 100%
+         Test usefulness: 50%
 
+List of tests that don't fail any configured mutators:
+         - FooTest::testTwiceAgain

https://twitter.com/slamzoe/status/1620719073633107970

The text was updated successfully, but these errors were encountered:

sanmai · 2023-02-05T12:44:44Z

As Infection can infer which test covers which line, IIRC you should be able to figure out which test produces no coverage looking at the coverage report alone. Although this could be a more difficult problem.

Unless this problem is harder than it looks, I feel like it could be better to build a dedicated tool.

theofidry · 2023-02-05T19:28:48Z

@sanmai I think it's slightly different though. You can have some code covered by tests, but mutating that code will not break that test (could break another, so would be fine from infection perspective) which indicates this test is likely flawed.

And in order to find out this, you indeed would have no choice but too run infection.

sanmai · 2023-02-06T01:22:46Z

Then we need a better example. Because in case of testNothing() we have no use for Infection.

Slamdunk · 2023-02-06T07:33:55Z

Indeed @sanmai I'm sorry I missed reporting my real use case scenario.

We have a bunch of classes that solve a specific domain goal for which we just need to apply the rules we receive from the client: our gov.
The rules are complex and outside our control: we write them down in the tests and get the tests pass, nothing more.
Some times they don't even bother to check the previous ones: they just pile them one onto another.
I found few useless rules because in fact removing the tests still gets us to 100% MSI score.

I've updated the sample in #1812 (comment): even though FooTest::testTwiceAgain generates alone 100% of CodeCoverage, Infection never runs it because any mutation gets killed by FooTest::testTwice and so it stops running other tests.

sanmai · 2023-02-06T08:58:27Z

Thank you @Slamdunk, it all makes much more sense now.

There's a slight issue here: if Infection sorts the tests from shortest to longest, then longer tests could be seen as inherently less useful, which may not be the case. We might need to decide if we want to maintain the order of tests if a users requests the test usefulness report.

Slamdunk · 2023-02-06T09:20:51Z

Tests order is indeed a factor to count in.
Take another example:

testA covers mutations 1
testB covers mutations 1 and 2

With [testA,testB] order, both are seen as useful, testA for 1 and testB for 2.
With [testB,testA] order, testA become useless.

wickedOne · 2023-02-18T09:20:56Z

We have a bunch of classes that solve a specific domain goal for which we just need to apply the rules we receive from the client: our gov. The rules are complex and outside our control: we write them down in the tests and get the tests pass, nothing more. Some times they don't even bother to check the previous ones: they just pile them one onto another. I found few useless rules because in fact removing the tests still gets us to 100% MSI score.

even though FooTest::testTwiceAgain generates alone 100% of CodeCoverage, Infection never runs it because any mutation gets killed by FooTest::testTwice and so it stops running other tests.

wouldn't these be integration tests rather than unit tests and you could decide not to run them with coverage?

while i really like the idea of this one, it might be a slippery slope given false positives?
so perhaps a different state for this might be unreliable, it surely would be interesting to list

Slamdunk changed the title ~~Add new metric: Tests usefulness~~ Add new metric: Test usefulness Feb 1, 2023

sanmai added the Feature Request label Feb 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new metric: Test usefulness #1812

Add new metric: Test usefulness #1812

Slamdunk commented Feb 1, 2023 •

edited

sanmai commented Feb 5, 2023 •

edited

theofidry commented Feb 5, 2023

sanmai commented Feb 6, 2023

Slamdunk commented Feb 6, 2023

sanmai commented Feb 6, 2023

Slamdunk commented Feb 6, 2023

wickedOne commented Feb 18, 2023

Add new metric: Test usefulness #1812

Add new metric: Test usefulness #1812

Comments

Slamdunk commented Feb 1, 2023 • edited

sanmai commented Feb 5, 2023 • edited

theofidry commented Feb 5, 2023

sanmai commented Feb 6, 2023

Slamdunk commented Feb 6, 2023

sanmai commented Feb 6, 2023

Slamdunk commented Feb 6, 2023

wickedOne commented Feb 18, 2023

Slamdunk commented Feb 1, 2023 •

edited

sanmai commented Feb 5, 2023 •

edited