
Results vary depending on number of Infection threads #1671

Open
peldax opened this issue Feb 16, 2022 · 22 comments

@peldax

peldax commented Feb 16, 2022

Infection version: 0.25.6
Test Framework version: PHPUnit 9.5.13
PHP version: 8.1.2
Platform: Linux
GitHub Repo: private - we could set up a call to demonstrate

After deploying Infection to our project, it reported a covered code MSI of 100%. That was suspicious to me, because in our other projects and libraries where I use Infection it usually lands around 80% and is hard to push over 90% (where it typically stays, with only some false-positive mutants remaining). I tried various things to get believable results and ended up disabling multi-threading.

  • Initial runs were executed using --threads=4 or --threads=16.
  • When run with --threads=1, it gives believable results and correctly reports uncovered mutants.

I also noticed that the parallel run resolves all mutants incredibly quickly (far faster than parallelism alone would explain): we have ~5500 mutants and the run finishes in under two minutes with 4 threads, while a single-threaded run did not get halfway through in 20 minutes.

One unusual thing we do in our tests is use PHPUnit's @depends feature to chain tests together, but even when I disabled those test cases the results were the same.

phpunit.xml
<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
        convertErrorsToExceptions="true"
        convertNoticesToExceptions="true"
        convertWarningsToExceptions="true"
        colors="true"
        processIsolation="false"
        stopOnFailure="true"
        executionOrder="defects"
        backupGlobals="false"
        cacheResult="true"
        cacheResultFile="./temp/phpunit.cache">
   <php>
       <env name="PHPUNIT_MODE" value="yes"/>
       <env name="DEBUG_MODE" value="yes" />
   </php>
   <coverage includeUncoveredFiles="true">
       <include>
           <directory suffix=".php">./app</directory>
       </include>
       <exclude>
           <directory suffix=".php">./app/Utils</directory>
       </exclude>
       <report>
           <html outputDirectory="./build/coverage" lowUpperBound="60" highLowerBound="100"/>
       </report>
   </coverage>
   <testsuites>
       <testsuite name="base">
           <directory suffix="Test.php">./tests/Base</directory>
       </testsuite>
       <testsuite name="complex">
           <directory suffix="Test.php">./tests/Complex</directory>
       </testsuite>
   </testsuites>
   <logging>
       <junit outputFile="./build/phpunit.junit.xml"/>
   </logging>
</phpunit>
@sanmai
Member

sanmai commented Feb 16, 2022

What if you remove executionOrder="defects"? And/or add resolveDependencies="true"?
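
Against the phpunit.xml posted above, that suggestion would look roughly like this (only the root element is shown; the rest of the file stays unchanged):

<?xml version="1.0" encoding="UTF-8"?>
<!-- executionOrder="defects" removed, resolveDependencies="true" added -->
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
        convertErrorsToExceptions="true"
        convertNoticesToExceptions="true"
        convertWarningsToExceptions="true"
        colors="true"
        processIsolation="false"
        stopOnFailure="true"
        resolveDependencies="true"
        backupGlobals="false"
        cacheResult="true"
        cacheResultFile="./temp/phpunit.cache">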

@theofidry
Member

Could you try enabling the extra logging (for the log file), run Infection both with and without parallelism, and compare the results for a mutant that was not killed in the non-parallel run but was killed in the parallel one?
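
For reference, the extra text log can be enabled in infection.json5; a minimal sketch (the file name is arbitrary, the "logs.text" key is the standard Infection option):

{
    "logs": {
        // human-readable log of mutant results; write it to a different file
        // for the --threads=1 and --threads=4 runs, then diff the two
        "text": "infection.log"
    }
}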

@maks-rafalko
Member

In addition to what was said above, please see this article and use the --noop option to debug the issue: https://infection.github.io/guide/debugging-issues.html
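
Roughly speaking, that boils down to re-running the same command with --noop (the thread counts below simply mirror the ones used earlier):

# with --noop every mutation is replaced by the original, unchanged code, so any
# mutant reported as "killed" means the tests fail for reasons unrelated to mutations
vendor/bin/infection --threads=4 --noop
vendor/bin/infection --threads=1 --noop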

@peldax
Author

peldax commented Feb 17, 2022

Thank you all for your suggestions.

After some time tinkering, I discovered there is an underlying issue: my mutants are resolved as killed because the test case threw, but the reasons for throwing vary and are unrelated to the mutation (for example, and most commonly, file permissions). I am running the tests in Docker, so there might be some misconfiguration involved. I will keep this thread updated after I return from a week-long holiday.

@sanmai
Member

sanmai commented May 19, 2022

@maks-rafalko
Member

Feel free to reopen if this is still the case.

For future readers: if you get a much higher MSI with parallel execution, it is 99% likely that your tests conflict with each other, rather than an issue with Infection.

See the suggestions above and debug it with the --noop option.

@curry684

curry684 commented Nov 1, 2023

I'm not sure why this issue was closed, as I'm seeing this all the time, with pretty hefty consequences, in a fresh new project (not on GitHub yet, sorry; it needs a week or more of work).

What I'm seeing is that just running Infection without parameters consistently creates the SAME test suite, resulting in the same MSI and output. Specifying ANY value for the thread count generates a completely different test set on EVERY run.

See, for example, two consecutive invocations of the exact same command on the exact same project:

$ vendor/bin/infection --threads=4

    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

#StandWithUkraine

Infection - PHP Mutation Testing Framework version 0.27.6


Running initial test suite...

PHPUnit version: 10.4.2

   94 [============================]  1 sec

Generate mutants...

Processing source code files: 18/18
.: killed, M: escaped, U: uncovered, E: fatal error, X: syntax error, T: timed out, S: skipped, I: ignored

..........MM..M....M....MMMM..MM...M.M..M.........   ( 50 / 205)
......................MM..........M...............   (100 / 205)
.....M..............................M.............   (150 / 205)
..M...............................................   (200 / 205)
.....                                                (205 / 205)

205 mutations were generated:
     186 mutants were killed
       0 mutants were configured to be ignored
       0 mutants were not covered by tests
      19 covered mutants were not detected
       0 errors were encountered
       0 syntax errors were encountered
       0 time outs were encountered
       0 mutants required more time than configured

Metrics:
         Mutation Score Indicator (MSI): 90%
         Mutation Code Coverage: 100%
         Covered Code MSI: 90%

Just pressing up and enter:

$ vendor/bin/infection --threads=4

    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

#StandWithUkraine

Infection - PHP Mutation Testing Framework version 0.27.6


Running initial test suite...

PHPUnit version: 10.4.2

   89 [============================]  1 sec

Generate mutants...

Processing source code files: 18/18
.: killed, M: escaped, U: uncovered, E: fatal error, X: syntax error, T: timed out, S: skipped, I: ignored

........................MMMM............MM....MM..   ( 50 / 205)
..................................................   (100 / 205)
.....................................M............   (150 / 205)
..M...............................................   (200 / 205)
.....                                                (205 / 205)

205 mutations were generated:
     195 mutants were killed
       0 mutants were configured to be ignored
       0 mutants were not covered by tests
      10 covered mutants were not detected
       0 errors were encountered
       0 syntax errors were encountered
       0 time outs were encountered
       0 mutants required more time than configured

Metrics:
         Mutation Score Indicator (MSI): 95%
         Mutation Code Coverage: 100%
         Covered Code MSI: 95%

So my MSI improved by 5 percentage points, from 90% to 95%, just by re-running the exact same tests. From what I can see in the logs, it generates a completely different test suite.

This makes the --threads option pretty much unusable in CI, even though we have powerful runners that can happily run 4 times faster with 4 threads.

edit: notably, even --threads=1 exhibits this behavior, even though I would expect it to be the default.

theofidry reopened this Nov 1, 2023
@maks-rafalko
Member

maks-rafalko commented Nov 1, 2023

I'm pretty sure there is an issue with your test suite. Did you try to analyze the logs? Did you run Infection with --noop as explained above?

We can't guess what's going on without logs or a reproducible repository.

The easiest ways to find the reason are:

  • run with --noop
  • analyze the logs of 2 inconsistent runs and find which NEW mutants are added and why

I'm not sure why this issue was closed as I'm seeing this

Because the author said he found issues in his test suite and didn't reply.

What I suggest

If you need any help with the steps above, let us know!

Thank you

@maks-rafalko
Member

/remind in 2 weeks

reminders bot added the Reminder label Nov 1, 2023

reminders bot commented Nov 1, 2023

@maks-rafalko set a reminder for Nov 15th 2023

@curry684

curry684 commented Nov 1, 2023

Right, I found the root issue: I'm clearing the Symfony cache in the PHPUnit bootstrap script. Wrapping it in if (!getenv('INFECTION')) solves the issue.
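
In other words, something along these lines in the PHPUnit bootstrap (a sketch only; the cache path is whatever the project actually clears):

// tests/bootstrap.php (excerpt)
if (!getenv('INFECTION')) {
    // only wipe the Symfony test cache when not running under Infection
    (new \Symfony\Component\Filesystem\Filesystem())
        ->remove([__DIR__ . '/../var/cache/test']);
}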

[edit: nm I was probably looking at random ghosts]

From a DX perspective I'm wondering whether we can improve this by detecting the situation.

@theofidry
Member

From a DX perspective I'm wondering whether we can improve this by detecting the situation.

I think the easiest way would be for the tests to support running in parallel in the first place. Otherwise there are two workable solutions:

  • injecting a noop mutation from time to time during the run
  • not running in parallel unless the tests support parallelism

but both hurt performance, unfortunately.

@maks-rafalko
Member

I would say it's not Infection's responsibility to check if tests work ok in parallel. We can inform users, notify them, etc., but not check it (which is not even easy to implement). Actually, we already do that: see https://infection.github.io/guide/command-line-options.html#threads-or-j

Any developer, IMO, must analyze the logs if they use mutation testing, not blindly rely on the MSI value.

@maks-rafalko
Member

Right, I found the root issue: I'm clearing the Symfony cache in the PHPUnit bootstrap script. Wrapping it in if (!getenv('INFECTION')) solves the issue.

oh, that's a good point. We do the same:

tests/bootstrap.php

<?php

declare(strict_types=1);

use Symfony\Component\Dotenv\Dotenv;
use Symfony\Component\Filesystem\Filesystem;
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\FlockStore;

require dirname(__DIR__).'/vendor/autoload.php';

if (file_exists(dirname(__DIR__).'/config/bootstrap.php')) {
    require dirname(__DIR__).'/config/bootstrap.php';
} elseif (method_exists(Dotenv::class, 'bootEnv')) {
    (new Dotenv())->bootEnv(dirname(__DIR__).'/.env');
}

$filesystem = new Filesystem();

$factory = new LockFactory(new FlockStore());

$lock = $factory->createLock('remove-tests-cache', ttl: 3);

if (!getenv('INFECTION') && $lock->acquire()) {
    $filesystem->remove([__DIR__ . '/../var/cache/test']);
    echo "\nTest cache cleared\n";

    $filesystem->remove([__DIR__ . '/../var/log/test.log']);
    echo "Test logs cleared\n\n";
}

I will add it somewhere in the docs.

@curry684

curry684 commented Nov 2, 2023

I would say it's not Infection's responsibility to check if tests work ok in parallel.

Completely agree of course. But if we can, with reasonable effort, save someone else the 2 hours I wasted researching this, we should :)

An alternative train of thought I had: would it be possible to have an infection-symfony plugin that proxies the actual application Kernel and overrides getCacheDir to include the test token? As long as the environment name itself remains untouched and the application relies on the kernel.cache_dir parameter instead of hardcoded paths, it would likely work transparently.
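
A rough sketch of that idea (the class and namespace names are hypothetical; it relies on the per-process TEST_TOKEN environment variable that paratest and, as far as I know, Infection expose during parallel runs):

<?php

declare(strict_types=1);

namespace App\Tests;

use App\Kernel;

// Hypothetical proxy kernel: keeps the environment name untouched, but
// namespaces the cache directory per worker so parallel processes never
// share or wipe each other's cache.
final class InfectionTestKernel extends Kernel
{
    public function getCacheDir(): string
    {
        // TEST_TOKEN is unique per parallel worker; fall back for single-threaded runs
        $token = getenv('TEST_TOKEN') ?: 'default';

        return parent::getCacheDir() . '/' . $token;
    }
}

Anything that resolves the kernel.cache_dir parameter would then transparently pick up the per-worker directory.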

@maks-rafalko
Member

maks-rafalko commented Nov 2, 2023

From what I see, clearing the cache in bootstrap file is not how Symfony works by default. So it's your (and my) particular case, where we added this ourselves.

I mean, how many developers are affected by this? We can only guess, but I think not many. There haven't been any issues about it so far.

would it be possible to have an infection-symfony plugin

I would try to avoid this, as it is not that easy. Too many questions right off the top of my head: how can we detect it? What about different Symfony versions? What about our prefixed PHAR? What about other frameworks? Should we bundle it in infection/infection or as a separate package? And so on.


Other ideas

  • (quite simple) detect a Symfony project and just provide a link to the docs, so the developer can read about all the nuances
  • (medium) check the tests/bootstrap.php file (probably by analyzing the AST): if it clears the cache and doesn't check getenv('INFECTION'), hard-fail and point to the docs. But again, I don't know if we really need to do that right now.

But if we can, with reasonable effort, save someone else the 2 hours I wasted researching this, we should :)

I absolutely agree with you. We always try to fail fast and provide good docs and exception messages. But our resources are limited.

If adding docs covers those few people who clear the cache in bootstrap.php, I would do that and avoid creating plugins. But that's my 2 cents; we can listen to other opinions.

@curry684

curry684 commented Nov 2, 2023

clearing the cache in bootstrap file is not how Symfony works by default

No, it only really makes sense in libraries, where you start to notice your code coverage bouncing up and down depending on whether there is a cache, as none of the DI code runs when the cache is warm. That's why I added it 😆

I mean, how many developers are affected by this? We can only guess, but I think not many.

Correct, when developing a regular site or application you're not likely to run into this, although I also wonder how many people do infection testing at all on an end-user project, as only reusable libraries really benefit from high coverage and Infection - the two factors that, combined, made me run into this.

@peldax did you ever manage to pinpoint your specific issue?

@peldax
Author

peldax commented Nov 2, 2023

Hi all.

Unfortunately no. We decided to use Infection without parallelism and did not investigate this problem further.

@maks-rafalko
Member

offtopic:

Correct, when developing a regular site or application you're not likely to run into this, although I also wonder how many people do infection testing at all on an end-user project,

We do use Infection at the company I work for, and we write (many) tests. I'm not sure how many people do this; from recent surveys, not many. I saw that only ~30% of PHP developers write tests (not even talking about Infection) :(

as only reusable libraries really benefit from high coverage and Infection - the two factors that, combined, made me run into this.

I hope you are wrong! :)

  1. Any code requires high coverage. Arguable, but in general it's true. See this related article: https://localheinz.com/articles/2018/02/01/code-coverage-is-a-meaningless-metric/
  2. Even if the code is only 10% covered, you can still benefit a lot from Infection, as it checks the quality of the tests in the first place (aka "Covered MSI").

@curry684

curry684 commented Nov 2, 2023

Come to think of it, Symfony is not even remotely related to the problem - a Laravel project would have the same issue, as it caches a lot as well.

So yeah, I think I agree that we can't do all that much here, apart from adding an extra line of output when using threads, something like:

Please note that some mutants will inevitably be harmless (i.e. false positives). If you are seeing inconsistent results with multithreaded testing, your project is likely not ready for it. Visit https://infection.github.io/guide/debugging-issues.html for guidance.

Time: 8s. Memory: 30.00MB. Threads: 7


My statements about testing were based on the "pragmatism not idealism" adage 😉 For libraries I strive for the highest numbers; I have a project with ostensibly 100% coverage, 10.0 code quality and 100% MSI, "because I can". But that one is small, and in the real world clients aren't paying for 100% coverage, or Infection testing for that matter. In my experience, for libraries every percent of coverage has a similar cost. For applications the cost rises exponentially, and getting from 70% to 90% takes more work than getting to 70% in the first place. Infection testing is usually not even on the 10-year agenda (regrettably).

@peldax
Author

peldax commented Nov 2, 2023

Re: offtopic.

For reusable libraries Infection is a no-brainer, as it is easy to deploy and gives great insight into the quality of your tests. In that context the majority of tests are unit tests that exercise specific reusable tools.

For company work, where we develop concrete applications, the nature of our tests is completely different. We also write a lot of tests, and I personally like to write tests that simulate whole user stories. These tests are slow. For a project with lots of features the tests take several minutes to complete, as there is a database involved that is repeatedly seeded for specific purposes. When you run Infection on this codebase (with a single thread) it takes a few hours to complete. This is where I tried to make it work in parallel and created this issue, but it ended up being too much effort and I eventually gave up. One may argue that mutation testing is not suitable for integration tests, but those tests give me much more confidence in the whole system. When we encounter complicated logic, we certainly write unit tests for that part, but the majority of code in a real-world application is just "some controller" that does something in the database, sends an email, or generates some other output.

We ended up in a state where we run Infection only once in a while, in order to review and improve our tests, but we do not use it in the pipeline.

@maks-rafalko
Member

maks-rafalko commented Nov 2, 2023

I have super-good news for you guys.

  1. It is possible to use Infection for functional/e2e tests
  2. Infection does not take hours if run in the correct way (see below)
  3. Any project, even one with millions of lines of code and thousands of slow functional tests, can run Infection in minutes

How is that possible?

Easy: you need to run Infection only for the lines in the diff, so that only new or updated lines are mutated on CI.

I've spent weeks implementing different features in Infection to make it usable for "functional test suites".

For example, as you can see from this blog post, it takes less than a minute to run Infection for a feature branch with functional tests (tests that work with a real DB, call API endpoints through WebTestCase/KernelTestCase, send emails, and so on).

So, everything is possible. And believe me, you will find very interesting results when you use Infection with more than just unit tests.

So it can (and should) be integrated into the CI pipeline. --git-diff-lines makes it possible and fast.

Here is how I use it for a real application (not library):

# run your tests and pre-generate coverage for Infection
vendor/bin/paratest -p4 --runner=WrapperRunner --coverage-xml=reports/coverage/coverage-xml --log-junit=reports/coverage/junit.xml --stop-on-failure

# run Infection only for the changed lines, reusing the coverage
vendor/bin/infection --threads=max --only-covered --git-diff-lines --ignore-msi-with-no-mutations --min-covered-msi=100 --coverage=reports --skip-initial-tests --only-covering-test-cases

We ended up in a state where we run Infection only once in a while, in order to review and improve our tests, but we do not use it in the pipeline.

That's how it used to be. I hope my comment inspires you to try again and run Infection on a regular basis, for any project with any type of tests.
