Incremental analysis: caching Mutations for multiple runs #1085

Fenikkusu · 2020-02-28T03:18:55Z

Is your feature request related to a problem? Please describe.
When running on very large projects, generating the mutations can take time in addition to running them. As an example, I have a project with roughly 380 files and 26,867 lines of code that takes approximately 20 minutes to run from start to finish. I have a second project with roughly 5,000 files and 900,000 lines of code. While I'm not sure of the exact time the second project would take, a rough calculation puts it at over 4 hours to run Infection.

Describe the solution you'd like
PHP-CS-Fixer has the ability to specify a cache file. This cache file tracks the paths of all the files in the project and their last modified times. When you run PHP-CS-Fixer a second time, it compares the files' modification times to what is stored in the cache file. If a file has changed, it is processed again.
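A minimal sketch of the modification-time cache described above, written in Python purely for illustration. The `{path: mtime}` layout and the names `snapshot` and `changed_files` are hypothetical; PHP-CS-Fixer's actual cache format may differ.

```python
import os
from typing import Dict, Iterable, List

# Hypothetical cache layout: {file path: last modification time in seconds}.
MtimeCache = Dict[str, float]


def snapshot(paths: Iterable[str]) -> MtimeCache:
    """Record the current modification time of each file."""
    return {path: os.path.getmtime(path) for path in paths}


def changed_files(previous: MtimeCache, current: MtimeCache) -> List[str]:
    """Files that are new or whose modification time differs from the cache.

    Files deleted since the previous run are not reported: there is
    nothing left in them to process again.
    """
    return sorted(
        path for path, mtime in current.items() if previous.get(path) != mtime
    )


if __name__ == "__main__":
    previous = {"src/A.php": 100.0, "src/B.php": 200.0}
    current = {"src/A.php": 100.0, "src/B.php": 250.0, "src/C.php": 300.0}
    print(changed_files(previous, current))  # ['src/B.php', 'src/C.php']
```

After each run, the fresh `snapshot(...)` would be written back to the cache file (for example as JSON), so the next run only processes files touched in between. Comparing content hashes instead of timestamps avoids re-processing files that were touched but not edited, at the cost of reading every file.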

I believe a similar feature would be useful in Infection, and I think it would actually be useful to have two caches. The first cache would be for generating mutations: the system would only process a file to generate its mutations if the file has changed since the last time Infection ran.

The second cache would be useful when running the mutations: the system would only run mutations for files that have changed since the last time Infection was run.

Describe alternatives you've considered
Haven't really thought that far ahead.

sanmai · 2020-02-28T12:11:07Z

The problem here is that the code and the tests are interdependent. The tests are not always at fault for an escaped mutation. Sometimes the code isn't written well enough to be properly tested, even by the very test that covers it.

Another problem is that any change in any part of the code may cause a new mutation to escape, just as well as an old mutation to be caught. Now, consider a cache per mutation per line of code. Under what conditions would we invalidate it? OK, we can invalidate it if either the test or the subject changes. But what if a mutation's outcome depends on another file, one that is technically not covered by the test?

This is a great idea, but I have too many questions and too few answers, so I'm not even sure where we should begin implementing it.
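To make the invalidation question concrete, here is a hypothetical per-mutation result cache keyed on the mutation, the subject file and the tests covering it, i.e. the "invalidate if the test or the subject changes" rule. All names and file contents are made up for illustration. The last lines show the gap: a change to a file the subject depends on, but which is neither the subject nor a test, leaves the key unchanged, so a stale result would be reused.

```python
import hashlib
from typing import Dict, List


def result_cache_key(sources: Dict[str, str], subject: str,
                     tests: List[str], mutation_id: str) -> str:
    """Hash the mutation identity, the subject file and its covering tests."""
    digest = hashlib.sha256(mutation_id.encode())
    for path in [subject, *sorted(tests)]:
        # Separator bytes keep path and content boundaries apart.
        digest.update(path.encode() + b"\0" + sources[path].encode() + b"\0")
    return digest.hexdigest()


sources = {
    "src/Price.php": "class Price { /* uses Tax */ }",
    "src/Tax.php": "class Tax { /* rate = 0.2 */ }",
    "tests/PriceTest.php": "class PriceTest { /* ... */ }",
}
before = result_cache_key(sources, "src/Price.php", ["tests/PriceTest.php"], "m1")

# Price depends on Tax, but Tax is neither the subject nor a test.
sources["src/Tax.php"] = "class Tax { /* rate = 0.25 */ }"
after = result_cache_key(sources, "src/Price.php", ["tests/PriceTest.php"], "m1")

print(before == after)  # True: the cached result would be reused although Tax changed
```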

Fenikkusu · 2020-02-28T13:01:20Z

From a 10,000-foot view, I would assume code coverage becomes the determining factor. That is, if a file has changed since the last run, Infection would automatically consult the code coverage and run all mutations that touch that file. I believe this would be the most sensible way to invalidate the cache. Infection is already smart enough to skip files based on their coverage, so I think it would be a simple enough task to just 'add' the unchanged files to the skipped files as if they weren't covered.

Fenikkusu · 2020-02-28T13:14:38Z

I just took a quick look at the code. I'm not very knowledgeable about the ins and outs, so there may be a better place, but a brief look at the system, using the concept I mentioned above, makes me think that the MutationGenerator would be a likely starting point for implementing such a feature: in its generate method, check the cache and skip generating mutations for a file if it hasn't changed.

The only problem with this would be ensuring that the mutations still get generated when another file has changed and this file is covered by the same test.
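The coverage-based invalidation from the last two comments can be sketched as a set computation. This assumes a coverage map from each source file to the tests that cover it (the kind of data Infection collects from the coverage report; the dict shape and the name `files_to_remutate` are hypothetical). The files to re-mutate are the changed source files plus every source file covered by a test that is itself changed or covers a changed source file. It does not address the earlier concern about a dependency that no test covers.

```python
from typing import Dict, Iterable, Set

# Hypothetical shape: {source file: set of test files that cover it}.
Coverage = Dict[str, Set[str]]


def files_to_remutate(coverage: Coverage, changed: Iterable[str]) -> Set[str]:
    """Source files whose mutations must be regenerated and re-run."""
    changed_set = set(changed)
    all_tests = set().union(*coverage.values())
    # Tests that may now behave differently: edited tests, plus every
    # test that exercises an edited source file.
    affected_tests = changed_set & all_tests
    for source in changed_set.intersection(coverage):
        affected_tests |= coverage[source]
    # Edited source files, plus any source file sharing an affected test.
    # Files absent from the coverage map (uncovered code) are ignored here.
    return {
        source for source, tests in coverage.items()
        if source in changed_set or tests & affected_tests
    }


coverage = {
    "src/A.php": {"tests/ATest.php"},
    "src/B.php": {"tests/ATest.php", "tests/BTest.php"},
    "src/C.php": {"tests/CTest.php"},
}
print(sorted(files_to_remutate(coverage, ["src/A.php"])))
# ['src/A.php', 'src/B.php']
```

Here `src/B.php` is re-mutated even though it did not change, because it shares `tests/ATest.php` with the edited `src/A.php`, which is exactly the case described above.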

theofidry · 2020-02-28T13:24:07Z

I think there are unfortunately only two or three things that are cacheable:

- for a given source file, its corresponding AST;
- for a given source file, its corresponding mutations; however, this requires invalidating the cache as soon as the mutator config or the Infection commit changes, which is neither very reliable nor efficient;
- for a given source coverage file, the tests that we collect from parsing it.

Anything else is not cacheable.
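The second bullet implies a cache key that covers more than the file contents. A minimal sketch, assuming hypothetical inputs (the mutator configuration as a dict and an Infection version or commit string): any change to the file, the configuration or the tool version yields a new key, which is also why such a cache would be invalidated so often.

```python
import hashlib
import json
from typing import Any, Dict


def mutation_cache_key(source: str, mutator_config: Dict[str, Any],
                       infection_version: str) -> str:
    """Key for the cached mutations of one file (hypothetical scheme)."""
    payload = json.dumps(
        {
            "source": hashlib.sha256(source.encode()).hexdigest(),
            "mutators": mutator_config,
            "version": infection_version,
        },
        # sort_keys makes the key independent of config key order.
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


k1 = mutation_cache_key("<?php echo 1;", {"@default": True}, "1.0.0")
k2 = mutation_cache_key("<?php echo 1;", {"@default": True}, "1.0.1")
print(k1 == k2)  # False: upgrading the tool invalidates every cached entry
```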

IMO, for large codebases, if you want the score, you do it in a nightly build that can take hours or dozens of hours; otherwise, Infection should most certainly be used incrementally. For now this can be done by restricting it to the changed source files only. Maybe this can be improved, though.

Fenikkusu · 2020-02-28T18:35:55Z

@theofidry, how do you suggest restricting it? Perhaps I'm missing something or not seeing what you are seeing.

theofidry · 2020-02-28T19:37:24Z

With the `--filter` option; cf. https://github.com/infection/infection/blob/master/.ci/travis-functions.sh#L76
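One way to build that filter value, assuming the changed files come from `git diff --name-only --diff-filter=AM <base>` (added and modified files only, so deleted paths are excluded) and that sources live under `src/` (both assumptions; the linked script may do it differently), is to keep the changed PHP files under the source directory and join them with commas, since `--filter` takes a comma-separated list of files. The function name is hypothetical.

```python
def build_filter(diff_output: str, source_dir: str = "src/") -> str:
    """Turn `git diff --name-only` output into a comma-separated file list.

    Keeps only PHP files under `source_dir` (an assumed project layout),
    since tests and other files are not mutation subjects.
    """
    paths = (line.strip() for line in diff_output.splitlines())
    return ",".join(
        path for path in paths
        if path.startswith(source_dir) and path.endswith(".php")
    )


diff_output = "src/Mailer.php\ntests/MailerTest.php\nREADME.md\nsrc/Entity/Foo.php\n"
print(build_filter(diff_output))  # src/Mailer.php,src/Entity/Foo.php
```

The result would then be passed as `--filter=<value>`. If it comes back empty, it is safer to skip the Infection run entirely, because an empty filter would likely not restrict anything.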

Fenikkusu · 2020-02-28T20:35:35Z

@theofidry, thank you. I will look into that.

An additional thought occurred to me: since the `--filter` option exists, I wonder if it might be possible to add a `--filter-cache` option as a middle ground. While you can use the filter option in combination with git, it might be possible to simply do the aforementioned comparison against a cache file and then auto-populate the filter option with the files marked as changed. I'm not sure there is much benefit in doing that, though, since it can already be done through git.

theofidry · 2020-02-29T08:40:49Z

Yes, maybe we could have an incremental option that works from the last run.

maks-rafalko · 2021-08-09T20:00:14Z

Several ideas for inspiration from @hcoles in #1549 (comment):

You might be interested in pitest's incremental analysis feature. This PR looks to implement one of the strategies it employs to speed things up, but others are also possible:

https://pitest.org/quickstart/incremental_analysis/

Reading it again (it's a long time since I wrote it), I'm not sure number 5 is a great idea, but the others give pitest a huge speedup once the data has been collected.
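In the spirit of the pitest strategies linked above, here is a simplified rule for deciding whether a previous result can be reused. It is an approximation written for illustration, not a transcription of pitest's documented rules, and the status names, fields and function name are hypothetical. A killed mutant is assumed still killed if neither the mutated file nor the killing test changed; a survivor is assumed to survive if neither the file nor any covering test changed and the set of covering tests is the same; a timeout is assumed to time out again if the file is unchanged; anything else is re-run.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PreviousResult:
    status: str                     # "killed", "survived" or "timed_out"
    killing_test: Optional[str]     # set only when status == "killed"
    covering_tests: FrozenSet[str]  # tests covering the mutated line last run


def reusable_status(prev: PreviousResult, subject_changed: bool,
                    changed_tests: FrozenSet[str],
                    covering_tests_now: FrozenSet[str]) -> Optional[str]:
    """Return the status to reuse, or None if the mutant must be re-run."""
    if subject_changed:
        return None
    if prev.status == "timed_out":
        return "timed_out"
    if prev.status == "killed" and prev.killing_test not in changed_tests:
        return "killed"
    # A survivor is only reused if the covering tests are exactly the same
    # set as before (no new or removed tests) and none of them was edited.
    if (prev.status == "survived"
            and covering_tests_now == prev.covering_tests
            and not (prev.covering_tests & changed_tests)):
        return "survived"
    return None
```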

Fenikkusu mentioned this issue Feb 28, 2020

Concurrent mutation generator #1082

Merged


maks-rafalko changed the title ~~Caching Mutations For Multiple Runs~~ Incremental analysis: caching Mutations for multiple runs Aug 9, 2021

maks-rafalko added DX Developer Experience Feature Request Performance labels Aug 9, 2021

maks-rafalko mentioned this issue Aug 9, 2021

[Performance] Cache phpunit results and run defects first to speed up the Mutation Testing execution #1549

Merged

danepowell mentioned this issue May 2, 2022

Coveralls-style reporting on PRs #1687

Closed
