
Create an extension point at rule execution to measure performances #6962

Open
mgroth0 opened this issue Feb 10, 2024 · 8 comments

@mgroth0
Contributor

mgroth0 commented Feb 10, 2024

I would like to measure the performance of rules. I can measure the time it takes to process a file with the FileProcessListener extension point. But I do not see an extension point for rules.

I suggest something like:

interface RuleExecutionListener {
    fun startingRuleExecution(file: KtFile, rule: BaseRule)
    fun finishedRuleExecution(file: KtFile, rule: BaseRule, findings: List<Finding>)
}

fun executeRules(rules: List<BaseRule>) {
    for (rule in rules) {
        ruleListeners.forEach { it.startingRuleExecution(file, rule) }
        rule.visitFile(file, bindingContext, compilerResources)
        val findings = filterSuppressedFindings(rule, bindingContext)
        ruleListeners.forEach { it.finishedRuleExecution(file, rule, findings) }
        // ...
    }
}

For my use case, which is just to measure time, I do not need the list of findings.
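
To make the intent concrete, here is a minimal sketch of a timing-only listener against the proposed interface. The interface and the listener registration do not exist in detekt yet, so everything around them is hypothetical; the imports assume detekt 1.x's detekt-api types.

    import io.gitlab.arturbosch.detekt.api.BaseRule
    import io.gitlab.arturbosch.detekt.api.Finding
    import org.jetbrains.kotlin.psi.KtFile

    class RuleTimingListener : RuleExecutionListener {
        // rule id -> start timestamp of the execution currently in flight
        private val starts = mutableMapOf<String, Long>()
        // rule id -> total nanoseconds spent across all files
        val totals = mutableMapOf<String, Long>()

        override fun startingRuleExecution(file: KtFile, rule: BaseRule) {
            starts[rule.ruleId] = System.nanoTime()
        }

        override fun finishedRuleExecution(file: KtFile, rule: BaseRule, findings: List<Finding>) {
            val start = starts.remove(rule.ruleId) ?: return
            totals[rule.ruleId] = (totals[rule.ruleId] ?: 0L) + (System.nanoTime() - start)
        }
    }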

Based on an idea from @arturbosch in #5183. This might also resolve that issue.

@BraisGabin
Member

It's great to see someone interested in this topic! I think performance is key for a static analysis tool. It is the main reason people don't use one, or use it less often.

About your proposal: that should be easy to implement. But I think a feature like that should live inside detekt. I would love to receive issues about the performance of specific rules so we can fix them.

So, would you be willing to implement that inside detekt?

The problem I see with implementing it is how to handle all that information.

@mgroth0
Contributor Author

mgroth0 commented Feb 11, 2024

I feel the same way! I run detekt and ktlint. What is interesting is that ktlint, even with many more rules enabled, runs about 10 times faster for me. So I am very curious what is happening.

I can think of some ideas for handling the information. I will be happy to give this a shot.

@mgroth0
Contributor Author

mgroth0 commented Feb 11, 2024

@BraisGabin

Is it ok if I develop this on top of 3flex's K2 branch instead of main, so that I can work with K2? Or would that make the PR too complicated?

@BraisGabin
Member

Why do you want to do that? From my side, how you develop it is up to you. But to merge your branch you will need to rebase it back onto main or wait for that other branch to be merged (i.e. wait for Kotlin 2.0).

What is interesting is that ktlint, even with many more rules enabled, runs about 10 times faster for me. So I am very curious what is happening.

Right now, if you enable debug mode on detekt (detekt { debug = true } in your Gradle build) you get a high-level report. I would assume that we spend a lot of time generating the BindingContext.
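
For reference, a minimal build.gradle.kts sketch of the flag mentioned above, assuming the detekt Gradle plugin is applied; the phase timings then show up in the console output.

    // build.gradle.kts
    detekt {
        debug = true // prints the high-level phase report (LoadConfig, Parsing, Binding, ...)
    }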

I can think of some ideas for handling the information. I will be happy to give this a shot.

If they are ready to implement, you can just go for it. Otherwise I recommend you share them to see if they align with our ideas. That would probably make the PR review faster.

@mgroth0
Contributor Author

mgroth0 commented Feb 11, 2024

But to merge your branch you will need to rebase it back onto main or wait for that other branch to be merged

Thank you for clarifying. I will figure out what works best for me then, but in the end I will rebase it back onto main or wait before submitting a PR. As for why: just because I am using K2 in general, so it is less context switching.

debug mode on detekt

I didn't know about this. I tried it, and the results were interesting.

I would assume that we spend a lot of time generating the BindingContext.

I ran detekt on one of my multiplatform modules which contains source sets for jvm, android, js, and native. The debug info printed 12 times, and since I have the worker API enabled I am guessing it printed once per worker. Here are a couple of examples:

Phase LoadConfig took 159.868042ms
Phase CreateSettings took 87.994ms
Phase ValidateConfig took 133.163083ms
Phase Parsing took 1.293766667s
Phase Binding took 4.566750ms
Phase LoadingExtensions took 138.399750ms
Phase Analyzer took 762.714666ms
Phase Reporting took 125.207ms

Phase LoadConfig took 179.245291ms
Phase CreateSettings took 53.003833ms
Phase ValidateConfig took 128.469333ms
Phase Parsing took 1.311580792s
Phase Binding took 2.464839125s
Phase LoadingExtensions took 52.986833ms
Phase Analyzer took 458.546209ms
Phase Reporting took 54.108750ms

As you can see, the Binding phase was either super fast or super slow. So I am guessing that binding is either enabled or disabled depending on the target? Is that right? About half of the runs seemed to have a long Binding phase. But Parsing was consistently long, and might have contributed the most to the total time as a result.

I recommend you share them to see if they align with our ideas

After seeing the debug information above, my idea is that we start with the existing scaffolding of the debug option.

  1. First we add a new metric to the debug processing. I have not looked at the code yet. But conceptually, just as we see:
Phase Parsing took 309.847583ms
Phase Binding took 7.125us

We could also see:

Rule UnnecessaryAny took 309.847583ms
Rule UnnecessaryApply took 7.125us

This would make the debug output a lot longer, but that seems ok to me because it is "debug" after all and it is disabled by default.

  2. The next challenge will be to aggregate this information. Since it looks like the existing debug information is not aggregated, I might like to take a swing at aggregating all of it, both the Rule metrics as well as the Phase metrics (see the sketch after this list). This might involve a new Gradle task; I'm unsure.

  3. Then reporting. Once the information is aggregated, this is relatively easy. I might make a simple report, but more importantly I'd make it easy for people to create their own reports with the info.
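
For item 2, a rough and purely hypothetical sketch of the aggregation (none of these names exist in detekt today); it just sums per-rule durations across files and prints them in the same style as the Phase lines:

    import kotlin.time.Duration
    import kotlin.time.Duration.Companion.nanoseconds

    class RuleTimingAggregator {
        // rule id -> total time spent by that rule across all files
        private val totals = mutableMapOf<String, Duration>()

        fun record(ruleId: String, elapsedNanos: Long) {
            totals[ruleId] = (totals[ruleId] ?: Duration.ZERO) + elapsedNanos.nanoseconds
        }

        fun printSummary() {
            totals.entries
                .sortedByDescending { it.value }
                .forEach { (ruleId, total) -> println("Rule $ruleId took $total") }
        }
    }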


Also, side question, I see lots of messages in my debug output like:

The rule 'UnnecessarySafeCall' requires type resolution but it was run without it.

Is this normal? Is it due to running detekt on non-jvm sources? Should I do something about it?

@BraisGabin
Member

and since I have the worker API enabled I am guessing it printed once per worker.

It is printed once per detekt task; that would happen even without the worker API.

As you can see, the Binding phase was either super fast or super slow. So I am guessing that binding is either enabled or disabled depending on the target? Is that right?

You are right.

But Parsing was consistently long, and might have contributed the most to the total time as a result.

That makes sense. And I don't see how we can make that faster. The only tool I can think of is parallelization. I was thinking about using coroutines inside detekt, but that's probably not the main priority right now.

2. The next challenge will be to aggregate this information. Since it looks like the existing debug information is not aggregated, I might like to take a swing at aggregating all of it, both the Rule metrics as well as the Phase metrics. This might involve a new Gradle task; I'm unsure.
3. Then reporting. Once the information is aggregated, this is relatively easy. I might make a simple report, but more importantly I'd make it easy for people to create their own reports with the info.

Right now Detektion is the class that collects all the issues and information that detekt finds in a project (for example, lines of code, complexity...). Maybe we could add this performance information there. That would make it possible to show it in any report. We could create a report only to show performance information (a CSV with a matrix file x rule?) but also show part of this information in the HTML report.

Also, side question, I see lots of messages in my debug output like:

The rule 'UnnecessarySafeCall' requires type resolution but it was run without it.

Is this normal? Is it due to running detekt on non-jvm sources? Should I do something about it?

Yes, that's normal. And there is nothing we can do right now to fix that. Maybe 2.0 will help us there, no idea. You will see those messages in the tasks where the Binding phase was SUPER quick.

@mgroth0
Contributor Author

mgroth0 commented Feb 12, 2024

That makes sense. And I don't see how we can make [Parsing] faster.

Is Parsing done in the same task as running rules?

Parsing depends only on the source code. It doesn't depend on things used for rule execution such as the detekt configuration file and the detekt classpath for custom rules and extensions. So I am wondering if the Parsing phase could be put into a separate task. That way it would not have to rerun if a developer only edits their detekt config file, for example.

The only tool that I can think of is parallelization

It is already parallel if the Gradle build itself is parallel though, right? So I think parallelizing more would not benefit projects that are already parallel, and therefore already at CPU capacity. Detekt is not IO-bound, right?

matrix file x rule

What is a "matrix file x rule"?

While technically not a blocker, I would prefer to wait for #6715 to be fixed before I start this, so that I'm able to develop it while testing it on my full project.

@BraisGabin
Member

Is Parsing done in the same task as running rules?

It is. And there is no way to split them, because the result of parsing is not serializable to a file. Well, that "serialization" is the Kotlin file itself.

It is already parallel if the Gradle build itself is parallel though, right?

Well, it's parallel per Gradle project. But a Gradle project has several files, and those files could be parsed in parallel. I'm not sure that would make detekt faster, though.

Detekt is not IO-bound, right?

No idea. I assume that at some stages it is; parsing, for example, as it needs to read every single file in the project.

What is a "matrix file x rule"?

I was thinking of something like a CSV. Each row is a file, each column is a rule, and each cell is the time spent by that rule in that file.
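
To illustrate the shape, a rough sketch of writing such a matrix as a CSV, assuming the per-file, per-rule timings have already been collected somewhere (the map layout and the function are hypothetical):

    import java.io.File
    import kotlin.time.Duration

    // timings: file path -> (rule id -> time spent by that rule in that file)
    fun writeTimingMatrixCsv(timings: Map<String, Map<String, Duration>>, output: File) {
        val ruleIds = timings.values.flatMap { it.keys }.distinct().sorted()
        output.printWriter().use { writer ->
            // header row: one column per rule
            writer.println((listOf("file") + ruleIds).joinToString(","))
            // one row per file, one cell per rule (empty if the rule did not run on that file)
            for ((path, perRule) in timings) {
                val cells = ruleIds.map { perRule[it]?.inWholeMilliseconds?.toString().orEmpty() }
                writer.println((listOf(path) + cells).joinToString(","))
            }
        }
    }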

@cortinico cortinico changed the title Create an extension point at rule execution Create an extension point at rule execution to measure performances Mar 21, 2024