Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #14788: fix NPE for MagicNumberCheck #14791

Merged
merged 1 commit into from Apr 24, 2024

Conversation

Lmh-java
Copy link
Contributor

@Lmh-java Lmh-java commented Apr 13, 2024

@Lmh-java
Copy link
Contributor Author

The culprit of this NPE: #12848

The regression config of that PR is shown below:

<module name="MagicNumber">
  <property name="tokens" value="NUM_DOUBLE , NUM_FLOAT , NUM_INT , NUM_LONG "/>
            <property name="id" value="MagicNumber2"/>
  <property name="ignoreNumbers" value="-1, 0, 1, 2"/>
  <property name="ignoreFieldDeclaration" value="true"/>
  <property name="ignoreAnnotation" value="true"/>
</module>

ignoreAnnotation is set to true, therefore, this NPE segment of code did not get executed.

@Lmh-java Lmh-java marked this pull request as ready for review April 13, 2024 02:51
@romani
Copy link
Member

romani commented Apr 13, 2024

please generate diff report.

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to merge if Diff report is good.

@Lmh-java
Copy link
Contributor Author

GitHub, generate report

@romani
Copy link
Member

romani commented Apr 13, 2024

Report look good.

Copy link
Member

@rnveach rnveach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not seeing anything for #14788 (comment) .

In addition, provide more regression with other variation of options. This exception exists because of a lack of regression in previous issue. Issue was around ignoreFieldDeclaration=true, so turning it off for other options should show no differences.

@Lmh-java
Copy link
Contributor Author

Lmh-java commented Apr 14, 2024

I am not seeing anything for #14788 (comment) .

@rnveach #14791 (comment). Is this what you want?

@rnveach
Copy link
Member

rnveach commented Apr 14, 2024

What about

what we can do to ensure we try to find this in the future.

Is manual regression the only thing which can be done? Nothing automatic?

Even the initial regression you provided is still only half 1/4th of the whole check.

@Lmh-java
Copy link
Contributor Author

GitHub, generate report

@Lmh-java
Copy link
Contributor Author

Lmh-java commented Apr 14, 2024

What about

what we can do to ensure we try to find this in the future.

Is manual regression the only thing which can be done? Nothing automatic?

Even the initial regression you provided is still only half 1/4th of the whole check.

Manual regression test might be the most computational efficient. In this case, all the variables encountered are binary (either true and false). It is possible for us to automate this process to cover all the cases, but the time complexity will be the mathmatical combination of all these variables. I have already been waiting for an hour to see this single regression test. If we need to cover all those possible cases, it will be a crazy amount of time to wait and it will greatly consume the computational resource.

This is only a boolean variable case, but if there is another type of variable with arbitrary possible values, then it would be impossible to brute force all the cases.

Besides brute forcing all possible combinations. Alternatively, I felt like there might be some methods, that we can integrate code coverage report with regression test. This code coverage report should detect how many changed lines in the patch are covered by the regression test. I am not sure how to do this. This is just a rough and crazy idea. :)

@Lmh-java
Copy link
Contributor Author

@romani @nrmancuso @Lmh-java
Unrelated to PR, but why do some regressions have full stacktraces and others do not in the same report?
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8039a13_2024203306/reports/diff/elasticsearch/index.html#A1
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8039a13_2024203306/reports/diff/elasticsearch/index.html#A139

That's very wired, I am also curious about this. I will investigate into this later on.

@Lmh-java
Copy link
Contributor Author

@rnveach #14791 (comment)
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8039a13_2024011511/reports/diff/index.html is the report produced when ignoreFieldDeclaration=false. There is no difference between the patch and the base. Please take a look.

@Lmh-java Lmh-java requested a review from rnveach April 14, 2024 01:36
@rnveach
Copy link
Member

rnveach commented Apr 20, 2024

Regression is good.

the time complexity will be the mathmatical combination of all these variables. I have already been waiting for an hour to see this single regression test. If we need to cover all those possible cases, it will be a crazy amount of time to wait and it will greatly consume the computational resource.

I suggest you look into where execution time is spent in checkstyle. I have created a custom listener at https://github.com/rnveach/checkstyle/tree/more_audits to do this but it is outdated and not on the latest checkerstyle.

Majority of our time is spent creating the grammar tree and possibly loading the file from a hard drive. Very few checks cause a blip on the scale in time needed to process. Most checks finish under a second, even if you add all the times together when they are called.

Let's also not forget that Checkstyle does not support multi-threading, so your additionally waiting on a single CPU core even if you have a 8 core system. Technically patch-diff-report-tool is also not fully multi-threaded so there is additional time there. I have created multiple instances of Multi-threaded checking, and use my latest version on regression reports. I usually finish the same job in under an hour (and with more repos).

if there is another type of variable with arbitrary possible values, then it would be impossible to brute force all the cases.

Yes, and we have a GSoC project for this. For fields with an unlimited amount of possibilities, we planned to just pull cases from our tests since we build those tests on user issues. It won't be "everything", but its the best possibility for a start.

Alternatively, I felt like there might be some methods, that we can integrate code coverage report with regression test. This code coverage report should detect how many changed lines in the patch are covered by the regression test. I am not sure how to do this

I assume you would have to integrate the source and the regression files into 1 project and run the regression as a test to capture things as code coverage using the utilities already built. Anything else would need a custom implementation. However, this will only tell you if lines are executed and not if a boolean is flipped on or off, which like this issue shows, you would need to know to cover everything. This would need to be something halfway to pitest, but pitest would be a huge time addition. We split up our pitest profiles because it would take hours otherwise for a single run on the entire project.

=========

We have some automated regression added to our CI. We run checkstyle on various external projects during a PR, even before you run it manually.

https://github.com/checkstyle/contribution/blob/master/checkstyle-tester/checks-nonjavadoc-error.xml
This is our config used for it. It is basically a super-config of all our checks. We split Javadoc and non-Javadoc, because again, the extra ANTLR run needed for Javadoc is another slowdown.

https://github.com/checkstyle/contribution/blob/master/checkstyle-tester/checks-nonjavadoc-error.xml#L301
This is the only instance of use using MagicNumber and it is just the default instance. So it didn't find the issue because the property in this case was never turned on/off.

My recommendation is to add an instance of this to that config so we can catch more things in the future.

@Lmh-java
Copy link
Contributor Author

Lmh-java commented Apr 22, 2024

Thank you for your thorough reply.

Let's also not forget that Checkstyle does not support multi-threading, so your additionally waiting on a single CPU core even if you have a 8 core system. Technically patch-diff-report-tool is also not fully multi-threaded so there is additional time there. I have created multiple instances of Multi-threaded checking, and use my latest version on regression reports. I usually finish the same job in under an hour (and with more repos).

Yes, in that case, multi-threading or multi-processing and a efficient dispatcher will definitely help. If this process can be done in short amount of time. We can implement a regression test that can cover all the cases. We can start with only fliping boolean properties, and then propagate this to all kinds of properties. I will think more on how to do this after I finish all my final exams (in early May). I am kind of busy with school stuffs these days and cannot think properly :(
After I have a draft idea, I will open an issue on this.

My recommendation is to add an instance of this to that config so we can catch more things in the future.

Sure, I will add this config for this check.

@Lmh-java
Copy link
Contributor Author

@rnveach, I have added the property to the default instance of MagicNumberCheck.

PR: checkstyle/contribution#857

@Lmh-java
Copy link
Contributor Author

@rnveach I was thinking about the regression problem while having dinner :). I came up with an idea. I am not sure whether this is good or not. Please share your thoughts

Since automating the regression to automatically test all the properties and cross cases can be difficult, could we start with manual configuration of all the possible values. I got some inspirations from the matrix parameter of github workflow. In our case, if we can let the developer to configure all the possible values, and our tester can automatically generate all the possible cross cases and its corresponding configuration. Developers can leave some "blanks" in their regression configuration, and our tool will fill these "blanks" accordingly.

Sometimes we need to test different values for regression reports, and we need to manually create multiple versions of config and manually switch among them. This tool can also automate this process.

After we have such tool, we can move forward to automate the property generation by the tool itself in the future.

@rnveach
Copy link
Member

rnveach commented Apr 22, 2024

automating the regression to automatically test all the properties and cross cases can be difficult

This is a GSOC project.

could we start with manual configuration of all the possible values

This is basically what the config is that I am having you update, so it sounds reasonable. However, I wanted to point out that this is extremely related to the GSOC project and we are just starting to assign GSOC projects. So, unless you are picked for this project and the project is picked, any contributions for this may be rejected during GSOC.

@Lmh-java
Copy link
Contributor Author

Reply to #14791 (comment).

@rnveach After investigation, I found this difference does not come from patch-diff-report-tool. Before the results are sent to compare, this difference is already presented in checkstyle-result.xml. For example:

<file name="/Users/liminghao/Programs/IdeaProjects/contribution/checkstyle-tester/repositories/elasticsearch/src/test/java/org/elasticsearch/bwcompat/UnicastBackwardsCompatibilityTest.java">
<error line="1" severity="error" message="Got an exception - java.lang.NullPointerException: Cannot invoke &quot;com.puppycrawl.tools.checkstyle.api.DetailAST.getType()&quot; because &quot;node&quot; is null&#10;#x9;at com.puppycrawl.tools.checkstyle.checks.coding.MagicNumberCheck.isFieldDeclaration(MagicNumberCheck.java:390)&#10;#x9;at com.puppycrawl.tools.checkstyle.checks.coding.MagicNumberCheck.visitToken(MagicNumberCheck.java:219)&#10;#x9;at com.puppycrawl.tools.checkstyle.TreeWalker.notifyVisit(TreeWalker.java:335)&#10;#x9;at com.puppycrawl.tools.checkstyle.TreeWalker.processIter(TreeWalker.java:408)&#10;#x9;at com.puppycrawl.tools.checkstyle.TreeWalker.walk(TreeWalker.java:273)&#10;#x9;at com.puppycrawl.tools.checkstyle.TreeWalker.processFiltered(TreeWalker.java:154)&#10;#x9;at com.puppycrawl.tools.checkstyle.api.AbstractFileSetCheck.process(AbstractFileSetCheck.java:101)&#10;#x9;at com.puppycrawl.tools.checkstyle.Checker.processFile(Checker.java:340)&#10;#x9;at com.puppycrawl.tools.checkstyle.Checker.processFiles(Checker.java:299)&#10;#x9;at com.puppycrawl.tools.checkstyle.Checker.process(Checker.java:226)&#10;#x9;at org.apache.maven.plugins.checkstyle.exec.DefaultCheckstyleExecutor.executeCheckstyle(DefaultCheckstyleExecutor.java:202)&#10;#x9;at org.apache.maven.plugins.checkstyle.AbstractCheckstyleReport.executeReport(AbstractCheckstyleReport.java:533)&#10;#x9;at org.apache.maven.plugins.checkstyle.CheckstyleReport.executeReport(CheckstyleReport.java:57)&#10;#x9;at org.apache.maven.reporting.AbstractMavenReport.generate(AbstractMavenReport.java:255)&#10;#x9;at org.apache.maven.plugins.site.render.ReportDocumentRenderer.renderDocument(ReportDocumentRenderer.java:226)&#10;#x9;at org.apache.maven.doxia.siterenderer.DefaultSiteRenderer.render(DefaultSiteRenderer.java:348)&#10;#x9;at org.apache.maven.plugins.site.render.SiteMojo.renderLocale(SiteMojo.java:194)&#10;#x9;at org.apache.maven.plugins.site.render.SiteMojo.execute(SiteMojo.java:143)&#10;#x9;at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:126)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:328)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:316)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:174)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.access$000(MojoExecutor.java:75)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor$1.run(MojoExecutor.java:162)&#10;#x9;at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute(DefaultMojosExecutionStrategy.java:39)&#10;#x9;at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:159)&#10;#x9;at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:105)&#10;#x9;at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:73)&#10;#x9;at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:53)&#10;#x9;at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:118)&#10;#x9;at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:261)&#10;#x9;at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:173)&#10;#x9;at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:101)&#10;#x9;at org.apache.maven.cli.MavenCli.execute(MavenCli.java:906)&#10;#x9;at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:283)&#10;#x9;at org.apache.maven.cli.MavenCli.main(MavenCli.java:206)&#10;#x9;at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)&#10;#x9;at java.base/java.lang.reflect.Method.invoke(Method.java:580)&#10;#x9;at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:283)&#10;#x9;at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:226)&#10;#x9;at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:407)&#10;#x9;at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:348)&#10;" source="com.puppycrawl.tools.checkstyle.Checker"/>
</file>
<file name="/Users/liminghao/Programs/IdeaProjects/contribution/checkstyle-tester/repositories/elasticsearch/src/test/java/org/elasticsearch/script/ScriptFieldTests.java">
<error line="1" severity="error" message="Got an exception - java.lang.NullPointerException&#10;" source="com.puppycrawl.tools.checkstyle.Checker"/>
</file>

This means, when the checkstyle report is generated, this difference is already there. I also noticed that the single line stacktrace has &#10(\n) at the very end. This might be the cause of the incorrect termination of the line.

However, I am not experienced enough with the contribution repo. I don't know how to trace more into the report generation. I only know the report is generated by the following mvn command, but I don't know where is the code behind this command. I might need some guidance on this.

https://github.com/checkstyle/contribution/blob/266f20250a15877331ebf9ff1ab511e8afe517f5/checkstyle-tester/diff.groovy#L627-L649

def runMavenExecution(srcDir, excludes, checkstyleConfig,
                      checkstyleVersion, extraMvnRegressionOptions) {
    println "Running 'mvn clean' on $srcDir ..."
    def mvnClean = "mvn -e --no-transfer-progress --batch-mode clean"
    executeCmd(mvnClean)
    println "Running Checkstyle on $srcDir ... with excludes {$excludes}"
    def mvnSite = "mvn -e --no-transfer-progress --batch-mode site " +
        "-Dcheckstyle.config.location=$checkstyleConfig -Dcheckstyle.excludes=$excludes"
    if (checkstyleVersion) {
        mvnSite = mvnSite + " -Dcheckstyle.version=$checkstyleVersion"
    }
    if (extraMvnRegressionOptions) {
        mvnSite = mvnSite + " "

        if (!extraMvnRegressionOptions.startsWith("-")) {
            mvnSite = mvnSite + "-"
        }
        mvnSite = mvnSite + extraMvnRegressionOptions
    }
    println(mvnSite)
    executeCmd(mvnSite)
    println "Running Checkstyle on $srcDir - finished"
}

By the way, do you want me to open a separate issue for this, or just discuss everything in this PR (since this is unrelated to this PR.) I have done all the updates to this PR, please take a look, so we can merge both this and checkstyle/contribution#857.

@rnveach
Copy link
Member

rnveach commented Apr 24, 2024

You can open an issue in contribution for this.

@rnveach rnveach merged commit ef2f38a into checkstyle:master Apr 24, 2024
113 checks passed
@Lmh-java
Copy link
Contributor Author

You can open an issue in contribution for this.

Done, at checkstyle/contribution#858

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants