Issue #10272: Upgrade Java Grammar from ANTLR2 to ANTLR4 #10280

nrmancuso · 2021-07-07T21:29:19Z

In summary, there are no regressions. All differences (in Check regression reports and AST regression reports) fall into the following categories:

Diffs are in files that previously could not be parsed but now can be (usually due to unicode characters).
Diffs are in a non-compilable file that we can no longer parse.
Diffs are from correct type parameter placement. This usually happens when one of the types is an array type. Example: Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/spoon/index.html
Diffs are from a change in the exception message.
Two cases of changes in DOT operator placement; both are consistent with Checkstyle's vision of the AST (DOT operator as parent of expression).
1. https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/spoon/index.html#A20
2. https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/guava-mvnstyle/index.html#A91
One case of change in line length now that we can parse unicode correctly: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/my-checkstyle/index.html#A239
New violations on the previously unparsable file InputAntlr4AstRegressionUncommon3.java (from 3d471c3#r679410108)

Check Regression Reports

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-nonjavadoc-error/index.html

Gained ability to parse most unicode characters in java code. Example: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-nonjavadoc-error/pmd/index.html#A6
Lost ability to parse some non-compilable java files. Example: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-nonjavadoc-error/spoon/index.html#A6
Minor change in exception message. This was expected.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-only-javadoc-error/index.html

Changes are identical to above.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/index.html

Ability to parse most unicode characters enabled us to find new violations in files that contain them. Example: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/apache-ant/index.html#A1, https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/pmd/index.html#A2
Lost violations in https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/spoon/index.html are from non-compilable files.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part2/index.html

Lost violations for non-compilable file. Example: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part2/spoon/index.html

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part3/index.html

New violations at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part3/apache-ant/index.html and https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part3/pmd/index.html are from files containing unicode characters that Checkstyle was previously unable to parse.
Other changes are from files above that either are not compilable or have unicode characters that we weren't able to parse previously.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/index.html

Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/spoon/index.html are from non-compilable files and corrected > (GENERIC_END) positioning.
All other changes are from files with unicode characters that we previously could not parse.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part5/index.html

Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part5/guava/index.html are from corrected DOT operator parent placement.
All other changes are from the same non-compilable files and files containing unicode chars as above.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part6/index.html

New violations on previously not parsable file InputAntlr4AstRegressionUncommon3.java.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_1/index.html

Change at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_1/guava/index.html from same correct DOT operator parent placement as above.
All other changes are from the same non-compilable files and files containing unicode chars as above.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_2/index.html

All other changes are from the same non-compilable files and files containing unicode chars as above.

AST Regression Report

All repos:
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/index.html

A regression was found in lombok-ast at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/lombok-ast/index.html#A1, but it has been fixed.

Updated AST regression report for lombok-ast : https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr-lombok/index.html

nrmancuso · 2021-07-09T01:27:08Z

src/main/java/com/puppycrawl/tools/checkstyle/DetailAstImpl.java

+     * @param hiddenBefore comment token preceding this DetailAstImpl
+     */
+    public void setHiddenBefore(List<Token> hiddenBefore) {
+        this.hiddenBefore = Collections.unmodifiableList(hiddenBefore);


From https://teamcity.jetbrains.com/viewLog.html?buildId=3543062&tab=Inspection&buildTypeId=Checkstyle_IdeaInspectionsPullRequest:

523: setHiddenBefore() Assignment to List<Token> field 'hiddenBefore' from parameter hiddenBefore
532: setHiddenAfter() Assignment to List<Token> field 'hiddenAfter' from parameter hiddenAfter
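To illustrate what this kind of inspection is concerned with, here is a minimal sketch. It assumes a hypothetical holder class (`HiddenTokensSketch`, not part of Checkstyle) and uses `String` in place of `Token`. It shows that `Collections.unmodifiableList` wraps the caller's list rather than copying it, so later changes made by the caller remain visible through the field unless a copy is taken first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a class that stores a list passed in by a caller.
final class HiddenTokensSketch {
    private List<String> viewOfCallerList = Collections.emptyList();
    private List<String> ownCopy = Collections.emptyList();

    // Stores a read-only VIEW: the caller's list is still the backing storage.
    void setAsView(List<String> tokens) {
        this.viewOfCallerList = Collections.unmodifiableList(tokens);
    }

    // Stores a read-only view of a private COPY: later caller changes are not seen.
    void setAsCopy(List<String> tokens) {
        this.ownCopy = Collections.unmodifiableList(new ArrayList<>(tokens));
    }

    int viewSize() {
        return viewOfCallerList.size();
    }

    int copySize() {
        return ownCopy.size();
    }

    public static void main(String[] args) {
        HiddenTokensSketch holder = new HiddenTokensSketch();
        List<String> tokens = new ArrayList<>();
        tokens.add("// first comment");
        holder.setAsView(tokens);
        holder.setAsCopy(tokens);

        // The caller mutates its own list after handing it over.
        tokens.add("// second comment");

        System.out.println("view size: " + holder.viewSize());
        System.out.println("copy size: " + holder.copySize());
    }
}
```

Whether the extra copy is worth its cost depends on how the setter is called; when the caller never touches the list again, the wrapping-only variant behaves the same.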

nrmancuso · 2021-07-09T01:28:04Z

src/test/java/com/puppycrawl/tools/checkstyle/DetailAstImplTest.java

@@ -396,7 +393,7 @@ public void testAddNextSiblingNullParent() {

        assertEquals(oldParent, newSibling.getParent(), "Invalid parent");
        assertNull(newSibling.getNextSibling(), "Invalid next sibling");
-        assertEquals(newSibling, child.getNextSibling(), "Invalid child");
+        assertSame(newSibling, child.getNextSibling(), "Invalid child");


From https://teamcity.jetbrains.com/viewLog.html?buildId=3543062&tab=Inspection&buildTypeId=Checkstyle_IdeaInspectionsPullRequest:

401: testAddNextSiblingNullParent() assertEquals() may be 'assertSame()'
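For context on the difference between the two assertions: `assertEquals` compares with `equals()`, while `assertSame` compares references. The sketch below reproduces that distinction in plain Java without JUnit; the `Node` class is hypothetical and only exists to show a case where the two comparisons disagree.

```java
import java.util.Objects;

final class SameVersusEqualsSketch {

    // Hypothetical value-like class whose equals() compares content.
    static final class Node {
        private final String text;

        Node(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Node && text.equals(((Node) other).text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }
    }

    // What an equality assertion checks.
    static boolean equalByValue(Object expected, Object actual) {
        return Objects.equals(expected, actual);
    }

    // What an identity assertion checks.
    static boolean sameReference(Object expected, Object actual) {
        return expected == actual;
    }

    public static void main(String[] args) {
        Node first = new Node("sibling");
        Node second = new Node("sibling");
        System.out.println("equal by value: " + equalByValue(first, second));
        System.out.println("same reference: " + sameReference(first, second));
        System.out.println("same with itself: " + sameReference(first, first));
    }
}
```

In a test that checks whether a tree link points at one specific node object, the identity comparison states the intent more directly.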

nrmancuso · 2021-07-09T01:29:20Z

src/main/java/com/puppycrawl/tools/checkstyle/api/TokenTypes.java

@@ -43,7 +45,7 @@
     * @see #CLASS_DEF
     * @see #INTERFACE_DEF
     **/
-    public static final int EOF = GeneratedJavaTokenTypes.EOF;
+    public static final int EOF = Recognizer.EOF;


This must be Recognizer and not JavaParser.

From https://teamcity.jetbrains.com/viewLog.html?buildId=3543062&tab=Inspection&buildTypeId=Checkstyle_IdeaInspectionsPullRequest:

46: EOF Static field EOF declared in class 'org.antlr.v4.runtime.Recognizer' but referenced via subclass 'com.puppycrawl.tools.checkstyle.grammar.java.JavaLexer'
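A small sketch of what the inspection flags, using stand-in classes (`BaseRecognizer` and `GeneratedLexer` are hypothetical, not the ANTLR types): a static field inherited from a superclass can be named through a subclass, but both expressions resolve to the same field, so referencing the declaring class is clearer.

```java
final class StaticFieldOwnerSketch {

    // Stand-in for the class that actually declares the constant.
    static class BaseRecognizer {
        static final int EOF = -1; // value chosen for the sketch
    }

    // Stand-in for a generated subclass that declares nothing itself.
    static class GeneratedLexer extends BaseRecognizer {
    }

    // Preferred: reference the field through its declaring class.
    static int viaDeclaringClass() {
        return BaseRecognizer.EOF;
    }

    // Compiles and yields the same value, but hides where the field lives.
    static int viaSubclass() {
        return GeneratedLexer.EOF;
    }

    public static void main(String[] args) {
        System.out.println(viaDeclaringClass() == viaSubclass());
    }
}
```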

nrmancuso · 2021-07-15T13:53:54Z

src/main/java/com/puppycrawl/tools/checkstyle/checks/ArrayTypeStyleCheck.java

@@ -147,8 +147,7 @@ public void visitToken(DetailAST ast) {
            // force all methods to be Java style (see note in top Javadoc)
            final boolean isMethodViolation = isMethod && !isJavaStyle;
            final boolean isVariableViolation = !isMethod
-                    && isJavaStyle != javaStyle
-                    && typeAST.getType() != TokenTypes.TYPE_ARGUMENT;


Now that GENERIC_END tokens are always emitted with correct positions, this line was no longer covered.

Example:

protected Pair<Integer, Pair<String, Pair<String, Object>>[]>[] values3a;

Difference between latest master and PR branch:
https://www.diffchecker.com/IuwJEwmB
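For readers less familiar with the check, the two declaration styles it distinguishes are shown below in a minimal sketch (the class and field names are made up for illustration). Both forms compile to the same array type; only the position of the brackets differs, which is why accurate token positions matter for telling them apart.

```java
final class ArrayStyleSketch {

    // Java style: brackets follow the type.
    static int[] javaStyle = {1, 2, 3};

    // C style: brackets follow the variable name.
    static int cStyle[] = {1, 2, 3};

    // Both declarations produce the same runtime type.
    static boolean sameRuntimeType() {
        return javaStyle.getClass() == cStyle.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeType());
    }
}
```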

romani

items:

src/main/java/com/puppycrawl/tools/checkstyle/JavaParser.java

src/main/java/com/puppycrawl/tools/checkstyle/api/DetailAST.java

nrmancuso · 2021-08-09T15:30:42Z

After all reviews are completed, I will squash all commits into one, and run one final regression report before we merge this PR.

nrmancuso · 2021-08-10T17:18:06Z

src/main/java/com/puppycrawl/tools/checkstyle/JavaAstVisitor.java

+/**
+ * Visitor class used to build Checkstyle's Java AST from the parse tree produced by
+ * {@link CheckstyleJavaParser}. In each {@code visit...} method, we visit the children of a node
+ * (which correspond to subrules) or create terminal nodes (tokens), and return a subtree as a
+ * result.
+ *
+ * <p>Example:</p>
+ *
+ * <p>The following package declaration:</p>
+ * <pre>
+ * package com.puppycrawl.tools.checkstyle;
+ * </pre>
+ *
+ * <p>
+ * Will be parsed by the {@code packageDeclaration} rule from {@code CheckstyleJavaParser.g4}:
+ * </p>
+ * <pre>
+ * packageDeclaration
+ *     : annotations[true] LITERAL_PACKAGE qualifiedName SEMI
+ *     ;
+ * </pre>
+ *
+ * <p>
+ * We override the {@code visitPackageDeclaration} method generated by ANTLR in
+ * {@code CheckstyleJavaParserBaseVisitor} at
+ * {@link JavaAstVisitor#visitPackageDeclaration(CheckstyleJavaParser.PackageDeclarationContext)}
+ * to create a subtree based on the subrules and tokens found in the {@code packageDeclaration}
+ * subrule accordingly, thus producing the following AST:
+ * </p>
+ * <pre>
+ * PACKAGE_DEF -&gt; package
+ * |--ANNOTATIONS -&gt; ANNOTATIONS
+ * |--DOT -&gt; .
+ * |   |--DOT -&gt; .
+ * |   |   |--DOT -&gt; .
+ * |   |   |   |--IDENT -&gt; com
+ * |   |   |   `--IDENT -&gt; puppycrawl
+ * |   |   `--IDENT -&gt; tools
+ * |   `--IDENT -&gt; checkstyle
+ * `--SEMI -&gt; ;
+ * </pre>
+ * <p>
+ * See https://github.com/checkstyle/checkstyle/pull/10434 for a good example of how
+ * to make changes to Checkstyle's grammar and AST.
+ * </p>
+ * <p>
+ * The order of {@code visit...} methods in {@code JavaAstVisitor.java} and production rules in
+ * {@code CheckstyleJavaParser.g4} should be consistent to ease maintenance.
+ * </p>
+ */


Should I write a more detailed README/guide with code examples on how to update the grammar and visitor? If so, should it live in the same directory as the parser and lexer grammars?
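As a starting point for such a guide, here is a self-contained sketch of the tree shape described in the Javadoc above. It does not use ANTLR or Checkstyle classes: `Node`, `buildQualifiedName`, `buildPackageDef`, and `print` are hypothetical helpers that only mimic how a qualified name folds into left-nested DOT nodes under PACKAGE_DEF.

```java
import java.util.ArrayList;
import java.util.List;

final class QualifiedNameTreeSketch {

    // Hypothetical minimal AST node: a token type name, its text, and children.
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Folds "a.b.c" into a left-nested tree: DOT(DOT(a, b), c).
    static Node buildQualifiedName(String name) {
        String[] parts = name.split("\\.");
        Node result = new Node("IDENT", parts[0]);
        for (int i = 1; i < parts.length; i++) {
            result = new Node("DOT", ".")
                .add(result)
                .add(new Node("IDENT", parts[i]));
        }
        return result;
    }

    // Mirrors the rule order: annotations, qualified name, semicolon.
    static Node buildPackageDef(String name) {
        return new Node("PACKAGE_DEF", "package")
            .add(new Node("ANNOTATIONS", "ANNOTATIONS"))
            .add(buildQualifiedName(name))
            .add(new Node("SEMI", ";"));
    }

    // Renders the tree in the same layout as the Javadoc example.
    static String print(Node root) {
        StringBuilder out = new StringBuilder();
        out.append(root.type).append(" -> ").append(root.text).append('\n');
        printChildren(root, "", out);
        return out.toString();
    }

    private static void printChildren(Node node, String indent, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = i == node.children.size() - 1;
            out.append(indent).append(last ? "`--" : "|--")
                .append(child.type).append(" -> ").append(child.text).append('\n');
            printChildren(child, indent + (last ? "    " : "|   "), out);
        }
    }

    public static void main(String[] args) {
        System.out.print(print(buildPackageDef("com.puppycrawl.tools.checkstyle")));
    }
}
```

The left fold is the design point: each new DOT node adopts the tree built so far as its first child, which keeps the DOT operator as the parent of the expression.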

romani

Let's merge this

md-5 · 2021-11-22T04:32:17Z

I'm seeing huge memory usage / OutOfMemoryErrors in the Antlr runtime with the 9.x series.

With the 8.x series my project(s) builds complete on a Maven heap of only 256M, but by bumping Checkstyle from 8.45.1 to any of the 9.x series, the minimum heap sits at around 768M (fails at 512M, succeeds at 768M).

This suggests that Checkstyle memory usage is somewhere in the order of 2-3 times what it was previously.

Is this something you have observed / an unavoidable consequence of Antlr 4, or is it possible there are leaks somewhere?

nrmancuso · 2021-11-22T13:03:06Z

> Is this something you have observed / an unavoidable consequence of Antlr 4, or is it possible there are leaks somewhere?

We are conducting an investigation, and have an open issue at #10934. Please leave a comment there with details about your project, and if it is open source, share a link to your repo.

This issue has been noted previously with ANTLR4; there are a few reports, such as antlr/antlr4#2384.

romani · 2021-11-22T14:36:15Z

For now, it is more or less unavoidable.

We are keeping #10934 open to find the root cause and maybe fix the problem with a PR to the antlr4 project.

nrmancuso force-pushed the issue-10272 branch from f3c3c90 to d1be9bd Compare July 7, 2021 21:31

nrmancuso marked this pull request as draft July 7, 2021 21:31

nrmancuso force-pushed the issue-10272 branch 4 times, most recently from e215dcc to 7dddea2 Compare July 9, 2021 01:24

nrmancuso force-pushed the issue-10272 branch 12 times, most recently from e23487a to c72b9d5 Compare July 15, 2021 13:44

nrmancuso force-pushed the issue-10272 branch 8 times, most recently from 0e6bcc4 to 2f5c283 Compare July 19, 2021 18:20

nrmancuso force-pushed the issue-10272 branch 3 times, most recently from 6643940 to 0cd105c Compare August 9, 2021 06:28

strkkk approved these changes Aug 9, 2021

romani requested changes Aug 9, 2021

nrmancuso added 8 commits August 9, 2021 11:29

Issue checkstyle#10272: create new Antlr4AstRegressionTest with inputs and expected output

463d763

Issue checkstyle#10272: infra and dependencies

59ab5cc

Issue checkstyle#10272: suppressions and configs

969648e

Issue checkstyle#10272: update tests for antlr4 (tokens, exceptions, etc.)

511de07

Issue checkstyle#10272: update utilities

566094f

Issue checkstyle#10272: update ArrayTypeStyleCheck

d3e5ee9

Issue checkstyle#10272: update WhitespaceAfterCheck

3f6281c

Issue checkstyle#10272: Upgrade Java Grammar from ANTLR2 to ANTLR4

3d471c3

nrmancuso force-pushed the issue-10272 branch from 0cd105c to 3d471c3 Compare August 9, 2021 15:29

nrmancuso mentioned this pull request Aug 10, 2021

Issue #3095: Add COMPILATION_UNIT token in Ast Tree, remove EOF token #10574

Merged

rnveach approved these changes Aug 10, 2021

romani approved these changes Aug 11, 2021

romani merged commit 5a27d12 into checkstyle:master Aug 11, 2021

nrmancuso deleted the issue-10272 branch March 18, 2022 02:02

nrmancuso mentioned this pull request Mar 29, 2022

Issue #11087: New check ChainedMethodCallWrap #11231

Closed
