Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #10272: Upgrade Java Grammar from ANTLR2 to ANTLR4 #10280

Merged
merged 8 commits into from Aug 11, 2021

Conversation

nrmancuso
Copy link
Member

@nrmancuso nrmancuso commented Jul 7, 2021

Closes #10272.

In summary, there are no regressions. All differences (in Check regression reports and AST regression reports) fall into the following categories:

  1. Diffs are in file that could not be parsed previously, but now we can (usually due to unicode characters).
  2. Diffs are in a file that we cannot parse any longer, that is not compilable.
  3. Diffs are from correct type parameter placement. This usually happens when one of the types is an array type. Example: Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/spoon/index.html
  4. Diffs are from change in exception message.
  5. Two cases of changes in DOT operator placement, both are consistent with Checkstyle vision of AST (DOT operator as parent of expression).
    1. https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/spoon/index.html#A20
    2. https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/guava-mvnstyle/index.html#A91
  6. One case of change in line length now that we can parse unicode correctly: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/my-checkstyle/index.html#A239
  7. New violations on previously not parsable file InputAntlr4AstRegressionUncommon3.java(from 3d471c3#r679410108)

Check Regression Reports

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-nonjavadoc-error/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-only-javadoc-error/index.html

  • Changes are identical to above.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part2/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part3/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part5/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part6/index.html

  • New violations on previously not parsable file InputAntlr4AstRegressionUncommon3.java.

https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_1/index.html


https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_2/index.html

  • All other changes are from the same non-compilable files and files containing unicode chars as above.

AST Regression Report

All repos:
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/index.html

Regression was found in lombok-ast at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/lombok-ast/index.html#A1 , but is fixed. but is fixed.

Updated AST regression report for lombok-ast : https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr-lombok/index.html

@nrmancuso nrmancuso marked this pull request as draft July 7, 2021 21:31
@nrmancuso nrmancuso force-pushed the issue-10272 branch 4 times, most recently from e215dcc to 7dddea2 Compare July 9, 2021 01:24
* @param hiddenBefore comment token preceding this DetailAstImpl
*/
public void setHiddenBefore(List<Token> hiddenBefore) {
this.hiddenBefore = Collections.unmodifiableList(hiddenBefore);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://teamcity.jetbrains.com/viewLog.html?buildId=3543062&tab=Inspection&buildTypeId=Checkstyle_IdeaInspectionsPullRequest:

523: setHiddenBefore() Assignment to List<Token> field 'hiddenBefore' from parameter hiddenBefore 
532: setHiddenAfter() Assignment to List<Token> field 'hiddenAfter' from parameter hiddenAfter 

@@ -396,7 +393,7 @@ public void testAddNextSiblingNullParent() {

assertEquals(oldParent, newSibling.getParent(), "Invalid parent");
assertNull(newSibling.getNextSibling(), "Invalid next sibling");
assertEquals(newSibling, child.getNextSibling(), "Invalid child");
assertSame(newSibling, child.getNextSibling(), "Invalid child");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -43,7 +45,7 @@
* @see #CLASS_DEF
* @see #INTERFACE_DEF
**/
public static final int EOF = GeneratedJavaTokenTypes.EOF;
public static final int EOF = Recognizer.EOF;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be Recognizer and not JavaParser.

From https://teamcity.jetbrains.com/viewLog.html?buildId=3543062&tab=Inspection&buildTypeId=Checkstyle_IdeaInspectionsPullRequest:

46: EOF Static field EOF declared in class 'org.antlr.v4.runtime.Recognizer' but referenced via subclass 'com.puppycrawl.tools.checkstyle.grammar.java.JavaLexer' 

@nrmancuso nrmancuso force-pushed the issue-10272 branch 12 times, most recently from e23487a to c72b9d5 Compare July 15, 2021 13:44
@@ -147,8 +147,7 @@ public void visitToken(DetailAST ast) {
// force all methods to be Java style (see note in top Javadoc)
final boolean isMethodViolation = isMethod && !isJavaStyle;
final boolean isVariableViolation = !isMethod
&& isJavaStyle != javaStyle
&& typeAST.getType() != TokenTypes.TYPE_ARGUMENT;
Copy link
Member Author

@nrmancuso nrmancuso Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that GENERIC_END token positions are always reported correctly, and emitted accurately, this line was no longer covered.

Example:

        protected Pair<Integer, Pair<String, Pair<String, Object>>[]>[] values3a;

Difference between latest master and PR branch:
https://www.diffchecker.com/IuwJEwmB

@nrmancuso nrmancuso force-pushed the issue-10272 branch 8 times, most recently from 0e6bcc4 to 2f5c283 Compare July 19, 2021 18:20
@nrmancuso nrmancuso force-pushed the issue-10272 branch 3 times, most recently from 6643940 to 0cd105c Compare August 9, 2021 06:28
Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items:

@nrmancuso
Copy link
Member Author

After all reviews are completed, I will squash all commits into one, and run one final regression report before we merge this PR.

Comment on lines +44 to +93
/**
* Visitor class used to build Checkstyle's Java AST from the parse tree produced by
* {@link CheckstyleJavaParser}. In each {@code visit...} method, we visit the children of a node
* (which correspond to subrules) or create terminal nodes (tokens), and return a subtree as a
* result.
*
* <p>Example:</p>
*
* <p>The following package declaration:</p>
* <pre>
* package com.puppycrawl.tools.checkstyle;
* </pre>
*
* <p>
* Will be parsed by the {@code packageDeclaration} rule from {@code CheckstyleJavaParser.g4}:
* </p>
* <pre>
* packageDeclaration
* : annotations[true] LITERAL_PACKAGE qualifiedName SEMI
* ;
* </pre>
*
* <p>
* We override the {@code visitPackageDeclaration} method generated by ANTLR in
* {@code CheckstyleJavaParserBaseVisitor} at
* {@link JavaAstVisitor#visitPackageDeclaration(CheckstyleJavaParser.PackageDeclarationContext)}
* to create a subtree based on the subrules and tokens found in the {@code packageDeclaration}
* subrule accordingly, thus producing the following AST:
* </p>
* <pre>
* PACKAGE_DEF -&gt; package
* |--ANNOTATIONS -&gt; ANNOTATIONS
* |--DOT -&gt; .
* | |--DOT -&gt; .
* | | |--DOT -&gt; .
* | | | |--IDENT -&gt; com
* | | | `--IDENT -&gt; puppycrawl
* | | `--IDENT -&gt; tools
* | `--IDENT -&gt; checkstyle
* `--SEMI -&gt; ;
* </pre>
* <p>
* See https://github.com/checkstyle/checkstyle/pull/10434 for a good example of how
* to make changes to Checkstyle's grammar and AST.
* </p>
* <p>
* The order of {@code visit...} methods in {@code JavaAstVisitor.java} and production rules in
* {@code CheckstyleJavaParser.g4} should be consistent to ease maintenance.
* </p>
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I write a more detailed README/ guide with code examples on how to update grammar/ visitor? If so, should it live in the same directory as the parser and lexer grammar?

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this

@md-5
Copy link

md-5 commented Nov 22, 2021

I'm seeing huge memory usage / OutOfMemoryErrors in the Antlr runtime with the 9.x series.

With the 8.x series my project(s) builds complete on a Maven heap of only 256M, but by bumping Checkstyle from 8.45.1 to any of the 9.x series, the minimum heap sits at around 768M (fails at 512M, succeeds at 768M).

This suggests that Checkstyle memory usage is somewhere in the order of 2-3 times what it was previously.

Is this something you have observed / an unavoidable consequence of Antlr 4, or is it possible there are leaks somewhere?

@nrmancuso
Copy link
Member Author

Is this something you have observed / an unavoidable consequence of Antlr 4, or is it possible there are leaks somewhere?

We are conducting an investigation, and have an open issue at #10934. Please leave a comment there with details about your project, and if it is open source, share a link to your repo.

This issue has been noted previously with ANTLR4, there are a few reports such as
antlr/antlr4#2384.

@romani
Copy link
Member

romani commented Nov 22, 2021

For now kind of unavoidable

We keep #10934 open to find out root reason and maybe fix problem by PR to antrl4 project

@nrmancuso nrmancuso deleted the issue-10272 branch March 18, 2022 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade Java Grammar from ANTLR2 to ANTLR4
7 participants