New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #10272: Upgrade Java Grammar from ANTLR2 to ANTLR4 #10280
Conversation
e215dcc
to
7dddea2
Compare
* @param hiddenBefore comment token preceding this DetailAstImpl | ||
*/ | ||
public void setHiddenBefore(List<Token> hiddenBefore) { | ||
this.hiddenBefore = Collections.unmodifiableList(hiddenBefore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
523: setHiddenBefore() Assignment to List<Token> field 'hiddenBefore' from parameter hiddenBefore
532: setHiddenAfter() Assignment to List<Token> field 'hiddenAfter' from parameter hiddenAfter
@@ -396,7 +393,7 @@ public void testAddNextSiblingNullParent() { | |||
|
|||
assertEquals(oldParent, newSibling.getParent(), "Invalid parent"); | |||
assertNull(newSibling.getNextSibling(), "Invalid next sibling"); | |||
assertEquals(newSibling, child.getNextSibling(), "Invalid child"); | |||
assertSame(newSibling, child.getNextSibling(), "Invalid child"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
401: testAddNextSiblingNullParent() assertEquals() may be 'assertSame()'
@@ -43,7 +45,7 @@ | |||
* @see #CLASS_DEF | |||
* @see #INTERFACE_DEF | |||
**/ | |||
public static final int EOF = GeneratedJavaTokenTypes.EOF; | |||
public static final int EOF = Recognizer.EOF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be Recognizer
and not JavaParser
.
46: EOF Static field EOF declared in class 'org.antlr.v4.runtime.Recognizer' but referenced via subclass 'com.puppycrawl.tools.checkstyle.grammar.java.JavaLexer'
e23487a
to
c72b9d5
Compare
@@ -147,8 +147,7 @@ public void visitToken(DetailAST ast) { | |||
// force all methods to be Java style (see note in top Javadoc) | |||
final boolean isMethodViolation = isMethod && !isJavaStyle; | |||
final boolean isVariableViolation = !isMethod | |||
&& isJavaStyle != javaStyle | |||
&& typeAST.getType() != TokenTypes.TYPE_ARGUMENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that GENERIC_END
token positions are always reported correctly, and emitted accurately, this line was no longer covered.
Example:
protected Pair<Integer, Pair<String, Pair<String, Object>>[]>[] values3a;
Difference between latest master and PR branch:
https://www.diffchecker.com/IuwJEwmB
0e6bcc4
to
2f5c283
Compare
6643940
to
0cd105c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
items:
src/main/java/com/puppycrawl/tools/checkstyle/api/DetailAST.java
Outdated
Show resolved
Hide resolved
…s and expected output
After all reviews are completed, I will squash all commits into one, and run one final regression report before we merge this PR. |
/** | ||
* Visitor class used to build Checkstyle's Java AST from the parse tree produced by | ||
* {@link CheckstyleJavaParser}. In each {@code visit...} method, we visit the children of a node | ||
* (which correspond to subrules) or create terminal nodes (tokens), and return a subtree as a | ||
* result. | ||
* | ||
* <p>Example:</p> | ||
* | ||
* <p>The following package declaration:</p> | ||
* <pre> | ||
* package com.puppycrawl.tools.checkstyle; | ||
* </pre> | ||
* | ||
* <p> | ||
* Will be parsed by the {@code packageDeclaration} rule from {@code CheckstyleJavaParser.g4}: | ||
* </p> | ||
* <pre> | ||
* packageDeclaration | ||
* : annotations[true] LITERAL_PACKAGE qualifiedName SEMI | ||
* ; | ||
* </pre> | ||
* | ||
* <p> | ||
* We override the {@code visitPackageDeclaration} method generated by ANTLR in | ||
* {@code CheckstyleJavaParserBaseVisitor} at | ||
* {@link JavaAstVisitor#visitPackageDeclaration(CheckstyleJavaParser.PackageDeclarationContext)} | ||
* to create a subtree based on the subrules and tokens found in the {@code packageDeclaration} | ||
* subrule accordingly, thus producing the following AST: | ||
* </p> | ||
* <pre> | ||
* PACKAGE_DEF -> package | ||
* |--ANNOTATIONS -> ANNOTATIONS | ||
* |--DOT -> . | ||
* | |--DOT -> . | ||
* | | |--DOT -> . | ||
* | | | |--IDENT -> com | ||
* | | | `--IDENT -> puppycrawl | ||
* | | `--IDENT -> tools | ||
* | `--IDENT -> checkstyle | ||
* `--SEMI -> ; | ||
* </pre> | ||
* <p> | ||
* See https://github.com/checkstyle/checkstyle/pull/10434 for a good example of how | ||
* to make changes to Checkstyle's grammar and AST. | ||
* </p> | ||
* <p> | ||
* The order of {@code visit...} methods in {@code JavaAstVisitor.java} and production rules in | ||
* {@code CheckstyleJavaParser.g4} should be consistent to ease maintenance. | ||
* </p> | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I write a more detailed README/ guide with code examples on how to update grammar/ visitor? If so, should it live in the same directory as the parser and lexer grammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this
I'm seeing huge memory usage / OutOfMemoryErrors in the Antlr runtime with the 9.x series. With the 8.x series my project(s) builds complete on a Maven heap of only 256M, but by bumping Checkstyle from 8.45.1 to any of the 9.x series, the minimum heap sits at around 768M (fails at 512M, succeeds at 768M). This suggests that Checkstyle memory usage is somewhere in the order of 2-3 times what it was previously. Is this something you have observed / an unavoidable consequence of Antlr 4, or is it possible there are leaks somewhere? |
We are conducting an investigation, and have an open issue at #10934. Please leave a comment there with details about your project, and if it is open source, share a link to your repo. This issue has been noted previously with ANTLR4, there are a few reports such as |
For now kind of unavoidable We keep #10934 open to find out root reason and maybe fix problem by PR to antrl4 project |
Closes #10272.
In summary, there are no regressions. All differences (in Check regression reports and AST regression reports) fall into the following categories:
DOT
operator placement, both are consistent with Checkstyle vision of AST (DOT
operator as parent of expression).InputAntlr4AstRegressionUncommon3.java
(from 3d471c3#r679410108)Check Regression Reports
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-nonjavadoc-error/index.html
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_checks-only-javadoc-error/index.html
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part1/index.html
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part2/index.html
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part3/index.html
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/index.html
Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part4/spoon/index.html are from non-compilable files and correct
>
positioning.All other changes are from files with unicode characters that we previously could not parse.
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part5/index.html
Changes at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part5/guava/index.html are from corrected
DOT
operator parent placement.All other changes are from the same non-compilable files and files containing unicode chars as above.
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_part6/index.html
InputAntlr4AstRegressionUncommon3.java
.https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_1/index.html
Change at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_1/guava/index.html from same correct
DOT
operator parent placement as above.All other changes are from the same non-compilable files and files containing unicode chars as above.
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_08_09/diff_sevntu-check-regression_part_2/index.html
AST Regression Report
All repos:
https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/index.html
Regression was found in
lombok-ast
at https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr/lombok-ast/index.html#A1 , but is fixed. but is fixed.Updated AST regression report for
lombok-ast
: https://nmancus1.github.io/issue-10272_check_diff_reports_2021_07_25/diff-antlr-lombok/index.html