Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column number in DetailNode should start with 1 #14780

Open
MANISH-K-07 opened this issue Apr 11, 2024 · 16 comments
Open

Column number in DetailNode should start with 1 #14780

MANISH-K-07 opened this issue Apr 11, 2024 · 16 comments

Comments

@MANISH-K-07
Copy link
Contributor

MANISH-K-07 commented Apr 11, 2024

I have downloaded the latest checkstyle from: https://checkstyle.org/cmdline.html#Download_and_Run
https://checkstyle.org/cmdline.html#Command_line_usage
I have executed the cli and showed it below, as cli describes the problem better than 1,000 words

E:\MANISH\Workspace>cat Test.java
/**
 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 */
class Test {
        /*
          romani
          rnveach
          nrmancuso
        */
        // MANISH-K-07
}

E:\MANISH\Workspace>java -jar checkstyle-10.14.2-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]
    |   `--BLOCK_COMMENT_END -> */ [6:1]
    |--LITERAL_CLASS -> class [7:0]
    |--IDENT -> Test [7:6]
    `--OBJBLOCK -> OBJBLOCK [7:11]
        |--LCURLY -> { [7:11]
        |--BLOCK_COMMENT_BEGIN -> /* [8:1]
        |   |--COMMENT_CONTENT -> \n\t  romani\n\t  rnveach\n\t  nrmancuso\n\t [8:3]
        |   `--BLOCK_COMMENT_END -> */ [12:1]
        |--SINGLE_LINE_COMMENT -> // [13:1]
        |   `--COMMENT_CONTENT ->  MANISH-K-07\n [13:3]
        `--RCURLY -> } [14:0]

E:\MANISH\Workspace>

The de-sync of column numbers (-1) can be found at the javadoc content of the AST :

 * Some javadoc.
 *
 * @author MANISH-K-07
 * @version 1.0
 
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]
    |   |       |--TEXT ->  Some javadoc. [2:2]
    |   |       |--NEWLINE -> \n [2:16]
    |   |       |--LEADING_ASTERISK ->  * [3:0]
    |   |       |--NEWLINE -> \n [3:2]
    |   |       |--LEADING_ASTERISK ->  * [4:0]
    |   |       |--WS ->   [4:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [4:3]
    |   |       |   |--AUTHOR_LITERAL -> @author [4:3]
    |   |       |   |--WS ->   [4:10]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [4:11]
    |   |       |       |--TEXT -> MANISH-K-07 [4:11]
    |   |       |       `--NEWLINE -> \n [4:22]
    |   |       |--LEADING_ASTERISK ->  * [5:0]
    |   |       |--WS ->   [5:2]
    |   |       |--JAVADOC_TAG -> JAVADOC_TAG [5:3]
    |   |       |   |--VERSION_LITERAL -> @version [5:3]
    |   |       |   |--WS ->   [5:11]
    |   |       |   `--DESCRIPTION -> DESCRIPTION [5:12]
    |   |       |       |--TEXT -> 1.0 [5:12]
    |   |       |       |--NEWLINE -> \n [5:15]
    |   |       |       `--TEXT ->   [6:0]
    |   |       `--EOF -> <EOF> [6:1]


Leading asterix is being considered as position '0' because the javadoc is parsed as a separate file.
This is why column numbers shown in the AST generated through cli option -J, -- treeWithJavadoc differ by 1 for javadoc content.

The option works in a way to sync the line numbers accurately. This should be extended to column numbers too
We should expect an AST something like :

    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:1]
    |   |       |--TEXT ->  Some javadoc. [2:3]
    |   |       |--NEWLINE -> \n [2:16]

Update: Similar to #4997 , but for DetailNode.

@MANISH-K-07
Copy link
Contributor Author

This issue in response to @rnveach 's #14710 (comment) which IMO is true considering the fact that the cmdline docs say that both the line and column numbers "will" differ. This however proved false as the line no's are in perfect sync.
The docs have been updated as part of #14710 but since we are able to sync lines, the same could be extended to columns.

@rnveach , @romani , I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

PS: the column numbering in IDEs would exceed by 1 bcoz AST displays 0 based positioning

@romani
Copy link
Member

romani commented Apr 11, 2024

There should be no 0 column at all, all should start from 1.

$ java -jar checkstyle-10.15.0-all.jar -J Test.java
COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
`--CLASS_DEF -> CLASS_DEF [7:0]
    |--MODIFIERS -> MODIFIERS [7:0]
    |--BLOCK_COMMENT_BEGIN -> /* [1:0]
    |   |--COMMENT_CONTENT -> *\n * Some javadoc.\n *\n * @author MANISH-K-07\n * @version 1.0\n  [1:2]
    |   |   `--JAVADOC -> JAVADOC [1:3]
    |   |       |--NEWLINE -> \n [1:3]
    |   |       |--LEADING_ASTERISK ->  * [2:0]

even COMPILATION_UNIT should be [7:1]

@MANISH-K-07
Copy link
Contributor Author

MANISH-K-07 commented Apr 11, 2024

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

Ideally yes @romani , to avoid confusion this would help a lot..
But for some reason, AST column numbering starts from 0 conventionally.
https://www.geeksforgeeks.org/abstract-syntax-tree-ast-in-java/

I couldn't understand how just each newline (\n) of javadoc is in perfect sync with the column numbers?

Could you please tell me why this is observed?

@romani
Copy link
Member

romani commented Apr 11, 2024

Could you please tell me why this is observed?

no idea. Somebody need to debug it.

But for some reason, AST column numbering starts from 0 conventionally.

we should start from 1, to match what user is seeing in all IDE and text editors.

@MANISH-K-07
Copy link
Contributor Author

MANISH-K-07 commented Apr 11, 2024

no idea. Somebody need to debug it.

Will do. The parser is antlr based I believe.... so maybe @nrmancuso , any insights on why this might be happening?

we should start from 1, to match what user is seeing in all IDE and text editors.

Totally agree with this. It would help a lot to avoid all the confusion

@rnveach
Copy link
Member

rnveach commented Apr 11, 2024

There should be no 0 column at all, all should start from 1.
even COMPILATION_UNIT should be [7:1]

This is issue #4997 . It should be slightly separate from this issue as we have a de-sync with the current numbers.

Once they are insync, if position should be upped so there is no 0, then it should fall under #4997.

@MANISH-K-07

This comment was marked as outdated.

@nrmancuso
Copy link
Member

nrmancuso commented Apr 17, 2024

I am removing the approved label until we have a clear picture of the scope of the work to be done here.

As I read through the description and comments:

The option works in a way to sync the line numbers accurately

antlr

It is not clear to me if the problem is in our ANTLR grammar, JavaParser, or in the CLI.

What I need to see here to approve this again:

  1. Explanation of where/how we currently mutate line numbers as mentioned above, with links.
  2. All details in PR description
  3. New framing of issue entirely; it sounds like we are going to impact the entire Javadoc AST, not just the command line option. This means that checks, filters, etc. are impacted. We need to make this very clear.
  4. Understand that this is a breaking change, and consider how this can impact users.

@romani
Copy link
Member

romani commented Apr 17, 2024

Doc should show how it works now, even it is weird.
We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

@nrmancuso
Copy link
Member

Doc should show how it works now, even it is weird. We can create issue to improve and in scope of improvement doc will be updated too. No need to block all together in one update.

Please restate this, sounds like we still have a lot of confusion about the problem and how we are proposing to fix it. This comment leads me to believe that you think this is a documentation/CLI issue, while @MANISH-K-07 is ready to start making changes to the ANTLR grammar:

I will look into this bug, but antlr is not really my stronghold. Could someone please give me an overview of the approach here?

My comment at #14780 (comment) stands.

@MANISH-K-07

This comment was marked as outdated.

@nrmancuso
Copy link
Member

nrmancuso commented Apr 18, 2024

not sure on why and how part

Yes, this is why you need to debug and research, then provide your findings here before we approve and you start writing code. Always expect to do this for issues of any appreciable value and scope. Writing code is the easiest part; truly understanding the issue is a big part of the work, and helping others to understand is another important aspect.

Think about it like this: you start writing code, testing, etc, then send a PR only to find out that there is a massive misunderstanding, then you have to do it all over again. Then the next reviewer comes, points out a bunch of issues with your approach, then rinse and repeat. It is better to do a lot of work in the issue with research/explanations/discussion, since this work is reusable and ensures that everyone is on the same page.

@MANISH-K-07

This comment was marked as off-topic.

@rnveach
Copy link
Member

rnveach commented Apr 21, 2024

COMPILATION_UNIT -> COMPILATION_UNIT [7:0]
| | |--LEADING_ASTERISK -> * [2:0]

I haven't looked into this much, but this seems more like an antlr issue, or our conversion of antlr to AST.

When we create ASTs, we do a direct conversion from antlr's type. Debugging this code, shows we are giving column 0. So we take that as is, with no change.

AST is provided and used for everything, so when we provide column 0, everything else receives column 0.

Things are probably made more complex as violations possibly seem to be 1 based. This is because we add a 1 in AbstractCheck's log methods. These +1s were added as part of #4998 . Javadoc's logging uses similar log methods from AbstractCheck which have the same +1.

As another example, when don't provide a column number to Violation class, 0 is used as the default.

Also ASTs don't know about tab widths, so column number in the AST is not truly the column number. It is more the character index in the line. This is probably why ANTLR calls it's method getCharPositionInLine. It's javadoc says The index of the first character of this token relative to the beginning of the line at which it occurs, 0..n-1 which supports what I said above.

Fixing this issue will probably require little fixes all over the place.

@rnveach
Copy link
Member

rnveach commented Apr 21, 2024

@MANISH-K-07 Unless I am missing something, this issue doesn't seem like a desync, but is more Column number in DetailNode should start with 1.

Until #4997 is fixed, all our ASTs are 0-based. Your first post tree shows this for both ASTs. So they are technically in sync, right? If there a de-sync between violations and ASTs, this follows what I am saying in my previous post, which still stems from our ASTs are 0-based.

@MANISH-K-07

This comment was marked as resolved.

@MANISH-K-07 MANISH-K-07 changed the title CLI option -J (--treeWithJavadoc) parses AST with column numbering of javadoc content in de-sync Column number in DetailNode should start with 1 Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants