Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All checks that use AbstractCheck#getLine() should check code points instead of characters for spacing #10924

Open
nrmancuso opened this issue Nov 4, 2021 · 42 comments
Milestone

Comments

@nrmancuso
Copy link
Member

nrmancuso commented Nov 4, 2021

Affected checks:
https://checkstyle.sourceforge.io/config_misc.html#CommentsIndentation
https://checkstyle.sourceforge.io/config_blocks.html#RightCurly
https://checkstyle.sourceforge.io/config_blocks.html#LeftCurly
https://checkstyle.sourceforge.io/config_whitespace.html#GenericWhitespace
https://checkstyle.sourceforge.io/config_whitespace.html#NoWhitespaceBefore
https://checkstyle.sourceforge.io/config_whitespace.html#SingleSpaceSeparator
https://checkstyle.sourceforge.io/config_whitespace.html#NoWhitespaceAfter

➜  checks git:(issue-10920) grep -R 'getLine(' | grep -v "indentCheck\|\.getLine(\|int"
indentation/CommentsIndentationCheck.java:        final String targetSourceLine = getLine(singleLineComment.getLineNo() - 1);
indentation/CommentsIndentationCheck.java:        final String commentLine = getLine(blockComment.getLineNo() - 1);
blocks/RightCurlyCheck.java:        else if (shouldBeAloneOnLine(option, details, getLine(details.rcurly.getLineNo() - 1))) {
blocks/LeftCurlyCheck.java:        final String braceLine = getLine(brace.getLineNo() - 1);
whitespace/WhitespaceAfterCheck.java:            final String line = getLine(targetAST.getLineNo() - 1);
whitespace/WhitespaceAfterCheck.java:            final String line = getLine(ast.getLineNo() - 1);
whitespace/GenericWhitespaceCheck.java:        final String line = getLine(ast.getLineNo() - 1);
whitespace/GenericWhitespaceCheck.java:        final String line = getLine(ast.getLineNo() - 1);
whitespace/NoWhitespaceBeforeCheck.java:        final String line = getLine(ast.getLineNo() - 1);
whitespace/WhitespaceAroundCheck.java:            final String line = getLine(ast.getLineNo() - 1);
whitespace/SingleSpaceSeparatorCheck.java:                            getLine(currentNode.getLineNo() - 1),
whitespace/NoWhitespaceAfterCheck.java:        final String line = getLine(astLineNo - 1);

Noticed in #10837 and #10920, any check where we use AbstractCheck#getLine() (extracting a line of code from a source file) should be converted to using code points instead of characters. Now that Checkstyle supports Unicode in variable names and counts Unicode surrogate pairs correctly, we should make sure that all affected checks use code points instead of character arrays to ensure that we are indexing the correct character.

@nrmancuso
Copy link
Member Author

nrmancuso commented Nov 4, 2021

Another option to consider is to have AbstractCheck#getLine() return an int array of code points, but this would be a significant breaking change for users.

@rnveach
Copy link
Member

rnveach commented Nov 4, 2021

Why not add a getCodePoints() method?

@nrmancuso
Copy link
Member Author

Why not add a getCodePoints() method?

In AbstractCheck ? Great idea.

@MUzairS15
Copy link
Contributor

Can I work on this?

@nrmancuso
Copy link
Member Author

@MUzairS15 sure, please be aware that we will need to generate check regression reports for all checks listed in the issue description.

@MUzairS15
Copy link
Contributor

Thank you for your response.

@MUzairS15
Copy link
Contributor

MUzairS15 commented Nov 20, 2021

@nmancus1 Need help.
I created a method getCodePoint() in AbstractCheck, then I tried doing the required change but when I am running the tests related to that files, they are failing.
For eg: GenericWhiteSpaceCheck and GenericWhiteSpaceCheckTest 3 of 11 checks are failing.
Should I create a draft PR, to let you take a look at my changes?

@nrmancuso
Copy link
Member Author

@MUzairS15 circling back on this issue, I see that you edited your comment.

Should I create a draft PR, to let you take a look at my changes?

Yes, please do.

@MUzairS15
Copy link
Contributor

@MUzairS15 circling back on this issue, I see that you edited your comment.

Should I create a draft PR, to let you take a look at my changes?

Yes, please do.

I have created a draft PR, pls check when you find time,
Yes, I edited because I thought I will manage but I was not able to.

@nrmancuso
Copy link
Member Author

nrmancuso commented Nov 23, 2021

@MUzairS15 a good strategy for this update might be:

  1. Create AbstractCheck#getCodePoints
  2. Make changes in already updated checks (WhitespaceAround, NoWhitespaceBefore) to use this method
  3. Update checks that require simpler changes, such as *CurlyChecks and SingleSpaceSeparatorCheck
  4. GenericWhitespaceCheck seems to be the most complicated update, so I would do this last.

This process may help to give you a better feel for the reason why we are doing this update, and provide some insight that will be helpful when you return to GenericWhitespaceCheck.

Here is an article(and additional links) from StackOverflow that helped me to better understand code points: https://stackoverflow.com/questions/23979676/java-what-are-characters-code-points-and-surrogates-what-difference-is-there/23980020 ; perhaps this can also aid you in your work on this issue.

MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Nov 26, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 1, 2021
@nrmancuso
Copy link
Member Author

nrmancuso commented Dec 4, 2021

We have missed a bunch of instances where we use getLines()[i], which has the same effect: #11001

➜  checks git:(master) grep -R 'getLines(' | grep -v "indentCheck\|\.getLine(\|int"
AvoidEscapedUnicodeCharactersCheck.java:                final String line = getLines()[lineNo - 1];
indentation/ArrayInitHandler.java:        final String line = getIndentCheck().getLines()[lineNo - 1];
indentation/CommentsIndentationCheck.java:        final String[] lines = getLines();
indentation/CommentsIndentationCheck.java:        final char[] line = getLines()[lineNo - 1].toCharArray();
indentation/AnnotationArrayInitHandler.java:        final String line = getIndentCheck().getLines()[lineNo - 1];
TrailingCommentCheck.java:        final String line = getLines()[lineNo - 1];
TrailingCommentCheck.java:        String line = getLines()[lineNo - 1];
TrailingCommentCheck.java:        final String lineBefore = getLines()[lineNo - 1].substring(0, ast.getColumnNo());
blocks/EmptyBlockCheck.java:        final String[] lines = getLines();
imports/CustomImportOrderCheck.java:        final String[] lines = getLines();
coding/FallThroughCheck.java:        final String[] lines = getLines();
whitespace/MethodParamPadCheck.java:            final String line = getLines()[parenAST.getLineNo() - 1];
whitespace/EmptyForIteratorPadCheck.java:            final String line = getLines()[semi.getLineNo() - 1];
whitespace/EmptyLineSeparatorCheck.java:            final String prePreviousLine = getLines()[lineNo - number];
whitespace/EmptyLineSeparatorCheck.java:            final String lineBefore = getLines()[lineNo - 2];
whitespace/EmptyLineSeparatorCheck.java:            final String lineWithComment = getLines()[comment.getLineNo() - 1].trim();
whitespace/SeparatorWrapCheck.java:        final String currentLine = getLines()[lineNo - 1];
whitespace/AbstractParenPadCheck.java:        final String line = getLines()[ast.getLineNo() - 1];
whitespace/AbstractParenPadCheck.java:            final String line = getLines()[ast.getLineNo() - 1];
whitespace/EmptyForInitializerPadCheck.java:            final String line = getLines()[semiLineIdx];

These checks will also need to be updated.

@MUzairS15
Copy link
Contributor

Yes, I was going to ask but wanted to fix those first😅

@nrmancuso
Copy link
Member Author

nrmancuso commented Dec 4, 2021

@pbludov @strkkk @romani @rnveach since scope of this issue has grown, and needs of codepoint compatible methods/ operations is increasing (trimming, checking whitespace, finding first character of type, etc.), should we consider creating our own String class that implements CharSequence, and create such methods in our new class? This will make this update much easier, since we will be able to replace all String s from getLine(s) with our own implementation (this would also become part of our API). Just a thought.

We could call it UnicodeString, and back it with an int[] instead of a byte[] as the String class is now.

@strkkk
Copy link
Member

strkkk commented Dec 4, 2021

I think such way will only bring more mess and confusion. Some custom StringUtil class with all utility methods (I do not think there will be a lot of them) seems a better approach to me.

@MUzairS15
Copy link
Contributor

matcher accepts a CharSequence so wherever matcher is used, can we use string instead of codepoints?

@nrmancuso
Copy link
Member Author

matcher accepts a CharSequence so wherever matcher is used, can we use string instead of codepoints?

Yes, we do not need exact position of a character in this case.

@MUzairS15
Copy link
Contributor

Working on this, but require more time because, since last week I am busy with college activities exams, and other stuff because of sem-end this month.

MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 17, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 17, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 17, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 17, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 18, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Dec 18, 2021
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 15, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 15, 2022
@MUzairS15
Copy link
Contributor

@nick-mancuso Should I send PR to replace usage of getLines with getLine in TrailingCommentCheck?
as suggested in #11262 (review)

checks % git grep 'getLines' TrailingCommentCheck.java
TrailingCommentCheck.java:        final String line = getLines()[lineNo - 1];
TrailingCommentCheck.java:        String line = getLines()[lineNo - 1];
TrailingCommentCheck.java:        final String lineBefore = getLines()[lineNo - 1].substring(0, ast.getColumnNo());

@nrmancuso
Copy link
Member Author

nrmancuso commented Mar 16, 2022

@MUzairS15 let's create a checklist from comment at #10924 (comment)

replace usage of getLines with getLine

Ideally, we want to replace all with getLineCodePoints. We should also restrict usages of getLine to where we require matcher/regexp only.

@MUzairS15
Copy link
Contributor

MUzairS15 commented Mar 16, 2022

getLines to be replaced with getLine if code points update is not required

  • AvoidEscapedUnicodeCharactersCheck
  • ArrayInitHandler
  • AnnotationArrayInitHandler
  • CommentsIndentationCheck
  • EmptyBlockCheck
  • MethodParamPadCheck
  • EmptyForIteratorPadCheck
  • EmptyForInitializerPadCheck
  • FallThroughCheck
  • AbstractParenPadCheck
  • TrailingCommentCheck (review required)
  • SeparatorWrapCheck

EmptyLineSeparatorCheck and CustomImportOrderCheck doesn't require changes as these check checks presence of blank lines.

@MUzairS15
Copy link
Contributor

MUzairS15 commented Mar 16, 2022

TODO: move CommonUtil.isCodePointWhitespace to CodePointUtil

Will send PR when all checks have been updated.

MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 16, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 16, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 17, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 18, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 18, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 18, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 19, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 19, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 22, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 23, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 24, 2022
MUzairS15 added a commit to MUzairS15/checkstyle that referenced this issue Mar 26, 2022
@nrmancuso
Copy link
Member Author

@MUzairS15 what do we have left to do here? Let's update checklist with checks from #10924 (comment) and send PR for #10924 (comment)

@nrmancuso
Copy link
Member Author

@MUzairS15 ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment