Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #14747: fix ParenPad to not flag unsupported Tokens #14792

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mahfouz72
Copy link
Member

Resolves: #14747
Resolves: #4175

  • removed deep scan from processExpression and just scanned the first level of a given node (the direct children only not the whole subtree)
  • adding some tokens that got missed from having no deep scan

Diff Regression config: https://gist.githubusercontent.com/mahfouz72/2b480a919e3a98f5609aeab17fb79b1b/raw/23cfa105f73bc7149494f1e0712be2376ad13fbd/parenpadbase.xml

Diff Regression patch config: https://gist.githubusercontent.com/mahfouz72/79f7d3b270b52b91da18557854ce6e60/raw/9d6c487e7b1df6d511ec2a808566db21759b91e1/parenpadpatch.xml

Comment on lines 269 to 272
}
else if (currentNode.hasChildren() && !isAcceptableToken(currentNode)) {
// Traverse all subtree tokens which will never be configured
// to be launched in visitToken()
currentNode = currentNode.getFirstChild();
continue;
}

// Go up after processing the last child
while (currentNode.getNextSibling() == null && currentNode.getParent() != ast) {
currentNode = currentNode.getParent();
}
currentNode = currentNode.getNextSibling();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the deep scan and just check the first child and all of its siblings

PS D:\test> cat src/Test3.java

public class Test3 {
    int i =  (( (5*4) + ( 4 + 2)));
}
PS D:\test> java  -jar checkstyle-10.14.2-all.jar -t  src/Test3.java
COMPILATION_UNIT -> COMPILATION_UNIT [2:0]
`--CLASS_DEF -> CLASS_DEF [2:0]
    |--MODIFIERS -> MODIFIERS [2:0]
    |   `--LITERAL_PUBLIC -> public [2:0]
    |--LITERAL_CLASS -> class [2:7]
    |--IDENT -> Test3 [2:13]
    `--OBJBLOCK -> OBJBLOCK [2:19]
        |--LCURLY -> { [2:19]
        |--VARIABLE_DEF -> VARIABLE_DEF [3:4]
        |   |--MODIFIERS -> MODIFIERS [3:4]
        |   |--TYPE -> TYPE [3:4]
        |   |   `--LITERAL_INT -> int [3:4]
        |   |--IDENT -> i [3:8]
        |   |--ASSIGN -> = [3:10]
        |   |   `--EXPR -> EXPR [3:13]
        |   |       |--LPAREN -> ( [3:13]         <-- this and all its siblings only checked 
        |   |       |--LPAREN -> ( [3:14]
        |   |       |--PLUS -> + [3:22]
        |   |       |   |--LPAREN -> ( [3:16]          <-- no deep scan so this will not be picked
        |   |       |   |--STAR -> * [3:18]
        |   |       |   |   |--NUM_INT -> 5 [3:17]
        |   |       |   |   `--NUM_INT -> 4 [3:19]
        |   |       |   |--RPAREN -> ) [3:20]
        |   |       |   |--LPAREN -> ( [3:24]
        |   |       |   |--PLUS -> + [3:28]
        |   |       |   |   |--NUM_INT -> 4 [3:26]
        |   |       |   |   `--NUM_INT -> 2 [3:30]
        |   |       |   `--RPAREN -> ) [3:31]
        |   |       |--RPAREN -> ) [3:32]
        |   |       `--RPAREN -> ) [3:33]
        |   `--SEMI -> ; [3:34]
        `--RCURLY -> } [4:0]

Comment on lines +302 to +311
TokenTypes.TYPECAST,
TokenTypes.STAR,
TokenTypes.PLUS,
TokenTypes.MINUS,
TokenTypes.DIV,
TokenTypes.MOD,
TokenTypes.LAND,
TokenTypes.LOR,
TokenTypes.LNOT,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some tokens that got missed due to having no deep scan (we may need to add some more tokens but those were the obvious one from the test and input files)

Comment on lines -276 to -280
*/
private boolean isAcceptableToken(DetailAST ast) {
return acceptableTokens.get(ast.getType());
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this anymore we needed it to avoid checking the same node twice. while doing a deep scan when we found a subtree that is acceptableToken we don't visit it because it will be picked up with visitToken() later

now there is no deep scan for tokens so we don't need this check. I removed this method, the related unnecessary class fields and the test corresponding to it

@mahfouz72
Copy link
Member Author

Github, generate report

@mahfouz72
Copy link
Member Author

Github, generate site

@mahfouz72
Copy link
Member Author

there are huge differences.
I want to know if I'm on the right track before proceeding to add more tokens.

@romani
Copy link
Member

romani commented Apr 13, 2024

Please extend link check suppression files with lines as CI suggesting.

@mahfouz72
Copy link
Member Author

@romani done. should I add all tokens that got missed and identified in the regression report in this PR?

@romani
Copy link
Member

romani commented Apr 14, 2024

I do not recommend to rush to add tokens, some of them not added for good reason. Let's add one by one, with full attention to regression diff report.

@mahfouz72
Copy link
Member Author

one by one in separate PRs or in this PR?
and also what about the tokens added till now? all of them were added to cover the missing parnes that are not checked in input files after having no deepscan

@nrmancuso
Copy link
Member

I do not recommend to rush to add tokens, some of them not added for good reason. Let's add one by one, with full attention to regression diff report.

@romani I don't think we can do this without a bunch of hacks; as soon as we stop "deep scanning", we need to extend the tokens for this check to keep the behavior consistent.

@mahfouz72 can you help us to understand what other tokens we may need to add here, and how you are discovering them?

@mahfouz72
Copy link
Member Author

@nrmancuso we need to add any possible token that could be used under EXPR token. for example, all mathematical, logical, and bitwise operators should be added. for now, I added some of them I discovered those from the failing unit tests after having no deep scan. but there are more tokens that we didn't use in the input file

Examples:

  • bitwise: & , | , ~ , ^ , << , >>
  • comparison : < <= > >= == !=
  • assignments : += -= = /= *= etc..

I am discovering them from the regression report and in general as I stated above I need to think of any token that can be used under expression and can be surrounded by paren

what do you think should I start working in this way? and pay full attention to the report to have consistent behaviour

@rnveach
Copy link
Member

rnveach commented May 1, 2024

@nrmancuso @romani Could this be a sign we need a ParenPadExpression check? Will someone ever want different spacing for different expression tokens?

@mahfouz72
Copy link
Member Author

mahfouz72 commented May 1, 2024

Will someone ever want different spacing for different expression tokens?

IMO, No, no one will need to violate this x = (3+5) but not this x += (3+5). That why I see this step #14792 (comment) making the check weird and unnecessary a mess of tokens (we are talking about 15 ~ 20 new tokens)

but at the same time, how to pick those cases that got missed after having no deep scan without adding all those tokens....

I don't know if this a good design but can we leave the deep scan and while scanning we skip tokens that are not any of the tokens mentioned here #14792 (comment) so we will have
isMathematicalOperator() , isBitwiseOperator(), isLogicalOperator() etc... and skip the validation while deep scanning based on the type of token
So basiaclly this solution is almost the same as #14792 (comment) but we won't explicity add them as new token of the check we will be checking them internally while deep scanning and ignore all other tokens

@rnveach
Copy link
Member

rnveach commented May 2, 2024

but can we leave the deep scan and while scanning we skip tokens that are not any of the tokens mentioned here

What is the difference between leaving the deep scanning (dropping the issue) or leaving the deep scanning while checking this internal only list ? My understanding was you built the list from the deep scanning (and/or regression).

I was thinking along the lines we break this check apart. We drop expression support here and create an expression only check (we want to do this for another check, #5945 ), specify all the tokens you suggested, but make them non-configurable (must stop on them) and no deep scanning. I can't really imagine people wanting different configs for different expression tokens. If someone does, we have this isolated check to make it easier to explain.

The problem with deep scanning is its easy to lose control and its harder to understand what it is exactly looking at. We had issues before where a check went beyond its boundaries in scanning. We are sort of having a discussion like this for UnusedLocalVariableCheck regarding pitest. Another example is #5234 and #5124 which I specifically mentioned branchContains is a dangerous method to use all the time since there is no restriction on how deep it can search.

@mahfouz72
Copy link
Member Author

mahfouz72 commented May 2, 2024

What is the difference between leaving the deep scanning (dropping the issue) or leaving the deep scanning while checking this internal only list ?

if we leave deep scan and check this internal list only. we will avoid validating tokens thay definitely should not be validated example: RECORD_PATREN_DEF that I have in the issue related to this PR

one of the main problems of the deepscan that we was checking token that not in configuration if we enforce checking this internal list only we will avoid this problem

We drop expression support here and create an expression only check

this is a good solution I am leaning to it. if we really want to remove the deep scan not just leave it and do this list hack to enforce it check only specific toke n

@nrmancuso nrmancuso self-assigned this May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParenPad incorretly flags unsupported token ParenPad: deep scan picks up excess tokens not set in config
4 participants