Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

shivam-sehgal
Copy link

@shivam-sehgal shivam-sehgal commented Jul 8, 2023

🤔 What's changed?

to handle tags cases like @Step1,@step2

adding regex to handle cases if the operand name of tags contains commas, in future this regex can be further extended
to handle any new case that will arise

⚡️ What's your motivation?

heavily use this library, and wanted to contribute.

🏷️ What kind of change is this?

I came across issue. Although the documentation specifies the expected format for tags, it seems that the pattern used by the user, '@Step1,@step2,' did not generate any error. This discrepancy has the potential to confuse users. To address this, it would be beneficial to implement a validation mechanism that throws an appropriate error when such patterns are encountered.

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, documentation etc. without changing behaviour)
  • 🐛 Bug fix (non-breaking change which fixes a defect)
  • ⚡ New feature (non-breaking change which adds new behaviour)
  • 💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

📋 Checklist:

added support for regex filtering on operand names of tags
Limitations
1)Currently only handling case ',' in operand name
2)Currently just modified the java module of this lib

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

This text was originally generated from a template, then edited by hand. You can modify the template here.

@jkronegg
Copy link

jkronegg commented Jul 8, 2023

I like the idea of having cucumber notifying the user of misconfiguration issues. However, the proposed exception message (Tag expression "%s" could not be parsed because of syntax error: Tag names should follow this pattern "%s"") is a bit rough.

As the natural understanding of ”@tag1,@tag2" is a an "or/and" combination of both tags, the exception message could be friendlier and suggest to use one of the operators (e.g. something like "It looks like your are trying to combine tags using comma (','), which is not supported as tag combination operators. Please replace it by either 'or' or 'and' operators, e.g. '@tag1 or @tag2'.").

The PR missed unit tests.

@@ -106,6 +108,7 @@ private static List<String> tokenize(String expr) {
isEscaped = true;
} else if (c == '(' || c == ')' || Character.isWhitespace(c)) {
if (token.length() > 0) {
isOperandValid(token,expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the method isOperandValid() is actually wrong here.
Correct, that is your intention but also operator-strings like and, but and not are checked by this method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jenisys , for your review and feedback. I have made the necessary adjustment by updating the method name. As you rightly mentioned, apart from operands, this method will also handle validations for operators. Please confirm if the changes meet the requirements. Your continued assistance is appreciated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method-name is now OK for me 👍

@shivam-sehgal
Copy link
Author

I like the idea of having cucumber notifying the user of misconfiguration issues. However, the proposed exception message (Tag expression "%s" could not be parsed because of syntax error: Tag names should follow this pattern "%s"") is a bit rough.

As the natural understanding of ”@tag1,@tag2" is a an "or/and" combination of both tags, the exception message could be friendlier and suggest to use one of the operators (e.g. something like "It looks like your are trying to combine tags using comma (','), which is not supported as tag combination operators. Please replace it by either 'or' or 'and' operators, e.g. '@tag1 or @tag2'.").

The PR missed unit tests.

Thank you, @jkronegg, for your valuable comment and thorough review. I have enhanced the error message with additional details to provide clearer guidance. Additionally, I have included relevant unit tests to ensure proper validation. Please review the changes and kindly confirm if they meet the requirements. Your continued support is greatly appreciated.

@jenisys
Copy link
Contributor

jenisys commented Jul 9, 2023

HINTS:

  • The change in the test data affects all implementations (Java, Go, …)
  • The implementation is currently only provided for Java
  • Therefore, the tests for all other languages that lack this implementation have test failures

Copy link
Contributor

@mpkorstanje mpkorstanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR prevents the use of commas inside tag expressions altogether.

However comma's are valid characters inside a tag. Consequently they must be a valid character to use in a tag expression. For example a scenario may be tagged with @a1,b2. Which could then be selected by the tag expression @a1,b2

  @a1,b2
  Scenario: Example 1 

Likewise a scenario might be tagged with @a1, and @b2. This should be selectable through the tag expression @a1, and @b2.

  @a1,@b2
  Scenario: Example 2

  @a1, @b2
  Scenario: Example 3  

The intent of this PR is to inform users they are using a tag-expression syntax from v1 while using v2. Currently I do not believe there way to accept all valid v2+ tag expressions while rejecting all v1 expressions. There may however the possibility of validating literals.

A tag in Gherkin is delimited by @ and Gherkin does not allow the @ to be escaped. So every tag matches the pattern ^@[^@]+. So any tag expression literal that contains a @ after the starting @ can not possibly target a gherkin tag.

For example

  @a1@b2
  Scenario: Example 4

This scenario is tagged with @a1 and @b2 rather than @a1@b2. So the tag expression @a1@b2 will never find any valid scenarios and could be considered invalid.

This will incidentally also capture any use of the v1 tag expressions of the form @a1,@b2 and also ~@a1.

Proposal:

  • Validate that Gherkin tags do indeed follow the ^@[^@]+ pattern
  • Validate literals in tag expressions to match the ^@[^@]+ pattern.
  • (Optional) Provide the user with examples of correct tag expressions in the error message. Note this should be done for all parse errors, not just the one's we are adding now.

@shivam-sehgal shivam-sehgal changed the title Handling error response when a comma is present in the operand name of tags [DRAFT]Handling error response when a comma is present in the operand name of tags Jul 9, 2023
@@ -11,7 +11,7 @@

public final class TagExpressionParser {
//regex for token to ensure no token has ',' in them later can be customized further
private final static String VALID_TOKEN = "(?:@[^@]*|and|or|not|/(|/))$";
private final static String VALID_TOKEN = "^(?:@[^@]*|and|or|not|\\(|\\))$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By trying to validate all tokens rather than the litterals you are making this more complex than needed. Considering that this implementation will have to be repeated for a few languages, perhaps a review in the middle of just the Java implementation can save some duplication of effort.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @mpkorstanje, for your reply
I assume that a tag expression consists of operands and operators. When splitting the expression, there are two possibilities. The current token or string could be an operator, which includes "(", ")", "and", "or", and "not". Alternatively, it could be an operand, which should adhere to the Gherkin standard of starting with "@" and not containing "@" anywhere else.

To handle this, I have used a regex pattern to ensure that only valid tokens, which form the expression, are evaluated. By considering both operands and operators, it simplifies the code and eliminates the need for additional checks to determine the token type.

I believed this approach reduces complexity and improves the readability of the code, but maybe I might be missing someting, I would appreciate your opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually the situation is slightly different (as you can see from the tests / testdata):

  • Tags in a tag-expression may either be @foo (with-leading-atsign) or foo (without-leading-atsign)
  • @ (atsign) should not occur in a tag (after the first char) in a tag-expression
  • Gherkin parser removes the @ (atsign) from tags for Features/Rules/Scenarios/… . Therefore, @foo becomes“foo“ internally

SEE ALSO:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gherkin parser removes the @ (atsign)

@jenisys are you sure about that? Or is that true only for some Gherkin parsers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but maybe I might be missing someting, I would appreciate your opinion on this.

@shivam-sehgal without repeating myself that will be a bit difficult.

But do consider validating only the litterals. In essence, do not put the validation in the middle of tokenizer but rather after tokenization, just prior to litteral creation.

Copy link
Author

@shivam-sehgal shivam-sehgal Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpkorstanje cucumber-jvm basically calls this library to parse the tag expression as well as to validate if the tag expression is valid or not so the flow is like this in cucmberOptions annotation user provides the tags expression shown in the below code

# — FILE: SomeClass.java
@CucumberOptions(features = "src/it/resources/features",
        glue = {"com.adobe.aep.devex.exim.integrationtests.stepDefinitions",
                "com.adobe.aep.devex.exim.integrationtests.config",
                "com.adobe.aep.devex.exim.integrationtests.hooks"
        },
        plugin = {"pretty", "html:target/cucumber-html-report.html",
                "json:target/cucumber.json",
                "rerun:target/cucumber-api-rerun.txt"
        },
    tags = "not @ignore and @hello" // Exclude the feature with the "@ignore" tag
)
…

this annotation is parsed by CucumberOptionsAnnotationParser.class.
addTags method in this class is called which then calls this repo to validate the tag expression, now the intent of this PR was users by mistake put tag expressions like @tag1,@tag2 in the tags section in the CucumberOptions, and they don't get any error message, to avoid that we need to validate the tag expression not having any tags or literals like @tag,@againInTag to prevent this confusion and give users a valid error response so they can understand that they need to use logical operator, instead of commas, I can put this tag validation for the provided tag expression in addTags method of the CucumberOptionsAnnotationParser clas i.e in cucumber jvm, but my mind says it's better to keep validation logic inside this tag expression validation repo only to follow the single responsibility principle of oops,
to address this concern with the least impacting change, for now, I see if there is an expression having a tag starting with @ that tag shouldn't have @ after first, it will prevent this confusion with a proper error message letting user understand that they are doing mistake and cucumber is reading their expression as a single tag, and also won't affect any case where we allow the literals that are not starting with @

In short, cucumber-jvm calls this repo to validate the tag expression passed by the user and uses a message from TagExpressionExceptionto let the user know with a message what is wrong with the expression, the message should be coming from here i.e from this repo's java directory, else the responsibility of this code we have to handle separately in cucumber-jvm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution approach could be:

  • if CucumberOptions had an optional parameter, like tag_expression_validator, tag_expression_parser or an tag-expression preprocess hook you could perform a pre-process check
  • Pre-process check would just split the tag-expression string value into words and check if comma(s) are contained in any word
  • If a comma is contained, you issue a log-warning or raise an exception (whatever the „right“ consequent action is), …

This approach would have the following advantages:

  • Strict checking is optional
  • If you want/need strict checking of tag-expressions, you can plug-in the functionality

HINT:
By looking at the current implementation of the CucumberOptions class, this needed parameter would need to be added to support such a kind of solution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jenisys, I find the approach of using an optional flag for literal validation quite appealing. However, I have some doubts about the term "strict checking." If libraries using this repository are enforcing Gherkins rules, then it seems reasonable for this library to do the same. However, if we believe and would like to continue supporting users who define literals not starting with @ in new versions, then the optional parameter approach makes sense.

One potential concern raised by @mpkorstanje is that checking for commas alone may cause issues. To differentiate between v1 and v2, perhaps we should consider checking for ^@[^@]+ which he mentioned if the optional strict validation flag is passed in CucumberOptions.

I would appreciate hearing both of your opinions on this matter. Let's collaborate and I can begin working on it.

cc: @mpkorstanje

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cost of the complexity is starting to exceed the value of the solution.

Perhaps a simpler solution such as improving the documentation in Cucumber JVM might help?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mpkorstanje for your advice, have opened a pull request to update the doc a little bit, to add info for users to avoid the confusion of separating tags using ,, attaching the screenshot of the doc changes below
image
would like to request your review of this PR .
Thanks in advance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants