[DRAFT]Handling error response when a comma is present in the operand name of tags #107

shivam-sehgal · 2023-07-08T15:19:56Z

🤔 What's changed?

to handle tags cases like @Step1,@step2

adding regex to handle cases if the operand name of tags contains commas, in future this regex can be further extended
to handle any new case that will arise

⚡️ What's your motivation?

heavily use this library, and wanted to contribute.

🏷️ What kind of change is this?

I came across issue. Although the documentation specifies the expected format for tags, it seems that the pattern used by the user, '@Step1,@step2,' did not generate any error. This discrepancy has the potential to confuse users. To address this, it would be beneficial to implement a validation mechanism that throws an appropriate error when such patterns are encountered.

🏦 Refactoring/debt/DX (improvement to code design, tooling, documentation etc. without changing behaviour)
🐛 Bug fix (non-breaking change which fixes a defect)
⚡ New feature (non-breaking change which adds new behaviour)
💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

📋 Checklist:

added support for regex filtering on operand names of tags
Limitations
1)Currently only handling case ',' in operand name
2)Currently just modified the java module of this lib

I agree to respect and uphold the Cucumber Community Code of Conduct
I've changed the behaviour of the code
- I have added/updated tests to cover my changes.
My change requires a change to the documentation.
- I have updated the documentation accordingly.
Users should know about my change
- I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

This text was originally generated from a template, then edited by hand. You can modify the template here.

jkronegg · 2023-07-08T21:57:23Z

I like the idea of having cucumber notifying the user of misconfiguration issues. However, the proposed exception message (Tag expression "%s" could not be parsed because of syntax error: Tag names should follow this pattern "%s"") is a bit rough.

As the natural understanding of ”@tag1,@tag2" is a an "or/and" combination of both tags, the exception message could be friendlier and suggest to use one of the operators (e.g. something like "It looks like your are trying to combine tags using comma (','), which is not supported as tag combination operators. Please replace it by either 'or' or 'and' operators, e.g. '@tag1 or @tag2'.").

The PR missed unit tests.

jenisys · 2023-07-09T06:25:03Z

java/src/main/java/io/cucumber/tagexpressions/TagExpressionParser.java

@@ -106,6 +108,7 @@ private static List<String> tokenize(String expr) {
                isEscaped = true;
            } else if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (token.length() > 0) {
+                    isOperandValid(token,expr);


The name of the method isOperandValid() is actually wrong here.
Correct, that is your intention but also operator-strings like and, but and not are checked by this method.

Thank you, @jenisys , for your review and feedback. I have made the necessary adjustment by updating the method name. As you rightly mentioned, apart from operands, this method will also handle validations for operators. Please confirm if the changes meet the requirements. Your continued assistance is appreciated

The method-name is now OK for me 👍

shivam-sehgal · 2023-07-09T08:05:03Z

I like the idea of having cucumber notifying the user of misconfiguration issues. However, the proposed exception message (Tag expression "%s" could not be parsed because of syntax error: Tag names should follow this pattern "%s"") is a bit rough.

As the natural understanding of ”@tag1,@tag2" is a an "or/and" combination of both tags, the exception message could be friendlier and suggest to use one of the operators (e.g. something like "It looks like your are trying to combine tags using comma (','), which is not supported as tag combination operators. Please replace it by either 'or' or 'and' operators, e.g. '@tag1 or @tag2'.").

The PR missed unit tests.

Thank you, @jkronegg, for your valuable comment and thorough review. I have enhanced the error message with additional details to provide clearer guidance. Additionally, I have included relevant unit tests to ensure proper validation. Please review the changes and kindly confirm if they meet the requirements. Your continued support is greatly appreciated.

jenisys · 2023-07-09T08:50:58Z

HINTS:

The change in the test data affects all implementations (Java, Go, …)
The implementation is currently only provided for Java
Therefore, the tests for all other languages that lack this implementation have test failures

mpkorstanje

This PR prevents the use of commas inside tag expressions altogether.

However comma's are valid characters inside a tag. Consequently they must be a valid character to use in a tag expression. For example a scenario may be tagged with @a1,b2. Which could then be selected by the tag expression @a1,b2

  @a1,b2
  Scenario: Example 1

Likewise a scenario might be tagged with @a1, and @b2. This should be selectable through the tag expression @a1, and @b2.

  @a1,@b2
  Scenario: Example 2

  @a1, @b2
  Scenario: Example 3

The intent of this PR is to inform users they are using a tag-expression syntax from v1 while using v2. Currently I do not believe there way to accept all valid v2+ tag expressions while rejecting all v1 expressions. There may however the possibility of validating literals.

A tag in Gherkin is delimited by @ and Gherkin does not allow the @ to be escaped. So every tag matches the pattern ^@[^@]+. So any tag expression literal that contains a @ after the starting @ can not possibly target a gherkin tag.

For example

  @a1@b2
  Scenario: Example 4

This scenario is tagged with @a1 and @b2 rather than @a1@b2. So the tag expression @a1@b2 will never find any valid scenarios and could be considered invalid.

This will incidentally also capture any use of the v1 tag expressions of the form @a1,@b2 and also ~@a1.

Proposal:

Validate that Gherkin tags do indeed follow the ^@[^@]+ pattern
Validate literals in tag expressions to match the ^@[^@]+ pattern.
(Optional) Provide the user with examples of correct tag expressions in the error message. Note this should be done for all parse errors, not just the one's we are adding now.

mpkorstanje · 2023-07-10T00:05:46Z

java/src/main/java/io/cucumber/tagexpressions/TagExpressionParser.java

@@ -11,7 +11,7 @@

 public final class TagExpressionParser {
    //regex for token to ensure no token has ',' in them later can be customized further
-    private final static String VALID_TOKEN = "(?:@[^@]*|and|or|not|/(|/))$";
+    private final static String VALID_TOKEN = "^(?:@[^@]*|and|or|not|\\(|\\))$";


By trying to validate all tokens rather than the litterals you are making this more complex than needed. Considering that this implementation will have to be repeated for a few languages, perhaps a review in the middle of just the Java implementation can save some duplication of effort.

Thanks, @mpkorstanje, for your reply
I assume that a tag expression consists of operands and operators. When splitting the expression, there are two possibilities. The current token or string could be an operator, which includes "(", ")", "and", "or", and "not". Alternatively, it could be an operand, which should adhere to the Gherkin standard of starting with "@" and not containing "@" anywhere else.

To handle this, I have used a regex pattern to ensure that only valid tokens, which form the expression, are evaluated. By considering both operands and operators, it simplifies the code and eliminates the need for additional checks to determine the token type.

I believed this approach reduces complexity and improves the readability of the code, but maybe I might be missing someting, I would appreciate your opinion on this.

Well, actually the situation is slightly different (as you can see from the tests / testdata):

Tags in a tag-expression may either be @foo (with-leading-atsign) or foo (without-leading-atsign)

@ (atsign) should not occur in a tag (after the first char) in a tag-expression

Gherkin parser removes the @ (atsign) from tags for Features/Rules/Scenarios/… . Therefore, @foo becomes“foo“ internally

SEE ALSO:

testdata/parsing.yml

testdata/errors.yml

python/tests/functional/test_tag_expression.py#L57

Gherkin parser removes the @ (atsign)

@jenisys are you sure about that? Or is that true only for some Gherkin parsers?

but maybe I might be missing someting, I would appreciate your opinion on this.

@shivam-sehgal without repeating myself that will be a bit difficult.

But do consider validating only the litterals. In essence, do not put the validation in the middle of tokenizer but rather after tokenization, just prior to litteral creation.

@mpkorstanje cucumber-jvm basically calls this library to parse the tag expression as well as to validate if the tag expression is valid or not so the flow is like this in cucmberOptions annotation user provides the tags expression shown in the below code

# — FILE: SomeClass.java @CucumberOptions(features = "src/it/resources/features", glue = {"com.adobe.aep.devex.exim.integrationtests.stepDefinitions", "com.adobe.aep.devex.exim.integrationtests.config", "com.adobe.aep.devex.exim.integrationtests.hooks" }, plugin = {"pretty", "html:target/cucumber-html-report.html", "json:target/cucumber.json", "rerun:target/cucumber-api-rerun.txt" }, tags = "not @ignore and @hello" // Exclude the feature with the "@ignore" tag ) …

this annotation is parsed by CucumberOptionsAnnotationParser.class.
addTags method in this class is called which then calls this repo to validate the tag expression, now the intent of this PR was users by mistake put tag expressions like @tag1,@tag2 in the tags section in the CucumberOptions, and they don't get any error message, to avoid that we need to validate the tag expression not having any tags or literals like @tag,@againInTag to prevent this confusion and give users a valid error response so they can understand that they need to use logical operator, instead of commas, I can put this tag validation for the provided tag expression in addTags method of the CucumberOptionsAnnotationParser clas i.e in cucumber jvm, but my mind says it's better to keep validation logic inside this tag expression validation repo only to follow the single responsibility principle of oops,
to address this concern with the least impacting change, for now, I see if there is an expression having a tag starting with @ that tag shouldn't have @ after first, it will prevent this confusion with a proper error message letting user understand that they are doing mistake and cucumber is reading their expression as a single tag, and also won't affect any case where we allow the literals that are not starting with @

In short, cucumber-jvm calls this repo to validate the tag expression passed by the user and uses a message from TagExpressionExceptionto let the user know with a message what is wrong with the expression, the message should be coming from here i.e from this repo's java directory, else the responsibility of this code we have to handle separately in cucumber-jvm.

One solution approach could be:

if CucumberOptions had an optional parameter, like tag_expression_validator, tag_expression_parser or an tag-expression preprocess hook you could perform a pre-process check

Pre-process check would just split the tag-expression string value into words and check if comma(s) are contained in any word

If a comma is contained, you issue a log-warning or raise an exception (whatever the „right“ consequent action is), …

This approach would have the following advantages:

Strict checking is optional

If you want/need strict checking of tag-expressions, you can plug-in the functionality

HINT:
By looking at the current implementation of the CucumberOptions class, this needed parameter would need to be added to support such a kind of solution.

@jenisys, I find the approach of using an optional flag for literal validation quite appealing. However, I have some doubts about the term "strict checking." If libraries using this repository are enforcing Gherkins rules, then it seems reasonable for this library to do the same. However, if we believe and would like to continue supporting users who define literals not starting with @ in new versions, then the optional parameter approach makes sense.

One potential concern raised by @mpkorstanje is that checking for commas alone may cause issues. To differentiate between v1 and v2, perhaps we should consider checking for ^@[^@]+ which he mentioned if the optional strict validation flag is passed in CucumberOptions.

I would appreciate hearing both of your opinions on this matter. Let's collaborate and I can begin working on it.

cc: @mpkorstanje

I think the cost of the complexity is starting to exceed the value of the solution.

Perhaps a simpler solution such as improving the documentation in Cucumber JVM might help?

thanks @mpkorstanje for your advice, have opened a pull request to update the doc a little bit, to add info for users to avoid the confusion of separating tags using ,, attaching the screenshot of the doc changes below

would like to request your review of this PR .
Thanks in advance

to handle tags cases like @Step1,@step2

ffda0cd

shivam-sehgal mentioned this pull request Jul 8, 2023

@Before is not invoked when multiple tags are passed in the runner file cucumber/cucumber-jvm#2776

Closed

jenisys reviewed Jul 9, 2023

View reviewed changes

shivamsehgal added 2 commits July 9, 2023 12:43

review comments regarding method naming and messages

04af03e

review comments regarding adding unit tests

d54c62e

mpkorstanje requested changes Jul 9, 2023

View reviewed changes

testing ruby and some unit test changes

09819cf

shivam-sehgal changed the title ~~Handling error response when a comma is present in the operand name of tags~~ [DRAFT]Handling error response when a comma is present in the operand name of tags Jul 9, 2023

shivamsehgal added 5 commits July 9, 2023 22:50

testing ruby and some unit test changes

2486b55

changing for ruby

1183630

javascript changes

2665a56

perl changes

78b460d

go changes

e7198f0

mpkorstanje reviewed Jul 10, 2023

View reviewed changes

python changes

fae80c3

shivam-sehgal mentioned this pull request Aug 12, 2023

Doc changes to avoid user confusion of seperating tags with , in expr… cucumber/docs#911

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

shivam-sehgal commented Jul 8, 2023 •

edited

jkronegg commented Jul 8, 2023

jenisys Jul 9, 2023

shivam-sehgal Jul 9, 2023

jenisys Jul 9, 2023

shivam-sehgal commented Jul 9, 2023

jenisys commented Jul 9, 2023

mpkorstanje left a comment •

edited

mpkorstanje Jul 10, 2023 •

edited

shivam-sehgal Jul 10, 2023

jenisys Jul 10, 2023

mpkorstanje Jul 10, 2023

mpkorstanje Jul 10, 2023

shivam-sehgal Jul 23, 2023 •

edited by jenisys

jenisys Jul 23, 2023

shivam-sehgal Jul 23, 2023

mpkorstanje Aug 10, 2023

shivam-sehgal Aug 12, 2023 •

edited

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

Are you sure you want to change the base?

[DRAFT]Handling error response when a comma is present in the operand name of tags #107

Conversation

shivam-sehgal commented Jul 8, 2023 • edited

🤔 What's changed?

⚡️ What's your motivation?

🏷️ What kind of change is this?

♻️ Anything particular you want feedback on?

📋 Checklist:

jkronegg commented Jul 8, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shivam-sehgal commented Jul 9, 2023

jenisys commented Jul 9, 2023

mpkorstanje left a comment • edited

Choose a reason for hiding this comment

mpkorstanje Jul 10, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shivam-sehgal Jul 23, 2023 • edited by jenisys

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shivam-sehgal Aug 12, 2023 • edited

Choose a reason for hiding this comment

shivam-sehgal commented Jul 8, 2023 •

edited

mpkorstanje left a comment •

edited

mpkorstanje Jul 10, 2023 •

edited

shivam-sehgal Jul 23, 2023 •

edited by jenisys

shivam-sehgal Aug 12, 2023 •

edited