Using alternates within escaped parentheses doesn't work #75

Open
mattwynne opened this issue Feb 22, 2022 · 7 comments

@mattwynne
Member

mattwynne commented Feb 22, 2022

👓 What did you see?

In this expression:

I run cucumber-js \(installed locally/globally\)

The phrase "I run cucumber-js installed locally" doesn't match.

The generated regular expression is shown as:

/^I run cucumber-js \(installed (?:locally|globally\))$/

✅ What did you expect to see?

Instead of the generated regular expression

/^I run cucumber-js \(installed (?:locally|globally\))$/

I think it should be

/^I run cucumber-js \(installed (?:locally|globally)\)$/

I guess we're gobbling up the trailing \) as part of the word globally.
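
The difference between the two regular expressions quoted above can be checked by running both against a few candidate steps. This is a minimal sketch that uses only plain JavaScript `RegExp` objects; the sample steps (with literal parentheses) are illustrative assumptions and are not taken from the cucumber-js test suite.

```javascript
// Regex as reported in this issue: the escaped ")" sits inside the last alternative.
const generated = /^I run cucumber-js \(installed (?:locally|globally\))$/;
// Regex the reporter expected: the escaped ")" follows the alternation group.
const expected = /^I run cucumber-js \(installed (?:locally|globally)\)$/;

// Hypothetical sample steps, chosen only to illustrate the difference.
const steps = [
  "I run cucumber-js (installed locally)",
  "I run cucumber-js (installed globally)",
  "I run cucumber-js (installed locally",
];

const results = steps.map((step) => ({
  step,
  generated: generated.test(step),
  expected: expected.test(step),
}));

// generated: false, true, true  (the ")" is only accepted after "globally")
// expected:  true,  true, false (the ")" is required after either word)
console.table(results);
```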

📦 Which version are you using?

cucumber-expressions javascript 15.0.1

🔬 How could we reproduce it?

https://cucumber.github.io/cucumber-expressions/?advanced=1&expression=I%20run%20cucumber-js%20%5C%28installed%20locally%2Fglobally%5C%29&step=I%20run%20cucumber-js%20installed%20locally

🤔 Anything else?

Discovered in cucumber/cucumber-js#1926

@mpkorstanje
Contributor

The idea that the \( and \) go together only exists in your head. The alternation is bounded only by spaces.

Compare:

\(b c/d\)
some rat/cats

Just because there is an s or ( at the start, it doesn't mean that the s or ) at the end is paired with it.
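
The "bounded only by spaces" rule can be illustrated with a toy tokenizer. This is a simplified sketch, not the cucumber-expressions parser: `toyAlternations` is a hypothetical helper that splits on single spaces and then on `/`, and it ignores parameters, optional text, and escaped slashes.

```javascript
// Hypothetical helper (not part of cucumber-expressions): treat every
// space-separated word that contains "/" as one alternation.
function toyAlternations(expression) {
  return expression
    .split(" ")
    .filter((word) => word.includes("/"))
    .map((word) => word.split("/"));
}

// The escaped ")" belongs to the same space-separated word as "d",
// so it becomes part of the second alternative.
console.log(toyAlternations("\\(b c/d\\)")); // [ [ 'c', 'd\\)' ] ]

// Likewise the trailing "s" belongs to "cats", not to both alternatives.
console.log(toyAlternations("some rat/cats")); // [ [ 'rat', 'cats' ] ]
```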

@mpkorstanje
Contributor

Maybe a better question: since locally/globally isn't captured, why even mention it? Or use "globally or locally" to make it clear that it doesn't matter.

@mattwynne
Member Author

mattwynne commented Feb 22, 2022

In fact the locally/globally is captured (or at least was), so this isn't the right expression anyway for the particular problem at hand, but that's not really the point.

I was just surprised by the behaviour and thought it might be useful to make it less surprising.

> The alternation is bounded only by spaces.

That's the rule I was questioning. Maybe it's more sensible to have it bound by any non-word character? I can just about imagine people wanting to write "I have eaten {int} cucumbers/carrots, {int} apples/pears" and having the same problem with the comma. The intent there would clearly be (?:cucumbers|carrots), not (?:cucumbers|carrots,).

(Personally I can't remember when I last used a comma in a Gherkin step, but hopefully it at least illustrates the point. 😀)
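
The comma example can be made concrete with two hand-written regular expressions. These are assumptions for illustration, not output from cucumber-expressions: `(\d+)` stands in for `{int}`, and the sample steps are made up.

```javascript
// Alternation bounded only by spaces: the comma is swallowed by "carrots,".
const spaceBounded =
  /^I have eaten (\d+) (?:cucumbers|carrots,) (\d+) (?:apples|pears)$/;
// What the comment describes as the intent: the comma follows the group.
const intended =
  /^I have eaten (\d+) (?:cucumbers|carrots), (\d+) (?:apples|pears)$/;

// Hypothetical steps for illustration.
const cucumberStep = "I have eaten 3 cucumbers, 2 apples";
const carrotStep = "I have eaten 3 carrots, 2 pears";

console.log(spaceBounded.test(cucumberStep)); // false: "cucumbers" has no comma alternative
console.log(spaceBounded.test(carrotStep)); // true
console.log(intended.test(cucumberStep)); // true
console.log(intended.test(carrotStep)); // true
```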

@mpkorstanje
Contributor

mpkorstanje commented Feb 22, 2022

> Maybe it's more sensible to have it bound by any non-word character?

So any non-word character is [^a-zA-Z0-9_] which probably isn't what you mean.

Unicode has the P category for punctuation, with several subcategories to choose from, but the Po category that contains the full stop "." also contains the ampersand "&", so that doesn't work either.

There are a few more caveats. But truth be told, I don't think we can sensibly define the bounds of an alternation beyond space in a way that doesn't involve writing our own multi-language table of characters. And even if we could, we'd have to explain it to people. And then we'd lose the advantage of having a simpler form of regular expressions. So if this is what you want, use a regex. :)

https://en.wikipedia.org/wiki/Unicode_character_property#General_Category

https://www.fileformat.info/info/unicode/category/Po/list.htm
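
The point about the Po category can be checked with JavaScript's Unicode property escapes. This is a small sketch, assuming a runtime that supports `\p{...}` with the `u` flag (recent Node.js versions and browsers do); it only tests individual characters and is not a proposal for how the library should work.

```javascript
// Po = "Other Punctuation", a subcategory of the general P category.
const otherPunctuation = /^\p{Po}$/u;
const anyPunctuation = /^\p{P}$/u;

console.log(otherPunctuation.test(".")); // true
console.log(otherPunctuation.test(",")); // true
console.log(otherPunctuation.test("&")); // true: the ampersand is Po as well

// Parentheses are punctuation, but in the Ps/Pe subcategories rather than Po.
console.log(otherPunctuation.test(")")); // false
console.log(anyPunctuation.test(")")); // true
```

So a boundary rule based on Po would treat "&" the same as "." and ",", while missing ")" entirely.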

@mattwynne
Member Author

> > Maybe it's more sensible to have it bound by any non-word character?
>
> So any non-word character is [^a-zA-Z0-9_] which probably isn't what you mean.

That is exactly what I meant, yes! You seem to be saying that obviously wouldn't work but I can't see why. What did I miss?

@mpkorstanje
Contributor

mpkorstanje commented Feb 24, 2022

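Не все пишут на латыни. Ή αγγλικά για αυτό το θέμα. そして、ガーキンは多くの言語で動作します。Y realmente, incluso el español tiene algunos caracteres que no son palabras en medio de sus palabras.

(In English: Not everyone writes in Latin. Or English, for that matter. And Gherkin works in many languages. And really, even Spanish has some non-word characters in the middle of its words.)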
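
The objection can be reproduced by splitting a few words on the ASCII-only class `[^a-zA-Z0-9_]` discussed above. This is a sketch for illustration: `splitOnAsciiNonWord` is a hypothetical helper, and the sample words are assumptions chosen to cover the scripts used in the comment.

```javascript
// Hypothetical helper: split on anything outside [a-zA-Z0-9_] and drop empty parts.
function splitOnAsciiNonWord(text) {
  return text.split(/[^a-zA-Z0-9_]/).filter((part) => part.length > 0);
}

console.log(splitOnAsciiNonWord("carrots")); // [ 'carrots' ]

// Every Cyrillic, Greek, or Japanese letter counts as a "non-word" character,
// so nothing of the word survives.
console.log(splitOnAsciiNonWord("морковь")); // []
console.log(splitOnAsciiNonWord("καρότα")); // []
console.log(splitOnAsciiNonWord("にんじん")); // []

// "años" written with a precomposed ñ (U+00F1): the word is cut in two.
console.log(splitOnAsciiNonWord("a\u00f1os")); // [ 'a', 'os' ]
```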

@mattwynne
Member Author

🤦


2 participants