Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiple operator rule in PHP lexer #1478

Merged
merged 3 commits into from Apr 7, 2020

Conversation

pyrmont
Copy link
Contributor

@pyrmont pyrmont commented Apr 5, 2020

PHP has a rule that can match one or more symbols as operators. Although this permits combinations that are not syntactically correct, it provides some performance improvement for code that has sequential operators.

The problem is that two of these operators, ?>, have a special meaning in PHP. Matching multiple operators in one pass causes the ?> to be lexed incorrectly if it is immediately preceded by an operator. This is possible with the : operator. This PR fixes the rule by splitting it between operators that can appear multiple times in sequence and those that cannot.

This fixes #1362.

@pyrmont pyrmont added the needs-review The PR needs to be reviewed label Apr 5, 2020
@pyrmont pyrmont self-assigned this Apr 5, 2020
@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 5, 2020

@HypeMC, @julp: How does this look to you?

@julp
Copy link
Contributor

julp commented Apr 5, 2020

Is it a problem if operators like ?? or ??= are highlighted as several tokens?

@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 5, 2020

@julp Not really. The issue isn't about the number of tokens that are output; Rouge will merge the identical contiguous tokens together in the final output. The issue is the number of characters that the regular expression matches. The earlier rule was matching one or more operator symbols but since these symbols included :, ? and >, it meant that :?> would get matched by this rule in one pass and ?> would never get matched by the rule for closing PHP tags.

@julp
Copy link
Contributor

julp commented Apr 5, 2020

In fact, only ? is problematic, right? We could achieve the same with just rule %r/[?]/, couldn't we? Whatever you can find before ?> the single fact that ? is apart garantee you some kind of break on the character ? and ?> won't (unlikely) be consumed by something else.

@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 7, 2020

@julp Good point about the ? being the real issue. I've rebased on the just-committed PHP lexer and split that out into its own rule. What do you think?

@julp
Copy link
Contributor

julp commented Apr 7, 2020

Yes unless you want to handle this with a negative assertion? (eg %r/([~!%^&*+=\|:.<>\/@-]|(\?(?!>)))+/) Which, IMO, would be more error prone and less readable.

@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 7, 2020

@julp Agreed. Will merge this in now.

@pyrmont pyrmont merged commit 32c315e into rouge-ruby:master Apr 7, 2020
@pyrmont pyrmont removed the needs-review The PR needs to be reviewed label Apr 7, 2020
@pyrmont pyrmont deleted the bugfix.php-opening-tags branch April 7, 2020 23:03
@pyrmont
Copy link
Contributor Author

pyrmont commented Apr 7, 2020

@julp Thanks for your help! Another bug fixed! :)

mattt pushed a commit to NSHipster/rouge that referenced this pull request May 21, 2020
The PHP lexer has a rule that can match one or more characters as
operators. The problem is that if an operator occurs immediately before
the `?>` that closes a block of PHP, the `?` will match as an operator.
This is possible with the `:` operator. This commit fixes the problem
by splitting `?` out into its own rule.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PHP alternative syntax not always highlighted properly
2 participants