Fix escape sequences in Python's raw strings #1508

Merged
pyrmont merged 5 commits into rouge-ruby:master on May 12, 2020


pyrmont commented Apr 24, 2020

Version 3.18.0 of Rouge included a new mechanism for handling strings in the Python lexer. One of the consequences of that change was that raw strings would break if they included 'invalid' escape sequences. This is a mistake as raw strings do not have 'invalid' escape sequences.

This PR fixes that error by adding a dedicated state for escape sequences in raw strings. A separate state is still necessary because a backslash in a raw string prevents the quote character that follows it from being matched as the string's closing delimiter.
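
For readers less familiar with raw strings, here is a short Python illustration of the two behaviours described above. It shows only how Python itself treats these literals; it is not taken from the Rouge lexer, and the variable names are made up for the example.

```python
# In a raw string every backslash is kept literally, so no backslash
# sequence can be "invalid".
pattern = r"\d+\.\w"
assert len(r"\d") == 2      # a backslash followed by 'd'
assert r"\n" == "\\n"       # two characters, not a newline

# A backslash still stops the following quote from closing the literal,
# and the backslash itself remains part of the string.
quoted = r'it\'s'
assert quoted == "it\\'s"
assert len(quoted) == 5     # i, t, backslash, quote, s
```

This is why a lexer cannot simply ignore backslashes inside raw strings: it has to notice `\'` or `\"` so that the quote is not mistaken for the end of the string.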

This fixes #1507.

pyrmont added the needs-review label Apr 24, 2020
pyrmont self-assigned this Apr 24, 2020
pyrmont commented Apr 24, 2020

@tuxu Sorry again about the regression. How does this look?

Review thread on lib/rouge/lexers/python.rb (outdated, resolved)

tuxu left a comment

Thanks a lot for the fast fix, @pyrmont!

I stumbled upon a section in the Python manual about string literals and ran your changes over a couple more raw strings. Looks good to me 👍

tueksta commented May 11, 2020

Looks great, let's get this merged :)

pyrmont merged commit ead03fa into rouge-ruby:master May 12, 2020
pyrmont deleted the bugfix.python-string-escapes branch May 12, 2020 09:08
pyrmont removed the needs-review label May 12, 2020
pyrmont commented May 12, 2020

@tuxu @tueksta Thanks for your review. Version 3.19.0 of Rouge is scheduled for later tonight/early tomorrow (depending on your time zone). This will be part of that release 🎉

pyrmont added a commit that referenced this pull request May 12, 2020
The use of a ternary in code added to the Python lexer as part of the
#1508 pull request results in a RuboCop warning concerning parentheses
and grouped expressions. This commit fixes that warning.
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 21, 2020
Version 3.18.0 of Rouge included a new mechanism for handling strings
in the Python lexer. One of the consequences of that change was that
raw strings would break if they included 'invalid' escape sequences.
This is a mistake as raw strings do not have 'invalid' escape sequences.

This commit fixes this error by changing the approach that the Python
lexer takes to escape sequences more generally. Rather than dividing
recognised and unrecognised escape sequences into 'valid' and 'invalid',
it simply treats unrecognised escape sequences as ordinary strings.

The result of this is that all escape sequences in raw strings produce
`Str` tokens. In other types of strings, recognised escape sequences
produce `Str::Escape` tokens and unrecognised escape sequences produce
`Str` tokens.
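
The rule in this commit message can be sketched in a few lines of Python. This is an illustrative model only, not the Rouge implementation (which is written in Ruby); the function `classify`, the two regular expressions, and the exact set of recognised escapes are assumptions made for the example. Only the token names `Str` and `Str::Escape` come from the commit message.

```python
import re

# Escape sequences recognised in ordinary (non-raw) Python string literals.
# The list is an assumption for this sketch and may not match Rouge's.
RECOGNISED = re.compile(
    r"""\\(?:
        [\\'"abfnrtv]        # single-character escapes
      | [0-7]{1,3}           # octal
      | x[0-9a-fA-F]{2}      # hex
      | u[0-9a-fA-F]{4}      # 16-bit Unicode
      | U[0-9a-fA-F]{8}      # 32-bit Unicode
      | N\{[^}]+\}           # named character
    )""",
    re.VERBOSE,
)
ANY_ESCAPE = re.compile(r"\\.", re.DOTALL)


def classify(body, raw=False):
    """Split the body of a string literal into (token, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(body):
        if body[pos] == "\\":
            # Raw strings never produce Str::Escape tokens.
            match = None if raw else RECOGNISED.match(body, pos)
            if match:
                tokens.append(("Str::Escape", match.group()))
                pos = match.end()
                continue
            # Unrecognised escape: consume backslash plus one character
            # as ordinary string text.
            match = ANY_ESCAPE.match(body, pos)
            text = match.group() if match else "\\"
            tokens.append(("Str", text))
            pos += len(text)
            continue
        # Plain run of characters up to the next backslash.
        end = body.find("\\", pos)
        if end == -1:
            end = len(body)
        tokens.append(("Str", body[pos:end]))
        pos = end
    return tokens
```

Under these assumptions, `classify(r"\n\d")` yields a `Str::Escape` token for `\n` and a `Str` token for `\d`, while `classify(r"\n\d", raw=True)` yields `Str` tokens for both.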
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 21, 2020
The use of a ternary in code added to the Python lexer as part of the
rouge-ruby#1508 pull request results in a RuboCop warning concerning parentheses
and grouped expressions. This commit fixes that warning.
Linked issue: Python lexer broken for raw strings in v3.18.0
3 participants