This repository has been archived by the owner on Sep 8, 2023. It is now read-only.

Fix escape sequences in Python's strings (rouge-ruby#1508)
Version 3.18.0 of Rouge included a new mechanism for handling strings
in the Python lexer. One of the consequences of that change was that
raw strings would break if they included 'invalid' escape sequences.
This is a mistake as raw strings do not have 'invalid' escape sequences.

This commit fixes this error by changing the approach that the Python
lexer takes to escape sequences more generally. Rather than dividing
recognised and unrecognised escape sequences into 'valid' and 'invalid',
it simply treats unrecognised escape sequences as ordinary strings.

The result of this is that all escape sequences in raw strings produce
`Str` tokens. In other types of strings, recognised escape sequences
produce `Str::Escape` tokens and unrecognised escape sequences produce
`Str` tokens.
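
The behaviour described above can be sketched outside Rouge. The following is a minimal, hypothetical stand-alone illustration, not the lexer's actual code: `classify_escapes`, `RECOGNISED` and `UNRECOGNISED` are names invented for this example, token types are plain strings rather than Rouge token classes, and the pattern is a simplified copy of the alternatives in the `:generic_escape` state.

```ruby
require "strscan"

# Bodies of the escape sequences treated as recognised in this sketch
# (assumption: simplified from the lexer's :generic_escape rule).
RECOGNISED_BODIES = [
  /[\\abfnrtv"']/,
  /\n/,
  /N\{[a-zA-Z][a-zA-Z ]+[a-zA-Z]\}/,
  /u[a-fA-F0-9]{4}/,
  /U[a-fA-F0-9]{8}/,
  /x[a-fA-F0-9]{2}/,
  /[0-7]{1,3}/
].freeze

# A backslash followed by one of the recognised bodies.
RECOGNISED = /\\#{Regexp.union(RECOGNISED_BODIES)}/
# Any other backslash followed by a single character.
UNRECOGNISED = /\\./

# Splits the body of a string literal into [type, text] pairs.
# Pass raw: true to model an r'...' string.
def classify_escapes(body, raw: false)
  scanner = StringScanner.new(body)
  tokens = []
  until scanner.eos?
    if (text = scanner.scan(RECOGNISED))
      # Recognised escapes are only special outside raw strings.
      tokens << [raw ? "Str" : "Str::Escape", text]
    elsif (text = scanner.scan(UNRECOGNISED))
      # Unrecognised escapes are ordinary string content.
      tokens << ["Str", text]
    elsif (text = scanner.scan(/[^\\]+/))
      tokens << ["Str", text]
    else
      # A lone trailing backslash: emit it as ordinary content.
      tokens << ["Str", scanner.getch]
    end
  end
  tokens
end
```

With this sketch, `classify_escapes('\d\n')` returns `[["Str", "\\d"], ["Str::Escape", "\\n"]]`, while `classify_escapes('\d\n', raw: true)` returns `"Str"` for both pieces.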
pyrmont authored and mattt committed May 21, 2020
1 parent c71f7b9 commit 1dc81df
Showing 2 changed files with 8 additions and 14 deletions.
20 changes: 6 additions & 14 deletions lib/rouge/lexers/python.rb
@@ -186,14 +186,7 @@ def current_string
           end
         end

-        rule %r/\\/ do |m|
-          if current_string.type? "r"
-            token Str
-          else
-            token Str::Interpol
-          end
-          push :generic_escape
-        end
+        rule %r/(?=\\)/, Str, :generic_escape

         rule %r/{/ do |m|
           if current_string.type? "f"
@@ -206,23 +199,22 @@ def current_string
       end

       state :generic_escape do
-        rule %r(
+        rule %r(\\
           ( [\\abfnrtv"']
           | \n
+          | newline
           | N{[a-zA-Z][a-zA-Z ]+[a-zA-Z]}
           | u[a-fA-F0-9]{4}
           | U[a-fA-F0-9]{8}
           | x[a-fA-F0-9]{2}
           | [0-7]{1,3}
           )
         )x do
-          if current_string.type? "r"
-            token Str
-          else
-            token Str::Escape
-          end
+          token (current_string.type?("r") ? Str : Str::Escape)
           pop!
         end
+
+        rule %r/\\./, Str, :pop!
       end

       state :generic_interpol do
2 changes: 2 additions & 0 deletions spec/visual/samples/python
@@ -47,6 +47,7 @@ def baz():
     '\UaaaaAF09'
     '\xaf\xAF\x09'
     '\007'
+    '.*\[p00t_(d\d{4})\].*' # There are no escape sequences in this string

 # escaped characters in raw strings
 def baz():
@@ -56,6 +57,7 @@ def baz():
     r'\UaaaaAF09'
     r'\xaf\xAF\x09'
     r'\007'
+    r'.*\[p00t_(d\d{4})\].*'

 # line continuations
 apple.filter(x, y)
