Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve f-string expression detection regex so ... #2437

Merged
merged 1 commit into from Aug 23, 2021

Conversation

ichard26
Copy link
Collaborator

we don't accidentally add backslashes to them when normalizing quotes
because that's invalid syntax!

The problem this commit fixes is that matches would eat too much
blocking important matches to occur. For example, here's one f-string
body:

{a}{b}{c}

I know there's no risk of introducing backslashes here, but the regex
already goes sideways with this. Throwing this example at regex101
I get:

{a}{b}{c}   # The As and Bs are the two matches, and the upper
---- ----   # case letters are the groups with those matches.
aAaa bbBb

... we've missed the middle expression (so if any backslashes in a
more complex example were introduced there we wouldn't bail out
even though we should -- hence the bug). As it stands the regex
needs somesort of extra character (or the start/end of the body)
around the expressions but that isn't always the case as shown
above.

The fix implemented here is to turn the "eat a surrounding non-curly
bracket character" groups ie. (?:[^{]|^) and (?:[^}]|$) into
negative lookaheads and lookbehinds. This still guarantees the
already specified rules but without problematically eating extra
characters ^^


Fixes #2348.

we don't accidentally add backslashes to them when normalizing quotes
because that's invalid syntax!

The problem this commit fixes is that matches would eat too much
blocking important matches to occur. For example, here's one f-string
body:

    {a}{b}{c}

I know there's no risk of introducing backslashes here, but the regex
already goes sideways with this. Throwing this example at regex101
I get:

    {a}{b}{c}   # The As and Bs are the two matches, and the upper
    ---- ----   # case letters are the groups with those matches.
    aAaa bbBb

... we've missed the middle expression (so if any backslashes in a
more complex example were introduced there we wouldn't bail out
even though we should -- hence the bug). As it stands the regex
needs somesort of extra character (or the start/end of the body)
around the expressions but that isn't always the case as shown
above.

The fix implemented here is to turn the "eat a surrounding non-curly
bracket character" groups ie. `(?:[^{]|^)` and `(?:[^}]|$)` into
negative lookaheads and lookbehinds. This still guarantees the
already specified rules but without problematically eating extra
characters ^^
@ichard26 ichard26 added the F: strings Related to our handling of strings label Aug 22, 2021
@JelleZijlstra JelleZijlstra merged commit 8c04847 into main Aug 23, 2021
@JelleZijlstra JelleZijlstra deleted the avoid-regex-troubles-with-strings branch August 23, 2021 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F: strings Related to our handling of strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failing to parse single-quote f-string with backslash inside it
2 participants