Error parsing Python raw f-string with backslash #1681
Comments
I think I now understand the problem and have the solution. In the lexer we have:

```python
'fstringescape': [
    (r'\{\{', String.Escape),
    (r'\}\}', String.Escape),
    include('stringescape'),
],
```

i.e. we can try adding the missing third elements, resulting in the new code:

```python
# raw f-strings
('(?i)(rf|fr)(""")',
 bygroups(String.Affix, String.Double),
 combined('fstringescape', 'tdqf')),
("(?i)(rf|fr)(''')",
 bygroups(String.Affix, String.Single),
 combined('fstringescape', 'tsqf')),
('(?i)(rf|fr)(")',
 bygroups(String.Affix, String.Double),
 combined('fstringescape', 'dqf')),
("(?i)(rf|fr)(')",
 bygroups(String.Affix, String.Single),
 combined('fstringescape', 'sqf')),
```

However, the `stringescape` rules pulled in by `fstringescape` do not belong in raw strings, so define a state containing only the doubled-brace escapes:

```python
'rfstringescape': [
    (r'\{\{', String.Escape),
    (r'\}\}', String.Escape),
],
```

Now using

```python
# raw f-strings
('(?i)(rf|fr)(""")',
 bygroups(String.Affix, String.Double),
 combined('rfstringescape', 'tdqf')),
("(?i)(rf|fr)(''')",
 bygroups(String.Affix, String.Single),
 combined('rfstringescape', 'tsqf')),
('(?i)(rf|fr)(")',
 bygroups(String.Affix, String.Double),
 combined('rfstringescape', 'dqf')),
("(?i)(rf|fr)(')",
 bygroups(String.Affix, String.Single),
 combined('rfstringescape', 'sqf')),
```

my raw f-strings containing backslashes within escaped braces are parsed correctly.
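To see why the separate escape state matters, here is a minimal, self-contained sketch (plain Python, not actual Pygments internals; the rule list and token names are illustrative) of tokenizing the body of a raw f-string, where doubled braces are escapes and backslashes stay literal text:

```python
import re

# Illustrative rules, in priority order: doubled braces are escapes,
# {expr} is a replacement field, everything else (including backslashes)
# is literal string text.
RULES = [
    (re.compile(r'\{\{|\}\}'), 'String.Escape'),
    (re.compile(r'\{[^{}]*\}'), 'String.Interpol'),
    (re.compile(r'[^{}]+'), 'String'),
]

def lex_raw_fstring_body(text):
    tokens, pos = [], 0
    while pos < len(text):
        for pattern, ttype in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((ttype, m.group()))
                pos = m.end()
                break
        else:  # no rule matched this character
            tokens.append(('Error', text[pos]))
            pos += 1
    return tokens

print(lex_raw_fstring_body(r'{{\S'))
# [('String.Escape', '{{'), ('String', '\\S')]
```

Because no backslash-escape rule is present, the `\S` falls through to the literal-text rule instead of producing an error, which is exactly the behavior the `rfstringescape` state gives raw f-strings.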
Thank you for leading the way. I have a version that more or less solves all these problems for all bytes and string literals. I still need to clean up a bit, but the results are great so far. Updated examples from https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals.
That looks quite promising -- looking forward to the PR!
Question: I wonder why Pygments (in the case of Python code) doesn't just use the built-in machinery of Python, where all of the tokenization/lexing etc. is fully exposed? Why reimplement all of this from scratch, which is obviously hard to get 100% right?
@jmd-dk a highlighter is not the same as a language tokenizer.
On the other hand, Pygments needs to be more lenient, since it doesn't only support the one version of the language it happens to run on.
Please see #1654 for why this is not a good idea.
@birkenfeld Hmmm ...
Pygments is literally tokenizing input text in order to output to other formats. We're just talking about replacing the tokenizing process for Python (it could be an opt-in), since there's a built-in module for that.
Anyway, I have a pretty complete version that I'm very happy with right now. I know there might be some special edge case I've graciously missed because, hey, I've got only two hands and I'm living on a blue planet after all. And I definitely don't care whether this is a good or a bad idea; if it works, it works 😌
I do care if it's a good idea or a bad idea though. As @birkenfeld correctly points out, someone running Python 3.5 wouldn't get any Python 3.9 specific features highlighted, and that is in fact a blocker for us, because we want the same experience across all supported platforms. It also dramatically increases the testing matrix (it simply doubles it). I'm not supportive of turning Pygments into a proper Python parser (no matter the underlying mechanism); I'd rather live with some inaccuracies but have consistent behavior across all supported versions of Python.
I was a bit ambiguous, for "language tokenizer" read "tokenizer used by a language interpreter". Of course both are tokenizing/lexing, that's in the name.
In general, using a completely different strategy for one of the lexers compared to 99% of the others increases the bar for contributing. I know what I'm talking about, we have a few of them and I'd rather they get rewritten.
As discussed before, this is nice, but the other direction breaks.
If you have a look at
No problem with that; if you're happy with it, by all means use it. But it's not going to fly in a PR here. In any case, this is unrelated to the issue at hand - which has been fixed by #1683.
I disagree it's incorrect.
Ok, I'll try once more: A
You just keep repeating that, but it's plainly false. An f-string is tokenized as a string by `tokenize`. Really, I'm only trying to explain how it could work and you don't care, kind of blindly, without any piece of documentation at hand on why it wouldn't work. I get that if you evaluate the code it's not going to work in 3.5, I mean, obviously. But it doesn't have to interpret it.
... though I'm now slightly reconsidering whether I should have asked the question. I appreciate the answers and discussion 😄
The only ways out of this are a) ship tokenize.py with Pygments or b) add version-specific hacks. Neither of which makes the codebase simpler, and both still require version-specific updates in Pygments when syntax changes. Separately, some common updates needed in the Python lexer are completely independent of what `tokenize` provides.
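For what it's worth, the stdlib's view of the example can be inspected directly, and it illustrates the version dependence being argued about: before Python 3.12, an f-string arrives as a single STRING token, while 3.12+ (PEP 701) splits it into FSTRING_START/FSTRING_MIDDLE/FSTRING_END pieces:

```python
import io
import tokenize

src = "x = rf'{{\\S'\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
kinds = [tokenize.tok_name[t.type] for t in toks]

# The kinds reported for the f-string depend on the running interpreter:
# one STRING token before 3.12, several FSTRING_* tokens afterwards.
print(kinds)
```

So a `tokenize`-based lexer would highlight the same source differently depending on which interpreter Pygments happens to run on.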
Nobody is talking about evaluation. It's not that I don't care, it's that I know what I'm talking about and I know a blocker when I see one. |
@jmd-dk I'm sorry, I know this is off-topic here but we don't have a better venue for now... |
I mean, with regexes you at least double what you need to type for the same result, because you have to create a specific state when "entering" a string on the first quote (since we need the same closing one) if we want to catch specific sub-tokens inside of it (f-expressions, etc.). If it's not evaluation you're talking about, I don't get you, because throwing Python version incompatibility around as a reason not to use tokenize is plainly wrong.
It's a childish mapping problem after tokenization.
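The mapping being alluded to could look roughly like this sketch (the category names here are illustrative placeholders, not Pygments' actual token hierarchy):

```python
import io
import token
import tokenize

# Illustrative mapping from stdlib token types to highlighting categories.
# The category names are placeholders, not Pygments token types.
TOKEN_MAP = {
    token.NAME: 'Name',
    token.NUMBER: 'Number',
    token.STRING: 'String',
    token.OP: 'Operator',
    token.COMMENT: 'Comment',
}

def categorize(src):
    # Anything without an entry (NEWLINE, INDENT, ENDMARKER, ...) falls
    # back to plain text.
    return [
        (TOKEN_MAP.get(t.type, 'Text'), t.string)
        for t in tokenize.generate_tokens(io.StringIO(src).readline)
    ]

print(categorize("x = 1\n"))
```

Note that such a flat mapping gives coarser results than Pygments' lexer (e.g. keywords are just NAME tokens to `tokenize`), which is one reason the mapping is less trivial than it first appears.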
I don't think this discussion is leading anywhere. The arguments have been brought up, and our stance on this issue is clear. Let's move along here. If this is really important for you, you can always extend Pygments with a plugin that uses tokenize and use that exclusively for your Python lexing needs, but it's not something we will merge into Pygments upstream.
Python (3) code using raw f-strings containing a backslash within escaped braces (i.e. double braces) cannot be parsed. A minimal example is

```python
fr'{{\S'
```

Some pointers:

- `f'{{\S'` parses fine.
- `r'{{\S'` parses fine.
- `rf'{{'` parses fine.
- `rf'\S'` parses fine.
- `rf'{{\S'` does not.
- The `S` is not important, though some character must be there in order for the syntax to be valid.

This might seem like a pretty esoteric case. For me it comes up a lot when writing LaTeX in Matplotlib.
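For reference, the example really is valid Python: in an f-string the doubled brace denotes a literal brace, and the raw prefix keeps the backslash verbatim:

```python
# fr'{{\S' is valid Python: '{{' yields a literal '{' in an f-string,
# and the raw prefix keeps the backslash as-is.
s = fr'{{\S'
print(s)        # {\S
assert s == '{\\S'
assert len(s) == 3
```

So a lexer rejecting it is a highlighting bug, not a quirk of the input.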