C++ Lexer ErrorToken: \ # ' #2207

Closed
martinburchell opened this issue Aug 14, 2022 · 5 comments · Fixed by #2208

@martinburchell

I've attached some files that fail with the C++ lexer. Most have been broken since c1a0d82; one has been broken since fc56ab8.

pygments_cpp_tests.tar.gz

@jeanas
Contributor

jeanas commented Aug 14, 2022

@amitkummer @hgruniaux

@amitkummer
Contributor

Looking into this.

@jeanas
Contributor

jeanas commented Aug 14, 2022

It seems to work if you wrap the snippets in a function. For example, this is lexed okay:

int main() {
  const QString EMAIL_RE_STR(
      // Regex for an e-mail address.
      // From colander.__init__.py, in turn from
      // https://html.spec.whatwg.org/multipage/input.html#e-mail-state-(type=email)
      // Note that C++ raw strings start R"( and end )"

      R"(^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9])"
      R"((?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9])"
      R"((?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$)"
  );
}

AFAICS, the problem is that without the enclosing function, we try to parse the statement as a function definition/declaration. Of course, the code given is not valid as a standalone program, but in documentation, it makes sense to give examples without context (e.g. without a main() function). Then, when we see

type identifier(stuff ...);

it is tricky to know whether this is a function declaration or the declaration of a variable that is initialized by calling a constructor. Cf. “most vexing parse”.
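
To make the ambiguity concrete, here is a minimal sketch in Python. It assumes a much simplified stand-in for the lexer's declaration rule: `DECLARATION` and `looks_like_function_declaration` are hypothetical names, and the real rule in `c_cpp.py` also handles comments and namespaced identifiers. Both statements have the same shape, so the pattern reports a "function name" for each.

```python
import re

# Hypothetical, simplified stand-in for the lexer's declaration rule:
#   <type> <name>(<anything without ';'>) ... ;
# The return type and the name are reduced to \w+ for illustration.
DECLARATION = re.compile(r'(\w+)(\s+)(\w+)(\s*)(\([^;]*?\))(\s*)([^;/]*)(;)')


def looks_like_function_declaration(statement):
    """Return the captured "function name", or None if the rule does not match."""
    match = DECLARATION.match(statement)
    return match.group(3) if match else None


# A genuine function declaration.
print(looks_like_function_declaration('int f(int x);'))
# A variable initialized through a constructor call: same shape, same match.
print(looks_like_function_declaration('QString name("x");'))
```

A regular expression only sees the token shape, not whether `QString` has a constructor taking a string, so it cannot separate the two cases.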

I'm not sure yet what's the best way to fix this. Maybe just this?

diff --git a/pygments/lexers/c_cpp.py b/pygments/lexers/c_cpp.py
index 5d3b9c7d..198d288d 100644
--- a/pygments/lexers/c_cpp.py
+++ b/pygments/lexers/c_cpp.py
@@ -132,9 +132,9 @@ class CFamilyLexer(RegexLexer):
              r'(' + _possible_comments + r')'    # possible comments
              r'(' + _namespaced_ident + r')'             # method name
              r'(' + _possible_comments + r')'    # possible comments
-             r'(\([^;]*?\))'                          # signature
+             r'(\([^;"]*?\))'                          # signature
              r'(' + _possible_comments + r')'    # possible comments
-             r'([^;{/]*)(\{)',
+             r'([^;{/"]*)(\{)',
              bygroups(using(this), using(this, state='whitespace'), Name.Function, using(this, state='whitespace'),
                       using(this), using(this, state='whitespace'), using(this), Punctuation),
              'function'),
@@ -143,9 +143,9 @@ class CFamilyLexer(RegexLexer):
              r'(' + _possible_comments + r')'    # possible comments
              r'(' + _namespaced_ident + r')'             # method name
              r'(' + _possible_comments + r')'    # possible comments
-             r'(\([^;]*?\))'                          # signature
+             r'(\([^;"]*?\))'                          # signature
              r'(' + _possible_comments + r')'    # possible comments
-             r'([^;/]*)(;)',
+             r'([^;/"]*)(;)',
              bygroups(using(this), using(this, state='whitespace'), Name.Function, using(this, state='whitespace'),
                       using(this), using(this, state='whitespace'), using(this), Punctuation)),
             include('types'),

It's heuristic, and given that we're not going to reimplement a full-fledged C++ parser, which is complex like hell, it will always remain heuristic ...
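
The effect of the two changed character classes can be sketched with plain `re`. This is an illustration under assumptions, not the lexer's actual rule: `PREFIX`, `OLD_DEFINITION` and `NEW_DEFINITION` are hypothetical names, and the return type, name and comment groups are collapsed to `\w+` and `\s*`. Only the signature and trailing groups are copied from the diff.

```python
import re

# Simplified stand-ins for the "function definition" rule before and after
# the diff. Assumption: type and name are plain \w+, comment groups dropped.
PREFIX = r'(\w+)(\s+)(\w+)(\s*)'
OLD_DEFINITION = re.compile(PREFIX + r'(\([^;]*?\))(\s*)([^;{/]*)(\{)')
NEW_DEFINITION = re.compile(PREFIX + r'(\([^;"]*?\))(\s*)([^;{/"]*)(\{)')

samples = [
    'int main() {',          # real function definition
    'id id2("){ ... }");',   # variable initialized from a string literal
]

for code in samples:
    old = OLD_DEFINITION.match(code) is not None
    new = NEW_DEFINITION.match(code) is not None
    print(f'{code!r}: old={old} new={new}')
```

With the old pattern, the lazy signature group stops at the `)` inside the string literal and the following `{` is taken as the start of a function body. Excluding `"` from the signature prevents that match, while the ordinary definition still matches.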

@jeanas
Contributor

jeanas commented Aug 14, 2022

which is complex like hell

(actually not possible for us since parsing C++ requires knowing the contents of the include files)

@martinburchell
Author

@jean-abou-samra @amitkummer Thanks for looking into this. In case it is of any help, the original code is at https://github.com/ucam-department-of-psychiatry/camcops/tree/master/tablet_qt

jeanas added a commit to jeanas/pygments that referenced this issue Aug 14, 2022
…ons and declarations

Something like

id id2("){ ... }");

is no longer wrongly recognized as a "function"

id id2(") {
  ...
}
");

As the difference in the tests shows, this has the unfortunate side
effect that we no longer highlight something like

int f(param="default");

as a function declaration, but it is hard to imagine another way to
fix this (cf. “most vexing parse” problem).

Fixes pygments#2207
jeanas added a commit that referenced this issue Aug 15, 2022
…ons and declarations (#2208)

Something like

id id2("){ ... }");

is no longer wrongly recognized as a "function"

id id2(") {
  ...
}
");

As the difference in the tests shows, this has the unfortunate side
effect that we no longer highlight something like

int f(param="default");

as a function declaration, but it is hard to imagine another way to
fix this (cf. “most vexing parse” problem).

Fixes #2207
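
The side effect described in the commit message can be reproduced with a small sketch, again assuming simplified stand-ins for the declaration rule (`OLD_DECLARATION`, `NEW_DECLARATION` and `is_declaration` are hypothetical names, with type and name reduced to `\w+`). A declaration whose parameter list contains a string literal matched the old rule but not the new one.

```python
import re

# Simplified stand-ins for the "function declaration" rule before and after
# the fix. Assumption: type and name are plain \w+, comment groups dropped.
PREFIX = r'(\w+)(\s+)(\w+)(\s*)'
OLD_DECLARATION = re.compile(PREFIX + r'(\([^;]*?\))(\s*)([^;/]*)(;)')
NEW_DECLARATION = re.compile(PREFIX + r'(\([^;"]*?\))(\s*)([^;/"]*)(;)')


def is_declaration(pattern, code):
    """True if the simplified rule treats `code` as a function declaration."""
    return pattern.match(code) is not None


for code in ['int f(int param);', 'int f(param="default");']:
    print(code,
          'old:', is_declaration(OLD_DECLARATION, code),
          'new:', is_declaration(NEW_DECLARATION, code))
```

Declarations without string literals are unaffected. Only the ones containing `"` fall through to the lexer's other rules.
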
@Anteru Anteru added this to the 2.13.0 milestone Aug 15, 2022
@Anteru Anteru added the A-lexing area: changes to individual lexers label Aug 15, 2022