Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a catastrophic backtracking bug in JavaLexer #1594

Merged
merged 2 commits into from Nov 9, 2020

Conversation

kurtmckee
Copy link
Contributor

This closes #1586, which triggered the bug.

  1. Pygments guessed from an input string that it should use SspLexer to lex the file.
  2. The SspLexer is a delegating lexer that relies on XmlLexer and JspRootLexer.
  3. JspRootLexer relies on regular expressions from the JavaLexer.
  4. The JavaLexer has a catastrophic backtracking bug in its string literal regular expression.

This patch addresses the problem by replacing the string literal regex with a sub-state for string literal parsing.

@Anteru Anteru merged commit fb40b71 into pygments:master Nov 9, 2020
@Anteru
Copy link
Collaborator

Anteru commented Nov 9, 2020

Thanks! High time we add pytest-time, I guess :)

@Anteru Anteru added this to the 2.7.3 milestone Nov 9, 2020
@Anteru Anteru added the changelog-update Items which need to get mentioned in the changelog label Nov 9, 2020
@kurtmckee
Copy link
Contributor Author

Thanks! High time we add pytest-time, I guess :)

If it's okay with you and Georg, I'll work on a patch to add pytest-timeout (which will attempt to hard-stop tests that run too long) and to update existing backtrack-checking tests to use the timeout.

Let me know if that's something you'd like me to tackle. =)

@Anteru
Copy link
Collaborator

Anteru commented Nov 9, 2020

I have no objection, and given Georg didn't red flag it, I'd say please go ahead. It's fairly obvious to me by now that backtracking needs robust testing, and I don't see how we can do it without timeouts.

@kurtmckee
Copy link
Contributor Author

kurtmckee commented Nov 9, 2020 via email

@birkenfeld
Copy link
Member

Sure, I'd just like a graceful behavior if the timeout plugin isn't installed.

@kurtmckee
Copy link
Contributor Author

kurtmckee commented Nov 9, 2020 via email

@Anteru Anteru removed the changelog-update Items which need to get mentioned in the changelog label Dec 5, 2020
@kurtmckee
Copy link
Contributor Author

By the way, I just tested the pytest timeout plugin. It supports a thread-based and a signal-based method.

On Windows, which only supports thread-based timeouts, the plugin is unable to interrupt catastrophic backtracking like this:

import re

def test_catastrophic_backtracking():
    re.search(r'\d+\d+a', '0' * 10_000)

The thread-based timeouts can successfully interrupt operations like time.sleep(60) but cannot successfully interrupt catastrophic backtracking. For this reason I'm not pursuing using the pytest-timeout plugin.

@kurtmckee kurtmckee deleted the java-backtracking branch March 3, 2021 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Denial of service with malformed file
3 participants