Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: added a call limit setting to prevent crashes or timeouts #684

Merged
merged 6 commits into from Aug 10, 2022

Conversation

tomtau
Copy link
Contributor

@tomtau tomtau commented Aug 9, 2022

closes #674, closes #675

@tomtau tomtau requested a review from a team as a code owner August 9, 2022 13:52
@tomtau tomtau requested review from NoahTheDuke, CAD97 and a team and removed request for a team August 9, 2022 13:52
pest/src/parser_state.rs Outdated Show resolved Hide resolved
@dragostis
Copy link
Contributor

#675 looks like it shouldn't be closed by this since it's a regression. Ideally, the optimizer should be able to detect all exponential cases and have replacement rules for them. Maybe someone could take a look at egg and implement a simplified version of the grammar there so peephole optimizations can be more easily written?

@tomtau
Copy link
Contributor Author

tomtau commented Aug 10, 2022

@dragostis thanks. Yes, I agree that for the timeout issue, there should be a fix for the root cause. For the edge case found by the fuzzer, it seems it's due to the block comment feature introduced in this PR: #332
There are, however, a few other open issues related to performance which may or may not have the same root cause, e.g. #603 #571 #402 #279

Given the investigation and the corresponding proper backwards-compatible fix for timeouts may take time, there are potentially two ways (with regards to this API: #684 (comment) ) to keep set_call_limit with a single argument:

  1. make it track overall calls (true). This should fix those fuzzer issues, as demonstrated by the unit tests.
  2. make it track call depths (false). This should fix only the stack overflow crashes, but not the timeout issues.

I'm leaning towards the first one, because, in the meantime, it's practical in the sense it'd allow people to use pest on potentially problematic inputs (without worrying it'd eat up all CPU time) if their grammars suffered from this issue.
I'd then open up an issue that would reference the fuzzer timeout issue as well the other performance issues.

--
BTW, as for the solution, one potential option that comes to mind is packrat parsing with a fix: https://dl.acm.org/doi/10.1145/3377555.3377898 There's one older issue that was closed #7 (comment) -- do you have any more insight into why packrat parsing was not used?

@dragostis
Copy link
Contributor

dragostis commented Aug 10, 2022

Packrat parsing was not used because caching data is slow. While it removes the exponential cases, it adds substantial overhead for all other parsing.

My solution to this was to identify such patterns and replace them in the optimizer. This has worked for some cases, but I haven't had the time to identify all patterns of interest. Also, I have no proof that all such patterns are actually replaceable with equivalent but non-exponential patterns.

My vision for pest when I started it was to actually use something like ruler (which didn't exist at the time) to find these rules and use egg to optimize the grammars. Currently, I think this is non-trivial since pest's grammar is in need of a cleanup. I started working on a new meta grammar in pest3 a few years ago, but didn't have a chance to make it work. It has a few interesting ideas, including much better code generation, if you have the time to take over that work.

For the rewrite rules inference, you'd need a simplified version of the grammar, something like the one here, where all grammar would be reduced to something in the lines of this.

@tomtau
Copy link
Contributor Author

tomtau commented Aug 10, 2022

@glyn The API was simplified to only track calls as per the discussion: #684 (comment)

Option<NonZeroUsize> makes it a bit wordy with literals, but that may be improved in future Rust versions:
rust-lang/rust#69329

@tomtau tomtau requested a review from glyn August 10, 2022 10:09
pest/src/parser_state.rs Outdated Show resolved Hide resolved
Copy link

@glyn glyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the docs of set_call_limit.

@tomtau tomtau requested a review from glyn August 10, 2022 13:48
@tomtau
Copy link
Contributor Author

tomtau commented Aug 10, 2022

@glyn fixed, thanks for catching that!

Copy link

@glyn glyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvements suggested, but at the author's discretion.

Copy link

@glyn glyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blocking review. Please note I have only reviewed a small part of this PR.

Copy link

@glyn glyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

Copy link
Member

@NoahTheDuke NoahTheDuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion but take it or not, this is a good change.

@tomtau tomtau merged commit 9c42f12 into pest-parser:master Aug 10, 2022
Comment on lines +93 to +95
if let Some((current, _)) = &mut self.current_call_limit {
*current += 1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomtau @NoahTheDuke
Just for interest, there's yet another way to code this:

        if let Some((ref mut current, _)) = self.current_call_limit {
            *current += 1;
        }

This cropped up here in @jonhoo's video Crust of Rust: Lifetime Annotations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's cool. I didn't know you could embed those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

oss-fuzz: timout in parser oss-fuzz: stack overflow in pest::parser_state::ParserState<R>::rule
4 participants