Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

Closed
5225225 opened this issue Apr 1, 2021 · 16 comments · Fixed by #764
Closed

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

5225225 opened this issue Apr 1, 2021 · 16 comments · Fixed by #764

Comments

@5225225
Copy link

5225225 commented Apr 1, 2021

This code segfaults with a stack overflow on both 0.9.0 from crates.io as well as loading from git (Version 35ef0ee)

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    let sql = "(".repeat(1024);

    let dialect = GenericDialect {};

    let _ = Parser::parse_sql(&dialect, &sql);
}
@46bit
Copy link

46bit commented Mar 10, 2022

Still an issue as of 0.15.0

@46bit
Copy link

46bit commented Mar 22, 2022

@alamb Hi, pinging you as a maintainer. I want to use this crate, but I'd need to fix this issue to prevent users from triggering stack overflows. Hoping to get some feedback on whether you'd accept a fix.

Right now the Parser struct doesn't do any tracking of stack size. What I'm thinking is adding an extra depth: usize parameter to the various parse methods. Each method would return a ParserError if the recursion got deeper than a max_depth: Option<usize> field on Parser, preventing a stack overflow. The existing parse methods would then call out to the depth-limited variants supplying depth: 0.

Default behaviour would stay the same, but you could also prevent stack overflows by setting Parser.max_depth to a reasonable number.

Does that sound like a PR you'd be open to?

@alamb
Copy link
Collaborator

alamb commented Mar 22, 2022

Hi @46bit -- Thank you for offering.

The idea of adding a max_depth parameter to the existing (non pub) methods seems reasonable to me. Hopefully the interface that most external users see would remain the same

Also, for what it is worth, I have seen rust stack sizes significantly larger in debug code than release code -- so the actual stack limit may be much higher in practice in release builds (though certainly would still exist)

@46bit
Copy link

46bit commented Mar 22, 2022

@alamb Thanks for such a quick response, I appreciate it. I'll get working on the PR soonish and let you know how it goes :)

@5225225
Copy link
Author

5225225 commented Mar 22, 2022

Might be worth looking at what serde_json does, since I know they definitely have a depth limit by default. Their limit is 128.

@alamb
Copy link
Collaborator

alamb commented Mar 22, 2022

I think a default depth limit (that can be overridden) seems very reasonable to me and very much in the rust "safe by default" spirit

@46bit
Copy link

46bit commented Mar 22, 2022

serde-json is using a check_recursion { ... } macro each time it recurses deeper: https://github.com/serde-rs/json/blob/master/src/de.rs#L1273-L1304. That works OK for them, they only use it 8 times. But sqlparser branches a lot more and I think we'd use it hundreds of times:

pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    match self.next_token() {
        Token::Word(w) => match w.keyword {
            Keyword::DESCRIBE => Ok(check_recursion! { self.parse_explain(true)? }),
            Keyword::EXPLAIN => Ok(check_recursion! { self.parse_explain(false)? }),
            Keyword::ANALYZE => Ok(check_recursion! { self.parse_analyze()? }),
            [...]

Using it inside each function is pretty bad too, the diff would be huge:

pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    check_recursion! {
        match self.next_token() {
            Token::Word(w) => match w.keyword {
                Keyword::DESCRIBE => Ok(self.parse_explain(true)?),
                Keyword::EXPLAIN => Ok(self.parse_explain(false)?),
                Keyword::ANALYZE => Ok(self.parse_analyze()?),

The ideal from my point of view would probably be a macro annotation on each function, like this:

#[check_recursion]
pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    [...]
}

Those macros have to be defined in a separate crate, and I can't find anything like it that already exists. It'd be by far the cleanest approach but I don't know if you'd be OK with introducing a tiny new dependency (sqlparser-rs/proc-macros?) @alamb?

@alamb
Copy link
Collaborator

alamb commented Mar 22, 2022

Those macros have to be defined in a separate crate

I don't understand why they need to be in a separate crate

Adding a macro to check recursion seems like a good idea to me.

@46bit
Copy link

46bit commented Mar 22, 2022

I don't understand why they need to be in a separate crate

Specifically the function annotation. As I understand it those have to be proc macros, and proc macros have to be in a separate crate (fourth paragraph of https://doc.rust-lang.org/reference/procedural-macros.html.)

@alamb
Copy link
Collaborator

alamb commented Mar 22, 2022

🤔 I am not sure how much time I have to devote to figuring out how to release multiple dependent crates in parallel for this crate

@nickolay
Copy link
Collaborator

I think we'd use it hundreds of times:

Why? I'd expect only a few productions of the SQL grammar to support recursion (e.g. subqueries, subexpressions, EXPLAIN-like commands that operate on other statements).

If we track the recursion depth only when recursing, maintaining the recursion counter manually shouldn't be much of a problem.

Also: were you really going to add an extra depth: usize parameter to the various parse methods or to the Parser struct itself?

@46bit
Copy link

46bit commented Mar 29, 2022

Why? I'd expect only a few productions of the SQL grammar to support recursion

I'd intended to track the parser's call stack. Good point though.

@46bit
Copy link

46bit commented May 18, 2022

I've opened an PR for this: #501. Feedback welcome.

@alamb
Copy link
Collaborator

alamb commented May 19, 2022 via email

@zidaye
Copy link
Contributor

zidaye commented Dec 7, 2022

    fn test_stack_overflow() {
        let pg_dialect = PostgreSqlDialect {};
        let sql = r#"
        SELECT COUNT(*) FROM (SELECT id, user_id FROM test WHERE ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((user_id = $1 OR user_id = $2) OR user_id = $3) OR user_id = $4) OR user_id = $5) OR user_id = $6) OR user_id = $7) OR user_id = $8) OR user_id = $9) OR user_id = $10) OR user_id = $11) OR user_id = $12) OR user_id = $13) OR user_id = $14) OR user_id = $15) OR user_id = $16) OR user_id = $17) OR user_id = $18) OR user_id = $19) OR user_id = $20) OR user_id = $21) OR user_id = $22) OR user_id = $23) OR user_id = $24) OR user_id = $25) OR user_id = $26) OR user_id = $27) OR user_id = $28) OR user_id = $29) OR user_id = $30) OR user_id = $31) OR user_id = $32) OR user_id = $33) OR user_id = $34) OR user_id = $35) OR user_id = $36) OR user_id = $37) OR user_id = $38) OR user_id = $39) OR user_id = $40) OR user_id = $41) OR user_id = $42) OR user_id = $43) OR user_id = $44) OR user_id = $45) OR user_id = $46) OR user_id = $47) OR user_id = $48) OR user_id = $49) OR user_id = $50) OR user_id = $51) OR user_id = $52) OR user_id = $53) OR user_id = $54) OR user_id = $55) OR user_id = $56) OR user_id = $57) OR user_id = $58) OR user_id = $59) OR user_id = $60) OR user_id = $61) OR user_id = $62) OR user_id = $63) OR user_id = $64) OR user_id = $65) OR user_id = $66) OR user_id = $67) OR user_id = $68) OR user_id = $69) OR user_id = $70) OR user_id = $71) OR user_id = $72) OR user_id = $73) OR user_id = $74) OR user_id = $75) OR user_id = $76) OR user_id = $77) OR user_id = $78) OR user_id = $79) OR user_id = $80) OR user_id = $81) OR user_id = $82) OR user_id = $83) OR user_id = $84) OR user_id = $85) OR user_id = $86) OR user_id = $87) OR user_id = $88) OR user_id = $89) OR user_id = $90) OR user_id = $91) OR user_id = $92) OR user_id = $93) OR user_id = $94) OR user_id = $95) OR user_id = $96) OR user_id = $97) OR user_id = $98) OR user_id = $99) OR user_id = $100) OR user_id = $101) OR user_id = $102) OR user_id = $103) OR user_id = $104) OR user_id = $105) OR user_id = $106) OR user_id = $107) OR user_id = $108) OR user_id = $109) OR user_id = $110) OR user_id = $111) OR user_id = $112) OR user_id = $113) AND visible = $114) t"#;
        let  ast = Parser::parse_sql(&pg_dialect, sql).unwrap();
        for item in ast {
            println!("{}", item);
        }
    }

The result after the test case is executed:

thread 'tests::test_name' has overflowed its stack
fatal runtime error: stack overflow

I found some discussions:

  1. https://stackoverflow.com/questions/39840663/recursive-function-calculating-factorials-leads-to-stack-overflow
  2. Stack overflow in fmt::Display impl rust-lang/rust#45838

@alamb Could you consider using stacker to solve related problems?

@alamb
Copy link
Collaborator

alamb commented Dec 14, 2022

Proposed solution in #764

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants