Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

5225225 · 2021-04-01T11:43:52Z

This code segfaults with a stack overflow on both 0.9.0 from crates.io as well as loading from git (Version 35ef0ee)

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    let sql = "(".repeat(1024);

    let dialect = GenericDialect {};

    let _ = Parser::parse_sql(&dialect, &sql);
}

The text was updated successfully, but these errors were encountered:

46bit · 2022-03-10T13:16:45Z

Still an issue as of 0.15.0

46bit · 2022-03-22T09:56:00Z

@alamb Hi, pinging you as a maintainer. I want to use this crate, but I'd need to fix this issue to prevent users from triggering stack overflows. Hoping to get some feedback on whether you'd accept a fix.

Right now the Parser struct doesn't do any tracking of stack size. What I'm thinking is adding an extra depth: usize parameter to the various parse methods. Each method would return a ParserError if the recursion got deeper than a max_depth: Option<usize> field on Parser, preventing a stack overflow. The existing parse methods would then call out to the depth-limited variants supplying depth: 0.

Default behaviour would stay the same, but you could also prevent stack overflows by setting Parser.max_depth to a reasonable number.

Does that sound like a PR you'd be open to?

alamb · 2022-03-22T10:18:18Z

Hi @46bit -- Thank you for offering.

The idea of adding a max_depth parameter to the existing (non pub) methods seems reasonable to me. Hopefully the interface that most external users see would remain the same

Also, for what it is worth, I have seen rust stack sizes significantly larger in debug code than release code -- so the actual stack limit may be much higher in practice in release builds (though certainly would still exist)

46bit · 2022-03-22T10:55:14Z

@alamb Thanks for such a quick response, I appreciate it. I'll get working on the PR soonish and let you know how it goes :)

5225225 · 2022-03-22T11:42:58Z

Might be worth looking at what serde_json does, since I know they definitely have a depth limit by default. Their limit is 128.

alamb · 2022-03-22T13:04:47Z

I think a default depth limit (that can be overridden) seems very reasonable to me and very much in the rust "safe by default" spirit

46bit · 2022-03-22T15:25:59Z

serde-json is using a check_recursion { ... } macro each time it recurses deeper: https://github.com/serde-rs/json/blob/master/src/de.rs#L1273-L1304. That works OK for them, they only use it 8 times. But sqlparser branches a lot more and I think we'd use it hundreds of times:

pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    match self.next_token() {
        Token::Word(w) => match w.keyword {
            Keyword::DESCRIBE => Ok(check_recursion! { self.parse_explain(true)? }),
            Keyword::EXPLAIN => Ok(check_recursion! { self.parse_explain(false)? }),
            Keyword::ANALYZE => Ok(check_recursion! { self.parse_analyze()? }),
            [...]

Using it inside each function is pretty bad too, the diff would be huge:

pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    check_recursion! {
        match self.next_token() {
            Token::Word(w) => match w.keyword {
                Keyword::DESCRIBE => Ok(self.parse_explain(true)?),
                Keyword::EXPLAIN => Ok(self.parse_explain(false)?),
                Keyword::ANALYZE => Ok(self.parse_analyze()?),

The ideal from my point of view would probably be a macro annotation on each function, like this:

#[check_recursion]
pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
    [...]
}

Those macros have to be defined in a separate crate, and I can't find anything like it that already exists. It'd be by far the cleanest approach but I don't know if you'd be OK with introducing a tiny new dependency (sqlparser-rs/proc-macros?) @alamb?

alamb · 2022-03-22T18:17:55Z

Those macros have to be defined in a separate crate

I don't understand why they need to be in a separate crate

Adding a macro to check recursion seems like a good idea to me.

46bit · 2022-03-22T18:23:53Z

I don't understand why they need to be in a separate crate

Specifically the function annotation. As I understand it those have to be proc macros, and proc macros have to be in a separate crate (fourth paragraph of https://doc.rust-lang.org/reference/procedural-macros.html.)

alamb · 2022-03-22T18:27:08Z

🤔 I am not sure how much time I have to devote to figuring out how to release multiple dependent crates in parallel for this crate

nickolay · 2022-03-28T02:31:18Z

I think we'd use it hundreds of times:

Why? I'd expect only a few productions of the SQL grammar to support recursion (e.g. subqueries, subexpressions, EXPLAIN-like commands that operate on other statements).

If we track the recursion depth only when recursing, maintaining the recursion counter manually shouldn't be much of a problem.

Also: were you really going to add an extra depth: usize parameter to the various parse methods or to the Parser struct itself?

46bit · 2022-03-29T09:52:29Z

Why? I'd expect only a few productions of the SQL grammar to support recursion

I'd intended to track the parser's call stack. Good point though.

46bit · 2022-05-18T17:17:47Z

I've opened an PR for this: #501. Feedback welcome.

alamb · 2022-05-19T10:52:15Z

Thank you -- I am a little behind on sqlparser reviews at the moment. I'll try and take a look over the weekend. Hopefully we can get some other community feedback as well

…

On Wed, May 18, 2022 at 1:18 PM Miki Mokrysz ***@***.***> wrote: I've opened an PR for this: #501 <#501>. Feedback welcome. — Reply to this email directly, view it on GitHub <#305 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AADXZMPNPV2QLAZIBDW3CZTVKUQ4VANCNFSM42G57WBA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

zidaye · 2022-12-07T09:59:41Z

    fn test_stack_overflow() {
        let pg_dialect = PostgreSqlDialect {};
        let sql = r#"
        SELECT COUNT(*) FROM (SELECT id, user_id FROM test WHERE ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((user_id = $1 OR user_id = $2) OR user_id = $3) OR user_id = $4) OR user_id = $5) OR user_id = $6) OR user_id = $7) OR user_id = $8) OR user_id = $9) OR user_id = $10) OR user_id = $11) OR user_id = $12) OR user_id = $13) OR user_id = $14) OR user_id = $15) OR user_id = $16) OR user_id = $17) OR user_id = $18) OR user_id = $19) OR user_id = $20) OR user_id = $21) OR user_id = $22) OR user_id = $23) OR user_id = $24) OR user_id = $25) OR user_id = $26) OR user_id = $27) OR user_id = $28) OR user_id = $29) OR user_id = $30) OR user_id = $31) OR user_id = $32) OR user_id = $33) OR user_id = $34) OR user_id = $35) OR user_id = $36) OR user_id = $37) OR user_id = $38) OR user_id = $39) OR user_id = $40) OR user_id = $41) OR user_id = $42) OR user_id = $43) OR user_id = $44) OR user_id = $45) OR user_id = $46) OR user_id = $47) OR user_id = $48) OR user_id = $49) OR user_id = $50) OR user_id = $51) OR user_id = $52) OR user_id = $53) OR user_id = $54) OR user_id = $55) OR user_id = $56) OR user_id = $57) OR user_id = $58) OR user_id = $59) OR user_id = $60) OR user_id = $61) OR user_id = $62) OR user_id = $63) OR user_id = $64) OR user_id = $65) OR user_id = $66) OR user_id = $67) OR user_id = $68) OR user_id = $69) OR user_id = $70) OR user_id = $71) OR user_id = $72) OR user_id = $73) OR user_id = $74) OR user_id = $75) OR user_id = $76) OR user_id = $77) OR user_id = $78) OR user_id = $79) OR user_id = $80) OR user_id = $81) OR user_id = $82) OR user_id = $83) OR user_id = $84) OR user_id = $85) OR user_id = $86) OR user_id = $87) OR user_id = $88) OR user_id = $89) OR user_id = $90) OR user_id = $91) OR user_id = $92) OR user_id = $93) OR user_id = $94) OR user_id = $95) OR user_id = $96) OR user_id = $97) OR user_id = $98) OR user_id = $99) OR user_id = $100) OR user_id = $101) OR user_id = $102) OR user_id = $103) OR user_id = $104) OR user_id = $105) OR user_id = $106) OR user_id = $107) OR user_id = $108) OR user_id = $109) OR user_id = $110) OR user_id = $111) OR user_id = $112) OR user_id = $113) AND visible = $114) t"#;
        let  ast = Parser::parse_sql(&pg_dialect, sql).unwrap();
        for item in ast {
            println!("{}", item);
        }
    }

The result after the test case is executed：

thread 'tests::test_name' has overflowed its stack
fatal runtime error: stack overflow

I found some discussions：

@alamb Could you consider using stacker to solve related problems?

alamb · 2022-12-14T19:48:26Z

Proposed solution in #764

46bit mentioned this issue May 18, 2022

Prevent stack overflows by limiting recursion #501

Closed

step-baby mentioned this issue Dec 8, 2022

fix: stack overflow #751

Closed

alamb mentioned this issue Dec 14, 2022

Avoid stack overflows via configurable with_recursion_limit #764

Merged

alamb closed this as completed in #764 Dec 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

5225225 commented Apr 1, 2021

46bit commented Mar 10, 2022

46bit commented Mar 22, 2022 •

edited

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022

5225225 commented Mar 22, 2022

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022 •

edited

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022

alamb commented Mar 22, 2022 •

edited

nickolay commented Mar 28, 2022

46bit commented Mar 29, 2022

46bit commented May 18, 2022

alamb commented May 19, 2022 via email

zidaye commented Dec 7, 2022 •

edited

alamb commented Dec 14, 2022

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

Stack overflow when parsing SQL consisting of many imbalanced open parenthesis #305

Comments

5225225 commented Apr 1, 2021

46bit commented Mar 10, 2022

46bit commented Mar 22, 2022 • edited

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022

5225225 commented Mar 22, 2022

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022 • edited

alamb commented Mar 22, 2022

46bit commented Mar 22, 2022

alamb commented Mar 22, 2022 • edited

nickolay commented Mar 28, 2022

46bit commented Mar 29, 2022

46bit commented May 18, 2022

alamb commented May 19, 2022 via email

zidaye commented Dec 7, 2022 • edited

alamb commented Dec 14, 2022

46bit commented Mar 22, 2022 •

edited

46bit commented Mar 22, 2022 •

edited

alamb commented Mar 22, 2022 •

edited

zidaye commented Dec 7, 2022 •

edited