Is it misguided to use this as an expression input sanitizer / sql injection protection? #505

doulighan · 2023-08-01T18:59:27Z

doulighan
Aug 1, 2023

I have the unfortunate business requirement of allowing users to type arbitrary math expressions and send them to our SQL database. Granted these are not full SQL queries; we just give them control over math expressions within SELECT or WHERE clauses. Think WHERE (column_a + column_b) / (100 * column_c).

Simple SQL parameterization doesn't work for validating these expressions and preventing injection. The expressions are arbitrary, with any number of parentheses or operations.

Using pyparsing (based on the fourFn.py template) to validate these expressions seems like the best solution so far. We can very strictly define which symbols are allowed, validate that the column names exist, then swap out the column names with random non-zero values and check whether the expression is syntactically valid. Extending the parser to allow more complicated functions or operations is trivial.
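A minimal sketch of the column check and substitution step described above, using only the standard library. The column set, the `substitute_columns` name, and the 1 to 9 value range are assumptions made for illustration; the numeric-only string it returns would still need to go through the actual expression parser.

```python
import random
import re

# Hypothetical schema: the set of column names users may reference.
KNOWN_COLUMNS = {"column_a", "column_b", "column_c"}

# ASCII identifiers only; \b keeps the "e" in "1e5" from matching as a name.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*", re.ASCII)


def substitute_columns(expression: str) -> str:
    """Swap every known column for a random non-zero integer.

    Raises ValueError as soon as an unknown identifier is seen.
    """
    def _swap(match):
        name = match.group(0)
        if name not in KNOWN_COLUMNS:
            raise ValueError(f"unknown column: {name!r}")
        return str(random.randint(1, 9))

    return IDENTIFIER.sub(_swap, expression)
```

For example, `substitute_columns("(column_a + column_b) / (100 * column_c)")` yields something like `(3 + 7) / (100 * 2)`, while any input containing a word outside the allowed set raises `ValueError`.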

ptmcg
Aug 1, 2023
Maintainer

Great question! I've always looked at pyparsing as a way to implement a bespoke parser, as a tactic for preventing various attacks based on eval(). fourFn.py was a very early implementation of this, but pyparsing has evolved over time. I wrote operatorPrecedence as a built-in that simplifies writing infix notations (and which I subsequently renamed to infixNotation, and most recently the PEP8 infix_notation). And I've used infix_notation extensively in various projects. I think you'll make some good headway looking at how infix_notation and SQL get parsed in some of the examples that do SQL parsing (simpleSQL.py and select_parser.py).
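As a rough illustration of the infix_notation approach for this use case, here is a sketch assuming pyparsing 3.x. The column list, the operator set, and the `is_valid_expression` helper are hypothetical choices for this example, not taken from fourFn.py or the SQL examples.

```python
import pyparsing as pp

# Hypothetical schema: only these column names are accepted by the grammar.
ALLOWED_COLUMNS = ["column_a", "column_b", "column_c"]

number = pp.common.number
# as_keyword=True prevents "column_a" from matching a prefix of "column_ab".
column = pp.one_of(ALLOWED_COLUMNS, as_keyword=True)

# Operands are numbers or known columns; parentheses are handled by
# infix_notation itself. Precedence levels are listed highest first.
expr = pp.infix_notation(
    number | column,
    [
        (pp.one_of("+ -"), 1, pp.OpAssoc.RIGHT),  # unary sign
        (pp.one_of("* /"), 2, pp.OpAssoc.LEFT),
        (pp.one_of("+ -"), 2, pp.OpAssoc.LEFT),
    ],
)


def is_valid_expression(text: str) -> bool:
    """Return True only if the whole string matches the grammar."""
    try:
        expr.parse_string(text, parse_all=True)
    except pp.ParseException:
        return False
    return True
```

With `parse_all=True`, trailing input such as `; DROP TABLE users` makes the parse fail instead of being silently ignored.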

Even so, I've felt that infix_notation has been under-utilized, and took another look at its API. I have since wrapped infix_notation inside a friendlier API, under the project plusminus (https://github.com/pyparsing/plusminus), and have even posted a "can you hack this?" challenge site. While writing plusminus, I did learn a few things about different SQL and HTML injections, as well as dealing with DoS attacks that are possible even with purely legitimate expressions.

I'll follow up with some more details on these - I am about to be late for a meeting!

1 reply

doulighan Aug 1, 2023
Author

Thank you! I'll be sure to look into infix_notation / plusminus. Do you think it's potentially more secure, or just a friendlier API to use?

I've only really been thinking about the more obvious forms of SQL injection. Problems with legitimate expressions or attacks on the Python layer feel tougher to solve. If you have any wisdom there, please let me know! For now I'll likely just limit expression length and complexity, allow a very small set of math operators, set a query timeout on the DB, and put API limits on the endpoint.
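The length and complexity limits could be checked before the expression ever reaches the parser. This is only a sketch: the three limit values and the `within_limits` name are assumptions for illustration, and counting `+-*/` characters is a crude proxy for complexity.

```python
# Hypothetical limits; tune them to the expressions users actually need.
MAX_LENGTH = 200
MAX_DEPTH = 5
MAX_OPERATORS = 20


def within_limits(expression: str) -> bool:
    """Cheap pre-parse check on length, parenthesis nesting, and operator count."""
    if len(expression) > MAX_LENGTH:
        return False

    depth = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            if depth > MAX_DEPTH:
                return False
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis with no matching opener
                return False
    if depth != 0:  # unclosed parenthesis
        return False

    operators = sum(expression.count(op) for op in "+-*/")
    return operators <= MAX_OPERATORS
```

Rejecting deeply nested input up front also keeps a recursive parser from being pushed toward its recursion limit by a run of opening parentheses.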

As for general SQL parsing, that's actually not a requirement on our end: every part of the query except for these particular math expressions is built through safe means. We're only stuck with the unvalidated math expression, which may or may not contain column names. The column names are trivial to sanitize. The math expression is popped back into the main query once it passes successfully through pyparsing.
