Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgres] Add support for custom binary operators #548

Merged
merged 5 commits into from Aug 5, 2022

Conversation

iskakaushik
Copy link
Contributor

More details about operators in general are at: https://www.postgresql.org/docs/current/sql-createoperator.html. This patch attempts to parse SELECT queries that reference an operator using OPERATOR(<optional_schema>.<operator_name>) syntax. Tests added in the patch illustrate a common use-case of this feature.

This is a PostgreSQL extension. There are no provisions for user-defined operators in the SQL standard.

More details about operators in general are at:
https://www.postgresql.org/docs/current/sql-createoperator.html. This
patch attempts to parse `SELECT` queries that reference an operator
using `OPERATOR(<optional_schema>.<operator_name>)` syntax.

This is a PostgreSQL extension. There are no provisions for user-defined operators in the SQL standard.
@iskakaushik
Copy link
Contributor Author

@alamb could you please help with triggering the workflow?

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @iskakaushik -- I'll trigger the tests

src/parser.rs Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Aug 3, 2022

Pull Request Test Coverage Report for Build 2805285978

  • 70 of 74 (94.59%) changed or added relevant lines in 3 files are covered.
  • 633 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.04%) to 89.953%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/parser.rs 12 13 92.31%
src/ast/operator.rs 30 33 90.91%
Files with Coverage Reduction New Missed Lines %
tests/sqlparser_snowflake.rs 1 98.11%
tests/sqlparser_postgres.rs 12 97.89%
src/ast/query.rs 20 86.18%
tests/sqlparser_common.rs 24 97.23%
src/ast/mod.rs 206 78.77%
src/parser.rs 370 83.41%
Totals Coverage Status
Change from base Build 2756043200: 0.04%
Covered Lines: 9096
Relevant Lines: 10112

💛 - Coveralls

@ovr
Copy link
Contributor

ovr commented Aug 4, 2022

Just an addition about real-life usage:

image

Official psql CLI uses it under the hood behind some backslash commands.

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iskakaushik -- this is looking close.

src/parser.rs Outdated
@@ -1159,6 +1159,32 @@ impl<'a> Parser<'a> {
}
}
Keyword::XOR => Some(BinaryOperator::Xor),
Keyword::OPERATOR if dialect_of!(self is PostgreSqlDialect) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given @ovr 's comment here, perhaps we should also add support when parsing in GenericDialect? apache/datafusion#3037 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I kept the name as PGCustomOperator but happy to change it as well.

Comment on lines 99 to 101
pub struct PGCustomOperator {
pub schema: Option<String>,
pub name: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about using the pre-existing Identifier here (that supports various qualified names)

like:

Suggested change
pub struct PGCustomOperator {
pub schema: Option<String>,
pub name: String,
pub struct PGCustomOperator {
pub ident: ObjectName

Thank you could use parse_object_name instead of custom parsing logic.

sqlparser-rs/src/parser.rs

Lines 3114 to 3125 in 076b587

/// Parse a possibly qualified, possibly quoted identifier, e.g.
/// `foo` or `myschema."table"
pub fn parse_object_name(&mut self) -> Result<ObjectName, ParserError> {
let mut idents = vec![];
loop {
idents.push(self.parse_identifier()?);
if !self.consume_token(&Token::Period) {
break;
}
}
Ok(ObjectName(idents))
}

This would likely require less code as well as handle parsing more cases (like OPERATOR("~") -- aka quoted strings)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing this, but looks like Identifier doesn't allow for Tokens that are un-quoted operators, for e.g: ~ is not a valid identifier. I am not sure if extending the parse_identifier to allow for un-quoted operators is the right call here.

IMO we could start with the existing mechanism of schema qualifier operator name (though I would've liked to use the parse_object_name logic) and can extend it as needed later. Let me know.

@alamb
Copy link
Collaborator

alamb commented Aug 5, 2022

@iskakaushik I am planning to make a sqlparser release sometime soon -- I think this PR has just a few more items and it could be ready to merge. Let me know what you think.

@iskakaushik
Copy link
Contributor Author

@iskakaushik I am planning to make a sqlparser release sometime soon -- I think this PR has just a few more items and it could be ready to merge. Let me know what you think.

I will address the feedback today 😊

Comment on lines 100 to 101
pub schema: Option<String>,
pub name: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for trying -- Something about hard coding "schema" seems not right to me here -- at the very least because there can also be a database in identifiers https://www.postgresql.org/docs/current/ddl-schemas.html -- e.g. database.schema.table, so maybe it should be a Vec<String>

I think given the special set of allowed characters described in https://www.postgresql.org/docs/current/sql-createoperator.html in The operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list: to properly parse this limitation some extra code would be needed

Let me give it a shot

@alamb
Copy link
Collaborator

alamb commented Aug 5, 2022

@iskakaushik here is my proposal: iskakaushik#1

let me know what you think

@iskakaushik
Copy link
Contributor Author

@alamb I like you approach. The comment you made about converting to (ident: Vec<Identifier>, op: String) is a good idea, but I think this might be good enough for now as you mentioned.

Thanks for fixing up my patch, really appreciate it!

@alamb
Copy link
Collaborator

alamb commented Aug 5, 2022

I will fix CI

@alamb
Copy link
Collaborator

alamb commented Aug 5, 2022

Thanks again @iskakaushik

@alamb alamb merged commit 1c64129 into sqlparser-rs:main Aug 5, 2022
@iskakaushik iskakaushik deleted the postgres-custom-operator branch August 5, 2022 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants