Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PostgreSQL array subquery constructor #566

Merged
merged 1 commit into from Aug 12, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/ast/mod.rs
Expand Up @@ -376,6 +376,8 @@ pub enum Expr {
/// A parenthesized subquery `(SELECT ...)`, used in expression like
/// `SELECT (subquery) AS x` or `WHERE (subquery) = x`
Subquery(Box<Query>),
/// An array subquery constructor, e.g. `SELECT ARRAY(SELECT 1 UNION SELECT 2)`
ArraySubquery(Box<Query>),
/// The `LISTAGG` function `SELECT LISTAGG(...) WITHIN GROUP (ORDER BY ...)`
ListAgg(ListAgg),
/// The `GROUPING SETS` expr.
Expand Down Expand Up @@ -573,6 +575,7 @@ impl fmt::Display for Expr {
subquery
),
Expr::Subquery(s) => write!(f, "({})", s),
Expr::ArraySubquery(s) => write!(f, "ARRAY({})", s),
Expr::ListAgg(listagg) => write!(f, "{}", listagg),
Expr::GroupingSets(sets) => {
write!(f, "GROUPING SETS (")?;
Expand Down
16 changes: 15 additions & 1 deletion src/parser.rs
Expand Up @@ -449,11 +449,18 @@ impl<'a> Parser<'a> {
Keyword::TRIM => self.parse_trim_expr(),
Keyword::INTERVAL => self.parse_literal_interval(),
Keyword::LISTAGG => self.parse_listagg_expr(),
// Treat ARRAY[1,2,3] as an array [1,2,3], otherwise try as function call
// Treat ARRAY[1,2,3] as an array [1,2,3], otherwise try as subquery or a function call
Keyword::ARRAY if self.peek_token() == Token::LBracket => {
self.expect_token(&Token::LBracket)?;
self.parse_array_expr(true)
}
Keyword::ARRAY
if self.peek_token() == Token::LParen
&& !dialect_of!(self is ClickHouseDialect) =>
{
self.expect_token(&Token::LParen)?;
self.parse_array_subquery()
}
Keyword::NOT => self.parse_not(),
// Here `w` is a word, check if it's a part of a multi-part
// identifier, a function call, or a simple identifier:
Expand Down Expand Up @@ -927,6 +934,13 @@ impl<'a> Parser<'a> {
}
}

// Parses an array constructed from a subquery
pub fn parse_array_subquery(&mut self) -> Result<Expr, ParserError> {
let query = self.parse_query()?;
self.expect_token(&Token::RParen)?;
Ok(Expr::ArraySubquery(Box::new(query)))
}

/// Parse a SQL LISTAGG expression, e.g. `LISTAGG(...) WITHIN GROUP (ORDER BY ...)`.
pub fn parse_listagg_expr(&mut self) -> Result<Expr, ParserError> {
self.expect_token(&Token::LParen)?;
Expand Down
19 changes: 19 additions & 0 deletions tests/sqpparser_clickhouse.rs → tests/sqlparser_clickhouse.rs
Expand Up @@ -121,6 +121,25 @@ fn parse_array_expr() {
)
}

#[test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌 Very nice @MazterQyou

fn parse_array_fn() {
let sql = "SELECT array(x1, x2) FROM foo";
let select = clickhouse().verified_only_select(sql);
assert_eq!(
&Expr::Function(Function {
name: ObjectName(vec![Ident::new("array")]),
args: vec![
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(Ident::new("x1")))),
FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Identifier(Ident::new("x2")))),
],
over: None,
distinct: false,
special: false,
}),
expr_from_projection(only(&select.projection))
);
}

#[test]
fn parse_kill() {
let stmt = clickhouse().verified_stmt("KILL MUTATION 5");
Expand Down
2 changes: 1 addition & 1 deletion tests/sqlparser_common.rs
Expand Up @@ -2597,7 +2597,7 @@ fn parse_bad_constraint() {

#[test]
fn parse_scalar_function_in_projection() {
let names = vec!["sqrt", "array", "foo"];
let names = vec!["sqrt", "foo"];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a controversial change for this test. PostgreSQL doesn't have a function-based array constructor (array(1, 2, 3)), nor can I find information on other databases using such a syntax. Some, like BigQuery, use the ARRAY(subquery) syntax implemented there. It'd be great to hear some thoughts on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was added in #432

Can you review the description on that PR and let me know what you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, postgres doesn't allow the syntax used in the DataFusion test (and referred to in #432)

alamb=# SELECT array(c1, cast(c2 as varchar)) FROM test;
ERROR:  syntax error at or near "c1"
LINE 1: SELECT array(c1, cast(c2 as varchar)) FROM test;
                     ^

https://github.com/apache/arrow-datafusion/blame/834924f274d9e04444030b8eb3e07e1035fcd3cf/datafusion/core/tests/sql/functions.rs#L134 doesn't really help me figure out from who the test originated from as it was moved (maybe some more poking around could find the original author)

@ovr do you remember any of the previous history here?

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I poked around and found this commit: apache/datafusion@98b710a

Copy link
Collaborator

@alamb alamb Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I poked around and found this commit: apache/datafusion@98b710a

Nice -- that came in from apache/arrow#8102 / https://issues.apache.org/jira/browse/ARROW-9902 by @jorgecarleitao and approved by @andygrove

Can either if you remember if/how using array as a function (rather than the more specialized SQL support offered by postgres) was important? We are triyng to sort out how important backwards compatibility is in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this as long as DataFusion team's okay with that. I haven't been able to find a dialect with built-in array function, but at least two of them with built-in array subquery constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb do you feel it's a good idea to remove the GenericDialiect | PostgreSqlDialect check now so later someone can make negated checks for dialects that shouldn't go the subquery route?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to find a dialect with built-in array function, but at least two of them with built-in array subquery constructor.

Here is the spark reference docs: https://spark.apache.org/docs/2.4.1/api/sql/index.html#array (though perhaps you mean there is no SparkDialect in this crate)?

do you feel it's a good idea to remove the GenericDialiect | PostgreSqlDialect check now so later someone can make negated checks for dialects that shouldn't go the subquery route?

Sounds reasonable to me -- can you also possibly link one or two examples from other dialects that support the array subquery constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL array subquery constructor (in fact, all of array constructors): https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS

I haven't been able to find info on other dialects present in the crate to support array subquery, but at least CockroachDB and BigQuery support array subquery.

However while searching around I found that ClickHouse (which we do have a dialect for) actually has array function constructor built in: https://clickhouse.com/docs/en/sql-reference/functions/array-functions/#arrayx1--operator-x1-

So we can either make a decision for leaving this on for specific dialects (GenericDialect | PostgreSqlDialect for instance) or negate this to support ARRAY(x1, x2, …) for dialects known to support array function constructor (currently ClickHouseDialect only).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's negate it for ClickHouseDialect and call it a day


for function_name in names {
// like SELECT sqrt(id) FROM foo
Expand Down
63 changes: 63 additions & 0 deletions tests/sqlparser_postgres.rs
Expand Up @@ -1241,6 +1241,69 @@ fn parse_array_index_expr() {
);
}

#[test]
fn parse_array_subquery_expr() {
let sql = "SELECT ARRAY(SELECT 1 UNION SELECT 2)";
let select = pg().verified_only_select(sql);
assert_eq!(
&Expr::ArraySubquery(Box::new(Query {
with: None,
body: Box::new(SetExpr::SetOperation {
op: SetOperator::Union,
all: false,
left: Box::new(SetExpr::Select(Box::new(Select {
distinct: false,
top: None,
projection: vec![SelectItem::UnnamedExpr(Expr::Value(Value::Number(
#[cfg(not(feature = "bigdecimal"))]
"1".to_string(),
#[cfg(feature = "bigdecimal")]
bigdecimal::BigDecimal::from(1),
false,
)))],
into: None,
from: vec![],
lateral_views: vec![],
selection: None,
group_by: vec![],
cluster_by: vec![],
distribute_by: vec![],
sort_by: vec![],
having: None,
qualify: None,
}))),
right: Box::new(SetExpr::Select(Box::new(Select {
distinct: false,
top: None,
projection: vec![SelectItem::UnnamedExpr(Expr::Value(Value::Number(
#[cfg(not(feature = "bigdecimal"))]
"2".to_string(),
#[cfg(feature = "bigdecimal")]
bigdecimal::BigDecimal::from(2),
false,
)))],
into: None,
from: vec![],
lateral_views: vec![],
selection: None,
group_by: vec![],
cluster_by: vec![],
distribute_by: vec![],
sort_by: vec![],
having: None,
qualify: None,
}))),
}),
order_by: vec![],
limit: None,
offset: None,
fetch: None,
lock: None,
})),
expr_from_projection(only(&select.projection)),
);
}

#[test]
fn test_transaction_statement() {
let statement = pg().verified_stmt("SET TRANSACTION SNAPSHOT '000003A1-1'");
Expand Down