Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DISTINCT for SetOperator #689

Merged
merged 11 commits into from Nov 7, 2022
4 changes: 2 additions & 2 deletions src/ast/mod.rs
Expand Up @@ -32,8 +32,8 @@ pub use self::ddl::{
pub use self::operator::{BinaryOperator, UnaryOperator};
pub use self::query::{
Cte, Fetch, Join, JoinConstraint, JoinOperator, LateralView, LockType, Offset, OffsetRows,
OrderByExpr, Query, Select, SelectInto, SelectItem, SetExpr, SetOperator, TableAlias,
TableFactor, TableWithJoins, Top, Values, With,
OrderByExpr, Query, Select, SelectInto, SelectItem, SetExpr, SetOperator, SetOperatorOption,
TableAlias, TableFactor, TableWithJoins, Top, Values, With,
};
pub use self::value::{DateTimeField, TrimWhereField, Value};

Expand Down
29 changes: 25 additions & 4 deletions src/ast/query.rs
Expand Up @@ -64,6 +64,23 @@ impl fmt::Display for Query {
}
}

/// Options for SetOperator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice having more description about the structure. Maybe a reference of which big dialects use it...

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum SetOperatorOption {
All,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it should be possible to round trip structures in sqlparser -- so it should be possible to distinguish between ALL, DISTINCT or nothing was specified in the query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the None to the enum SetOperatorOption, and remove Option<T> for SetOperatorOption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this at ac41a8e

Distinct,
}

impl fmt::Display for SetOperatorOption {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
SetOperatorOption::All => write!(f, " ALL"),
SetOperatorOption::Distinct => write!(f, " DISTINCT"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing spaces inside the enum structure doesn't make much sense.

If there's a space or not, the structure that holds it should handle it (in this case, the SetExpr).

Copy link
Contributor Author

@unvalley unvalley Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this at f6e8305

}
}
}

/// A node in a tree, representing a "query body" expression, roughly:
/// `SELECT ... [ {UNION|EXCEPT|INTERSECT} SELECT ...]`
#[allow(clippy::large_enum_variant)]
Expand All @@ -78,7 +95,7 @@ pub enum SetExpr {
/// UNION/EXCEPT/INTERSECT of two queries
SetOperation {
op: SetOperator,
all: bool,
op_option: Option<SetOperatorOption>,
Copy link
Contributor Author

@unvalley unvalley Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes users change their code to use Option<SetOperatorOption>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley I'd name it set_quantifier, as it is the name used in ANSI (1).

[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#set-quantifier

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree set_quantifier sounds good (along with SetOperatorOption --> SetQuantifier)

Copy link
Contributor Author

@unvalley unvalley Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fiexed this at f6e8305 826dcc5

left: Box<SetExpr>,
right: Box<SetExpr>,
},
Expand All @@ -98,10 +115,14 @@ impl fmt::Display for SetExpr {
left,
right,
op,
all,
op_option,
} => {
let all_str = if *all { " ALL" } else { "" };
write!(f, "{} {}{} {}", left, op, all_str, right)
write!(f, "{} {}", left, op)?;
if let Some(ref o) = op_option {
write!(f, "{}", o)?;
}
write!(f, " {}", right)?;
Ok(())
}
}
}
Expand Down
37 changes: 36 additions & 1 deletion src/parser.rs
Expand Up @@ -4173,10 +4173,11 @@ impl<'a> Parser<'a> {
break;
}
self.next_token(); // skip past the set operator
let op_option = self.parse_set_operator_option(&op);
expr = SetExpr::SetOperation {
left: Box::new(expr),
op: op.unwrap(),
all: self.parse_keyword(Keyword::ALL),
op_option,
right: Box::new(self.parse_query_body(next_precedence)?),
};
}
Expand All @@ -4193,6 +4194,40 @@ impl<'a> Parser<'a> {
}
}

pub fn parse_set_operator_option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be cautious about 2 things:

  1. Are you really sure that the ALL and DISTINCT change between operations?
  2. You're being restrictive here. Please add GenericDialect and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.

I found that at least ANSI supports this. Please, or be permissive or validate perfectly (which is really hard)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)

Copy link
Contributor Author

@unvalley unvalley Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for commenting.

Are you really sure that the ALL and DISTINCT change between operations?

sorry, could you elaborate this?

You're being restrictive here. Please add GenericDialect and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.

I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)

OK, I'll remove the specific dialects handling (bq, mysql) and make it parse by GenericDialect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley you're parsing them differently. Union is being parsed with both ALL and DISTINCT, while Except and Intersect not.

Is there any reason why you can't do something like:

match op {
            Some(SetOperator::Union) => {
                if self.parse_keyword(Keyword::ALL) {
                    Some(SetOperatorOption::All)
                } else if self.parse_keyword(Keyword::DISTINCT)
                    && dialect_of!(self is MySqlDialect | BigQueryDialect)
                {
                    Some(SetOperatorOption::Distinct)
                } else {
                    None
                }
            }
            Some(SetOperator::Except) | Some(SetOperator::Intersect) => {
                if self.parse_keyword(Keyword::ALL) {
                    Some(SetOperatorOption::All)
                } else {
                    None
                }
            }
            _ => None,
        }

Also, FYI (not saying you must do it in this PR, besides the second one):

  • ANSI supports both (1).
  • The GenericDialect should be added to parse it as well. Unless the syntax is REALLY different between dialects, I'd say to add every syntax to the generic because, as the name states, it tries to be as generic as possible.
  • MySQL also accepts it in INTERSECT (2) and EXCEPT (3).
  • BigQuery, following your implementation, support both only for UNION (4)

[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[2] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[3] : https://dev.mysql.com/doc/refman/8.0/en/except.html
[4] : https://dev.mysql.com/doc/refman/8.0/en/except.html

Copy link
Contributor Author

@unvalley unvalley Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AugustoFKL Thanks, I got it. I should implement ALL and DISTINCT for INTERSECT and EXCEPT as I do for UNION. And, I should focus mainly on GenericDialect.
In this pr, I'll only implement the [1] and [2] for Generic Dialect.

I thought the BigQuery dialect should not allow ALL to be attached to INTERSECT and EXCEPT, but I see that is not correct. Because ANSI supports it, there is no trouble with parsing being done. And it is difficult to test that it cannot be parsed as you said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley one option was to parse it conditionally for all of them, the same you do for UNION. But IMHO I think this can be scoped in another PR.

Your PR is to support UNION DISTINCT for both dialects and, parsing with Generic, you can have it.

If you want the EXACT same support for a dialect (being it permissive or restrictive), I'd say to do it separately, working from a generic scope to a restrict one. Trying to do all at once make the code confusing, and requires reading all documentations at once.

&mut self,
op: &Option<SetOperator>,
) -> Option<SetOperatorOption> {
match op {
Some(SetOperator::Union) => {
if self.parse_keyword(Keyword::ALL) {
Some(SetOperatorOption::All)
} else if self.parse_keyword(Keyword::DISTINCT)
&& dialect_of!(self is MySqlDialect | BigQueryDialect)
{
Some(SetOperatorOption::Distinct)
} else {
None
}
}
Some(SetOperator::Except) => {
if self.parse_keyword(Keyword::ALL) {
Some(SetOperatorOption::All)
} else {
None
}
}
Some(SetOperator::Intersect) => {
if self.parse_keyword(Keyword::ALL) {
Some(SetOperatorOption::All)
} else {
None
}
}
_ => None,
}
}

/// Parse a restricted `SELECT` statement (no CTEs / `UNION` / `ORDER BY`),
/// assuming the initial `SELECT` was already consumed
pub fn parse_select(&mut self) -> Result<Select, ParserError> {
Expand Down
12 changes: 11 additions & 1 deletion tests/sqlparser_common.rs
Expand Up @@ -26,7 +26,7 @@ use sqlparser::ast::SelectItem::UnnamedExpr;
use sqlparser::ast::*;
use sqlparser::dialect::{
AnsiDialect, BigQueryDialect, ClickHouseDialect, GenericDialect, HiveDialect, MsSqlDialect,
PostgreSqlDialect, SQLiteDialect, SnowflakeDialect,
MySqlDialect, PostgreSqlDialect, SQLiteDialect, SnowflakeDialect,
};
use sqlparser::keywords::ALL_KEYWORDS;
use sqlparser::parser::{Parser, ParserError};
Expand Down Expand Up @@ -4096,6 +4096,16 @@ fn parse_union() {
verified_stmt("(SELECT * FROM new EXCEPT SELECT * FROM old) UNION ALL (SELECT * FROM old EXCEPT SELECT * FROM new) ORDER BY 1");
}

#[test]
fn parse_union_distinct() {
let dialects = TestedDialects {
dialects: vec![Box::new(BigQueryDialect {}), Box::new(MySqlDialect {})],
};
dialects.verified_stmt("SELECT 1 UNION DISTINCT SELECT 2");
dialects.verified_stmt("SELECT 1 UNION DISTINCT SELECT 2 INTERSECT ALL SELECT 3");
dialects.verified_stmt("SELECT 1 EXCEPT (SELECT 2 UNION DISTINCT SELECT 3)");
unvalley marked this conversation as resolved.
Show resolved Hide resolved
}

#[test]
fn parse_values() {
verified_stmt("SELECT * FROM (VALUES (1), (2), (3))");
Expand Down
2 changes: 1 addition & 1 deletion tests/sqlparser_postgres.rs
Expand Up @@ -1310,7 +1310,7 @@ fn parse_array_subquery_expr() {
with: None,
body: Box::new(SetExpr::SetOperation {
op: SetOperator::Union,
all: false,
op_option: None,
left: Box::new(SetExpr::Select(Box::new(Select {
distinct: false,
top: None,
Expand Down