New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DISTINCT
for SetOperator
#689
Conversation
cargo fmt Fix parse_set_operator_option after next_token
src/ast/query.rs
Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr { | |||
/// UNION/EXCEPT/INTERSECT of two queries | |||
SetOperation { | |||
op: SetOperator, | |||
all: bool, | |||
op_option: Option<SetOperatorOption>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes users change their code to use Option<SetOperatorOption>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unvalley I'd name it set_quantifier, as it is the name used in ANSI (1).
[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#set-quantifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree set_quantifier sounds good (along with SetOperatorOption
--> SetQuantifier
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/ast/query.rs
Outdated
@@ -64,6 +64,23 @@ impl fmt::Display for Query { | |||
} | |||
} | |||
|
|||
/// Options for SetOperator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice having more description about the structure. Maybe a reference of which big dialects use it...
src/ast/query.rs
Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr { | |||
/// UNION/EXCEPT/INTERSECT of two queries | |||
SetOperation { | |||
op: SetOperator, | |||
all: bool, | |||
op_option: Option<SetOperatorOption>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unvalley I'd name it set_quantifier, as it is the name used in ANSI (1).
[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#set-quantifier
src/ast/query.rs
Outdated
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
match self { | ||
SetOperatorOption::All => write!(f, " ALL"), | ||
SetOperatorOption::Distinct => write!(f, " DISTINCT"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Printing spaces inside the enum structure doesn't make much sense.
If there's a space or not, the structure that holds it should handle it (in this case, the SetExpr).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this at f6e8305
src/parser.rs
Outdated
@@ -4193,6 +4194,40 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
pub fn parse_set_operator_option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be cautious about 2 things:
- Are you really sure that the
ALL
andDISTINCT
change between operations? - You're being restrictive here. Please add
GenericDialect
and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.
I found that at least ANSI
supports this. Please, or be permissive or validate perfectly (which is really hard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for commenting.
Are you really sure that the ALL and DISTINCT change between operations?
sorry, could you elaborate this?
You're being restrictive here. Please add GenericDialect and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.
I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)
OK, I'll remove the specific dialects handling (bq, mysql) and make it parse by GenericDialect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unvalley you're parsing them differently. Union
is being parsed with both ALL
and DISTINCT
, while Except
and Intersect
not.
Is there any reason why you can't do something like:
match op {
Some(SetOperator::Union) => {
if self.parse_keyword(Keyword::ALL) {
Some(SetOperatorOption::All)
} else if self.parse_keyword(Keyword::DISTINCT)
&& dialect_of!(self is MySqlDialect | BigQueryDialect)
{
Some(SetOperatorOption::Distinct)
} else {
None
}
}
Some(SetOperator::Except) | Some(SetOperator::Intersect) => {
if self.parse_keyword(Keyword::ALL) {
Some(SetOperatorOption::All)
} else {
None
}
}
_ => None,
}
Also, FYI (not saying you must do it in this PR, besides the second one):
- ANSI supports both (1).
- The
GenericDialect
should be added to parse it as well. Unless the syntax is REALLY different between dialects, I'd say to add every syntax to the generic because, as the name states, it tries to be as generic as possible. - MySQL also accepts it in
INTERSECT
(2) andEXCEPT
(3). - BigQuery, following your implementation, support both only for
UNION
(4)
[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[2] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[3] : https://dev.mysql.com/doc/refman/8.0/en/except.html
[4] : https://dev.mysql.com/doc/refman/8.0/en/except.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AugustoFKL Thanks, I got it. I should implement ALL
and DISTINCT
for INTERSECT
and EXCEPT
as I do for UNION
. And, I should focus mainly on GenericDialect
.
In this pr, I'll only implement the [1] and [2] for Generic Dialect
.
I thought the BigQuery dialect should not allow ALL
to be attached to INTERSECT
and EXCEPT
, but I see that is not correct. Because ANSI supports it, there is no trouble with parsing being done. And it is difficult to test that it cannot be parsed as you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@unvalley one option was to parse it conditionally for all of them, the same you do for UNION
. But IMHO I think this can be scoped in another PR.
Your PR is to support UNION DISTINCT
for both dialects and, parsing with Generic
, you can have it.
If you want the EXACT same support for a dialect (being it permissive or restrictive), I'd say to do it separately, working from a generic scope to a restrict one. Trying to do all at once make the code confusing, and requires reading all documentations at once.
Pull Request Test Coverage Report for Build 3393519226
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @unvalley -- this is looking pretty close!
src/ast/query.rs
Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr { | |||
/// UNION/EXCEPT/INTERSECT of two queries | |||
SetOperation { | |||
op: SetOperator, | |||
all: bool, | |||
op_option: Option<SetOperatorOption>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree set_quantifier sounds good (along with SetOperatorOption
--> SetQuantifier
)
src/ast/query.rs
Outdated
#[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
pub enum SetOperatorOption { | ||
All, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it should be possible to round trip structures in sqlparser -- so it should be possible to distinguish between ALL
, DISTINCT
or nothing was specified in the query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the None
to the enum SetOperatorOption
, and remove Option<T>
for SetOperatorOption
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this at ac41a8e
src/parser.rs
Outdated
@@ -4193,6 +4194,40 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
pub fn parse_set_operator_option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)
/// A quantifier for [SetOperator]. | ||
// TODO: Restrict parsing specific SetQuantifier in some specific dialects. | ||
// For example, BigQuery does not support `DISTINCT` for `EXCEPT` and `INTERSECT` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments at ae6fb81 , but I wasn't quite sure how to describe this enum🥲
sorry for the notification.. I didn't know I can set one reviewer only. |
No worries -- without these notifications it takes me a while to find PRs to review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great @unvalley -- thank you!
Any concerns @AugustoFKL ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks again @unvalley ! |
closes #688
This supports following for GenericDialect
BigQuery and MySql dialects.