Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DISTINCT for SetOperator #689

Merged
merged 11 commits into from Nov 7, 2022
Merged

Conversation

unvalley
Copy link
Contributor

@unvalley unvalley commented Oct 26, 2022

closes #688

This supports following for GenericDialect BigQuery and MySql dialects.

  • UNION DISTINCT
  • INTERSECT DISTINCT
  • EXCEPT DISTINCT

@unvalley unvalley marked this pull request as ready for review October 26, 2022 09:16
tests/sqlparser_common.rs Outdated Show resolved Hide resolved
src/ast/query.rs Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr {
/// UNION/EXCEPT/INTERSECT of two queries
SetOperation {
op: SetOperator,
all: bool,
op_option: Option<SetOperatorOption>,
Copy link
Contributor Author

@unvalley unvalley Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes users change their code to use Option<SetOperatorOption>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley I'd name it set_quantifier, as it is the name used in ANSI (1).

[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#set-quantifier

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree set_quantifier sounds good (along with SetOperatorOption --> SetQuantifier)

Copy link
Contributor Author

@unvalley unvalley Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fiexed this at f6e8305 826dcc5

src/ast/query.rs Outdated
@@ -64,6 +64,23 @@ impl fmt::Display for Query {
}
}

/// Options for SetOperator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice having more description about the structure. Maybe a reference of which big dialects use it...

src/ast/query.rs Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr {
/// UNION/EXCEPT/INTERSECT of two queries
SetOperation {
op: SetOperator,
all: bool,
op_option: Option<SetOperatorOption>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley I'd name it set_quantifier, as it is the name used in ANSI (1).

[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#set-quantifier

src/ast/query.rs Outdated
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
SetOperatorOption::All => write!(f, " ALL"),
SetOperatorOption::Distinct => write!(f, " DISTINCT"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing spaces inside the enum structure doesn't make much sense.

If there's a space or not, the structure that holds it should handle it (in this case, the SetExpr).

Copy link
Contributor Author

@unvalley unvalley Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this at f6e8305

src/parser.rs Outdated
@@ -4193,6 +4194,40 @@ impl<'a> Parser<'a> {
}
}

pub fn parse_set_operator_option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be cautious about 2 things:

  1. Are you really sure that the ALL and DISTINCT change between operations?
  2. You're being restrictive here. Please add GenericDialect and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.

I found that at least ANSI supports this. Please, or be permissive or validate perfectly (which is really hard)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)

Copy link
Contributor Author

@unvalley unvalley Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for commenting.

Are you really sure that the ALL and DISTINCT change between operations?

sorry, could you elaborate this?

You're being restrictive here. Please add GenericDialect and be aware of one thing: unless you're REALLY sure that only those 2 dialects support that feature, please be permissive and do not validate dialects.

I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)

OK, I'll remove the specific dialects handling (bq, mysql) and make it parse by GenericDialect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley you're parsing them differently. Union is being parsed with both ALL and DISTINCT, while Except and Intersect not.

Is there any reason why you can't do something like:

match op {
            Some(SetOperator::Union) => {
                if self.parse_keyword(Keyword::ALL) {
                    Some(SetOperatorOption::All)
                } else if self.parse_keyword(Keyword::DISTINCT)
                    && dialect_of!(self is MySqlDialect | BigQueryDialect)
                {
                    Some(SetOperatorOption::Distinct)
                } else {
                    None
                }
            }
            Some(SetOperator::Except) | Some(SetOperator::Intersect) => {
                if self.parse_keyword(Keyword::ALL) {
                    Some(SetOperatorOption::All)
                } else {
                    None
                }
            }
            _ => None,
        }

Also, FYI (not saying you must do it in this PR, besides the second one):

  • ANSI supports both (1).
  • The GenericDialect should be added to parse it as well. Unless the syntax is REALLY different between dialects, I'd say to add every syntax to the generic because, as the name states, it tries to be as generic as possible.
  • MySQL also accepts it in INTERSECT (2) and EXCEPT (3).
  • BigQuery, following your implementation, support both only for UNION (4)

[1] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[2] : https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#query-expression-body
[3] : https://dev.mysql.com/doc/refman/8.0/en/except.html
[4] : https://dev.mysql.com/doc/refman/8.0/en/except.html

Copy link
Contributor Author

@unvalley unvalley Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AugustoFKL Thanks, I got it. I should implement ALL and DISTINCT for INTERSECT and EXCEPT as I do for UNION. And, I should focus mainly on GenericDialect.
In this pr, I'll only implement the [1] and [2] for Generic Dialect.

I thought the BigQuery dialect should not allow ALL to be attached to INTERSECT and EXCEPT, but I see that is not correct. Because ANSI supports it, there is no trouble with parsing being done. And it is difficult to test that it cannot be parsed as you said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unvalley one option was to parse it conditionally for all of them, the same you do for UNION. But IMHO I think this can be scoped in another PR.

Your PR is to support UNION DISTINCT for both dialects and, parsing with Generic, you can have it.

If you want the EXACT same support for a dialect (being it permissive or restrictive), I'd say to do it separately, working from a generic scope to a restrict one. Trying to do all at once make the code confusing, and requires reading all documentations at once.

@coveralls
Copy link

coveralls commented Oct 31, 2022

Pull Request Test Coverage Report for Build 3393519226

  • 27 of 29 (93.1%) changed or added relevant lines in 2 files are covered.
  • 794 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.02%) to 85.969%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/query.rs 12 13 92.31%
src/parser.rs 15 16 93.75%
Files with Coverage Reduction New Missed Lines %
tests/sqlparser_snowflake.rs 1 96.83%
src/lib.rs 2 10.0%
src/ast/value.rs 8 87.67%
src/ast/data_type.rs 13 88.62%
src/ast/ddl.rs 20 82.29%
tests/sqlparser_postgres.rs 23 97.81%
tests/sqlparser_common.rs 54 96.85%
src/ast/mod.rs 283 77.07%
src/parser.rs 390 83.72%
Totals Coverage Status
Change from base Build 3292360592: 0.02%
Covered Lines: 11053
Relevant Lines: 12857

💛 - Coveralls

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @unvalley -- this is looking pretty close!

src/ast/query.rs Outdated
@@ -78,7 +95,7 @@ pub enum SetExpr {
/// UNION/EXCEPT/INTERSECT of two queries
SetOperation {
op: SetOperator,
all: bool,
op_option: Option<SetOperatorOption>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree set_quantifier sounds good (along with SetOperatorOption --> SetQuantifier)

src/ast/query.rs Outdated
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum SetOperatorOption {
All,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it should be possible to round trip structures in sqlparser -- so it should be possible to distinguish between ALL, DISTINCT or nothing was specified in the query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the None to the enum SetOperatorOption, and remove Option<T> for SetOperatorOption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this at ac41a8e

src/parser.rs Outdated
@@ -4193,6 +4194,40 @@ impl<'a> Parser<'a> {
}
}

pub fn parse_set_operator_option(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ALL / DISTINCT are ANSI SQL and thus should be handled by all dialects (unless we have specific evidence that some dialects don't support them there is no reason to prevent them being parsed)

@unvalley unvalley changed the title Support UNION DISTINCT Support DISTINCT for SetOperator Nov 4, 2022
@unvalley unvalley requested review from AugustoFKL and alamb and removed request for AugustoFKL November 4, 2022 11:52
Comment on lines +135 to +137
/// A quantifier for [SetOperator].
// TODO: Restrict parsing specific SetQuantifier in some specific dialects.
// For example, BigQuery does not support `DISTINCT` for `EXCEPT` and `INTERSECT`
Copy link
Contributor Author

@unvalley unvalley Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments at ae6fb81 , but I wasn't quite sure how to describe this enum🥲

@unvalley unvalley requested review from AugustoFKL and removed request for alamb November 4, 2022 11:54
@unvalley
Copy link
Contributor Author

unvalley commented Nov 4, 2022

sorry for the notification.. I didn't know I can set one reviewer only.

@alamb
Copy link
Collaborator

alamb commented Nov 4, 2022

sorry for the notification.. I didn't know I can set one reviewer only.

No worries -- without these notifications it takes me a while to find PRs to review!

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great @unvalley -- thank you!

Any concerns @AugustoFKL ?

Copy link
Contributor

@AugustoFKL AugustoFKL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb alamb merged commit f7817bc into sqlparser-rs:main Nov 7, 2022
@alamb
Copy link
Collaborator

alamb commented Nov 7, 2022

Thanks again @unvalley !

@unvalley unvalley deleted the union-distinct branch November 7, 2022 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DISTINCT for SetOperator
4 participants