Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for INTERVAL inside window frames #655

Merged
merged 5 commits into from Oct 15, 2022
Merged

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Oct 5, 2022

Support for INTERVAL Inside Window Frames

Which issue does this PR close?
Closes #631 by adding INTERVAL support inside window frames. Now we can parse queries such as

SELECT 
COUNT(*) OVER (ORDER BY ts RANGE BETWEEN  INTERVAL '1 DAY' PRECEDING AND INTERVAL '1 DAY' FOLLOWING) 
FROM t;

Rationale for this change

Datafusion now supports window frame queries with PR #3570. However, since window frame bound only emits u64 values, we cannot run queries like the one above in Datafusion. If this PR merges, after some minor changes we would be able to run RANGE queries involving INTERVAL in Datafusion. The roadmap of Datafusion includes a plan to support different types for WindowFrameBound.

src/ast/mod.rs Outdated
@@ -880,9 +880,9 @@ pub enum WindowFrameBound {
/// `CURRENT ROW`
CurrentRow,
/// `<N> PRECEDING` or `UNBOUNDED PRECEDING`
Preceding(Option<u64>),
Preceding(Option<RangeBounds>),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this in general can be any Expression (not just a Number of Interval):

https://www.postgresql.org/docs/current/sql-expressions.html

In the offset PRECEDING and offset FOLLOWING frame options, the offset must be an expression not containing any variables, aggregate functions, or window functions. The meaning of the offset depends on the frame mode:

What do you think about just parsing it as an Expression rather than special casing Number / Interval?

@alamb
Copy link
Collaborator

alamb commented Oct 7, 2022

Thank you @mustafasrepo for the contribution

@coveralls
Copy link

coveralls commented Oct 7, 2022

Pull Request Test Coverage Report for Build 3250493085

  • 4 of 4 (100.0%) changed or added relevant lines in 2 files are covered.
  • 846 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.2%) to 85.799%

Files with Coverage Reduction New Missed Lines %
src/ast/value.rs 7 87.32%
src/ast/data_type.rs 14 87.5%
tests/sqlparser_postgres.rs 22 97.74%
tests/sqlparser_common.rs 44 97.05%
src/ast/mod.rs 278 77.4%
src/parser.rs 481 83.33%
Totals Coverage Status
Change from base Build 3181450573: 0.2%
Covered Lines: 10519
Relevant Lines: 12260

💛 - Coveralls

@ozankabak
Copy link
Contributor

Hi Andrew! The SQL:2016 reference seems to expect an unsigned literal, that's why we opted for this. But you are right, PostgreSQL seems to allow expressions -- it only requires such expressions to evaluate to appropriate unsigned values. We can certainly follow the latter approach, but doing that will require both sqlparser and datafusion to accommodate that logic.

It seems to me a good approach could be two-tiered: We can implement the unsigned-value-only design first (both here and in datafusion), and then subsequently work towards PSQL compatibility. What do you think?

@alamb
Copy link
Collaborator

alamb commented Oct 11, 2022

It seems to me a good approach could be two-tiered: We can implement the unsigned-value-only design first (both here and in datafusion), and then subsequently work towards PSQL compatibility. What do you think?

That is fine with me -- I was thinking that using a general Expr in the parser at first would actually make the process easier overall -- the parser / parsed structs would be simpler and DataFusion would have to special case what types it handled. Then over time datafusion could be extended to handle more and more types, but the parser wouldn't need to be changed again

@mustafasrepo
Copy link
Contributor Author

mustafasrepo commented Oct 14, 2022

Hi Andrew, we have changed the implementation as you suggested. It improves readability and decreases the changes required. Thanks for the suggestion. In this PR https://github.com/synnada-ai/arrow-datafusion/pull/5 you can see the corresponding datafusion change as POC implementation to support non-u64 values inside window frames.

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it -- thank you @mustafasrepo

@alamb alamb merged commit 427bec4 into sqlparser-rs:main Oct 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support "date" and "timestamp" inside window frame RANGE Queries
4 participants