Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date_part support fractions of second #4385

Merged
merged 8 commits into from
Dec 6, 2022
Merged

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Closes #3997

Rationale for this change

See #3997

What changes are included in this PR?

Extending date_part to support millis, micros, nanos

Are these changes tested?

Yes

Are there any user-facing changes?

No

No

@github-actions github-actions bot added core Core datafusion crate physical-expr Physical Expressions labels Nov 27, 2022
@comphead
Copy link
Contributor Author

comphead commented Nov 27, 2022

@waitingkuo finally we can get back to #3997 resolution.
Please check the PR.

Now date_part supports second fraction, but its still not the same as in PSQL like

willy=# select date_part('second', timestamp '2000-01-01T00:00:00.1');
 date_part 
-----------
       0.1
(1 row)

To do the same you need to run

select date_part('milliseconds', timestamp)
 date_part 
-----------
       1
(1 row)

Let me know if this is ok

@waitingkuo
Copy link
Contributor

waitingkuo commented Nov 28, 2022

hi @comphead thank you

i tested the pr

select date_part('second', timestamp '2000-01-01T00:00:09.123456');
+-------------------------------------------------------------+
| datepart(Utf8("second"),Utf8("2000-01-01T00:00:09.123456")) |
+-------------------------------------------------------------+
| 9                                                           |
+-------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
❯ select date_part('millisecond', timestamp '2000-01-01T00:00:09.123456');
+------------------------------------------------------------------+
| datepart(Utf8("millisecond"),Utf8("2000-01-01T00:00:09.123456")) |
+------------------------------------------------------------------+
| 123                                                              |
+------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
❯ select date_part('microsecond', timestamp '2000-01-01T00:00:09.123456');
+------------------------------------------------------------------+
| datepart(Utf8("microsecond"),Utf8("2000-01-01T00:00:09.123456")) |
+------------------------------------------------------------------+
| 123456                                                           |
+------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.

the behavior is different than what postgresql has

willy=# select date_part('second', timestamp '2000-01-01T00:00:09.123456');
 date_part 
-----------
  9.123456
(1 row)

willy=# select date_part('millisecond', timestamp '2000-01-01T00:00:09.123456');
 date_part 
-----------
  9123.456
(1 row)

willy=# select date_part('microsecond', timestamp '2000-01-01T00:00:09.123456');
 date_part 
-----------
   9123456
(1 row)

i checked some other system,
spark seems to have the same behavior as postgresql

# this is spark
SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
+----------------------------------------------------------+
|date_part(SECONDS, TIMESTAMP '2019-10-01 00:00:01.000001')|
+----------------------------------------------------------+
|                                                  1.000001|
+----------------------------------------------------------+

while mysql's is similar as this pr

# this is MYSQL
EXTRACT(SECOND FROM "2017-06-20 00:00:01.123456");
1

I originally purposed to output f64 instead of i32 since i'd like to follow postgresql's
I don't have strong opinion here, if stick with what MySQL is like, then i think i32 is enough

@alamb @tustvold do you have any suggestion?
note that this issue was from #3980 (comment)

@comphead
Copy link
Contributor Author

Thanks @waitingkuo for testing the PR. I think since we now having seconds fraction support so we can concat the value and be in sync with postgres date_part if this is a preferred way

@alamb
Copy link
Contributor

alamb commented Dec 1, 2022

I think we should follow postgres (output floating point) if possible

# Conflicts:
#	datafusion/physical-expr/src/datetime_expressions.rs
…afusion into date_part_f64

# Conflicts:
#	datafusion/physical-expr/src/datetime_expressions.rs
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Dec 6, 2022
@comphead comphead marked this pull request as ready for review December 6, 2022 06:01
@comphead
Copy link
Contributor Author

comphead commented Dec 6, 2022

@waitingkuo made in sync with postgres.
I have found extract nanosecond faces parser issue

DataFusion CLI v14.0.0
❯ select EXTRACT(nanosecond FROM to_timestamp('2020-09-08T12:00:12.12345678+00:00'));  🤔 Invalid statement: sql parser error: Expected date/time field, found: nanosecond

whereas nanosecond date_part works. I will create a ticket for this

Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @comphead thank you, this looks good to me.

Comment on lines +1299 to +1302
// test_expression!(
// "EXTRACT(nanosecond FROM to_timestamp('2020-09-08T12:00:12.12345678+00:00'))",
// "1212345678"
// );
Copy link
Contributor

@waitingkuo waitingkuo Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waitingkuo
Copy link
Contributor

@waitingkuo made in sync with postgres. I have found extract nanosecond faces parser issue

DataFusion CLI v14.0.0
❯ select EXTRACT(nanosecond FROM to_timestamp('2020-09-08T12:00:12.12345678+00:00'));  🤔 Invalid statement: sql parser error: Expected date/time field, found: nanosecond

whereas nanosecond date_part works. I will create a ticket for this

@comphead sqlparser doesn't support nanosecond for now.
https://github.com/sqlparser-rs/sqlparser-rs/blob/813f4a2eff8f091b643058ac1a46a4fbffd96806/src/parser.rs#L1187-L1221

@waitingkuo
Copy link
Contributor

@comphead i created a ticket and submitted a pr in sqlparser sqlparser-rs/sqlparser-rs#748

it would be great if you could creat another ticket in datafusion so that we could fix it once the issue in sqlparser merged and new version released

@comphead
Copy link
Contributor Author

comphead commented Dec 6, 2022

@comphead i created a ticket and submitted a pr in sqlparser sqlparser-rs/sqlparser-rs#748

it would be great if you could creat another ticket in datafusion so that we could fix it once the issue in sqlparser merged and new version released

#4528
Please assign this to me.

@alamb
Copy link
Contributor

alamb commented Dec 6, 2022

Thanks @comphead and @waitingkuo !

@ursabot
Copy link

ursabot commented Dec 6, 2022

Benchmark runs are scheduled for baseline = 19cddf5 and contender = cedb05a. cedb05a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@comphead comphead deleted the date_part_f64 branch January 1, 2023 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

change date_part return types to f64
4 participants