Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow_cast to cast to timestamptz #5914

Merged
merged 5 commits into from
Apr 7, 2023

Conversation

waitingkuo
Copy link
Contributor

Which issue does this PR close?

Closes #5913

cc @alamb

Rationale for this change

What changes are included in this PR?

allow arrow_cast to cast to timestamptz

select arrow_cast(timestamp '2000-01-01T00:00:00', 'Timestamp(Nanosecond, Some( "+08:00" ))');
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 2000-01-01T08:00:00+08:00   |
+-----------------------------+
1 row in set. Query took 0.010 seconds.
select arrow_cast(timestamptz '2000-01-01T00:00:00Z', 'Timestamp(Nanosecond, Some( "+08:00" ))');
+------------------------------+
| Utf8("2000-01-01T00:00:00Z") |
+------------------------------+
| 2000-01-01T08:00:00+08:00    |
+------------------------------+
1 row in set. Query took 0.002 seconds.
select arrow_cast(0, 'Timestamp(Nanosecond, Some( "+08:00" ))');
+---------------------------+
| Int64(0)                  |
+---------------------------+
| 1970-01-01T08:00:00+08:00 |
+---------------------------+
1 row in set. Query took 0.004 seconds.

Are these changes tested?

yes

Are there any user-facing changes?

@github-actions github-actions bot added core Core datafusion crate sql sqllogictest labels Apr 7, 2023
Comment on lines +171 to +185
fn parse_timezone(&mut self, context: &str) -> Result<Option<String>> {
match self.next_token()? {
Token::None => Ok(None),
Token::Some => {
self.expect_token(Token::LParen)?;
let timezone = self.parse_double_quoted_string("Timezone")?;
self.expect_token(Token::RParen)?;
Ok(Some(timezone))
}
tok => Err(make_error(
self.val,
&format!("finding Timezone for {context}, got {tok}"),
)),
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a parser to parse timezone Option<String>
either None, or Some("SomeTimezone")

Comment on lines +187 to +197
/// Parses the next double quoted string
fn parse_double_quoted_string(&mut self, context: &str) -> Result<String> {
match self.next_token()? {
Token::DoubleQuotedString(s) => Ok(s),
tok => Err(make_error(
self.val,
&format!("finding double quoted string for {context}, got '{tok}'"),
)),
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add parser to parse the double quoted string e.g. "+00:00"

@tustvold
Copy link
Contributor

tustvold commented Apr 7, 2023

❯ select arrow_cast(timestamp '2000-01-01T00:00:00', 'Timestamp(Nanosecond, Some( "+08:00" ))');
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 2000-01-01T08:00:00+08:00   |
+-----------------------------+
1 row in set. Query took 0.010 seconds.

IMO this behaviour is a bug in arrow-cast that should be fixed, I'm not sure if we wish to expose this.

The "correct" behaviour would be

❯ select arrow_cast(timestamp '2000-01-01T00:00:00', 'Timestamp(Nanosecond, Some( "+08:00" ))');
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 2000-01-01T00:00:00+08:00   |
+-----------------------------+
1 row in set. Query took 0.010 seconds.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @waitingkuo -- I had some small suggestions on some other tests but we can also add them as a follow on PR

DataType::Timestamp(TimeUnit::Nanosecond, Some("+08:00".into())),
DataType::Timestamp(TimeUnit::Microsecond, Some("+08:00".into())),
DataType::Timestamp(TimeUnit::Millisecond, Some("+08:00".into())),
DataType::Timestamp(TimeUnit::Second, Some("+08:00".into())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other test cases that would be good:

"" (empty timezone)

Also errors:

Three double quotes

"Timestamp(Nanosecond, Some("+00:00""))" 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @alamb , more error handling and test cases added

@tustvold
Copy link
Contributor

tustvold commented Apr 7, 2023

apache/arrow-rs#4034 should fix the timestamp handling in arrow-cast, if we are quick it may still make the release this afternoon

Comment on lines +752 to +765
r#"Timestamp(Nanosecond, Some(+00:00))"#,
"Error unrecognized word: +00:00",
),
(
r#"Timestamp(Nanosecond, Some("+00:00))"#,
r#"parsing "+00:00 as double quoted string: last char must be ""#,
),
(
r#"Timestamp(Nanosecond, Some(""))"#,
r#"parsing "" as double quoted string: empty string isn't supported"#,
),
(
r#"Timestamp(Nanosecond, Some("+00:00""))"#,
r#"parsing "+00:00"" as double quoted string: escaped double quote isn't supported"#,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb
added error cases for
empty string and
double quoted string that contains double quote

Comment on lines +324 to +328
# once https://github.com/apache/arrow-rs/issues/1936 fixed, please update this query
query P
select arrow_cast(timestamp '2000-01-01T00:00:00Z', 'Timestamp(Nanosecond, Some( "+08:00" ))');
----
2000-01-01T08:00:00+08:00
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold thank you
added this case to notify users once the kernel casting issue fixed

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it -- thanks again @waitingkuo

}
}

if len == 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice error messages

@alamb alamb merged commit a0fb3c6 into apache:main Apr 7, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate sql sqllogictest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow ARROW_CAST to cast to TimestampTz
3 participants