Format Timestamps as RFC3339 #2939
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,7 @@ use crate::{array, datatypes::IntervalUnit}; | |
use array::DictionaryArray; | ||
|
||
use crate::error::{ArrowError, Result}; | ||
use arrow_array::timezone::Tz; | ||
|
||
macro_rules! make_string { | ||
($array_type:ty, $column: ident, $row: ident) => {{ | ||
|
@@ -190,14 +191,37 @@ macro_rules! make_string_datetime { | |
} else { | ||
array | ||
.value_as_datetime($row) | ||
.map(|d| d.to_string()) | ||
.map(|d| format!("{:?}", d)) | ||
.unwrap_or_else(|| "ERROR CONVERTING DATE".to_string()) | ||
}; | ||
|
||
Ok(s) | ||
}}; | ||
} | ||
|
||
macro_rules! make_string_datetime_with_tz { | ||
Review comment: It might also be worth commenting that date/times are formatted as RFC 3339? |
||
($array_type:ty, $tz_string: ident, $column: ident, $row: ident) => {{ | ||
let array = $column.as_any().downcast_ref::<$array_type>().unwrap(); | ||
|
||
let s = if array.is_null($row) { | ||
"".to_string() | ||
} else { | ||
match $tz_string.parse::<Tz>() { | ||
Review comment: As written, this is going to parse the timezone string for every row -- perhaps we can parse it once per array 🤔
Reply: In general this display code is very inefficient; I think there is a broader issue to clean it up. See the disclaimer on https://docs.rs/arrow/latest/arrow/util/display/fn.array_value_to_string.html |
||
Ok(tz) => array | ||
.value_as_datetime_with_tz($row, tz) | ||
.map(|d| format!("{}", d.to_rfc3339())) | ||
.unwrap_or_else(|| "ERROR CONVERTING DATE".to_string()), | ||
Err(_) => array | ||
.value_as_datetime($row) | ||
.map(|d| format!("{:?} (Unknown Time Zone '{}')", d, $tz_string)) | ||
Review comment: This doesn't seem right -- isn't this error for the case when I think we should remove the match statement
Reply: The logic behind this is: when parsing the time zone fails, format the datetime only.
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| 1970-01-01T00:00:00 (Unknown Time Zone 'Asia/Taipei2') |
+--------------------------------------------------------+
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| ERROR CONVERTING DATE |
+--------------------------------------------------------+
@alamb @tustvold
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| Unknown Time Zone 'Asia/Taipei2' |
+--------------------------------------------------------+
I agree the logic here is confusing; I should have added some comments here. |
||
.unwrap_or_else(|| "ERROR CONVERTING DATE".to_string()), | ||
} | ||
}; | ||
|
||
Ok(s) | ||
}}; | ||
} | ||
|
||
// It's not possible to do array.value($row).to_string() for &[u8], let's format it as hex | ||
macro_rules! make_string_hex { | ||
($array_type:ty, $column: ident, $row: ident) => {{ | ||
|
@@ -334,17 +358,55 @@ pub fn array_value_to_string(column: &array::ArrayRef, row: usize) -> Result<Str | |
DataType::Float32 => make_string!(array::Float32Array, column, row), | ||
DataType::Float64 => make_string!(array::Float64Array, column, row), | ||
DataType::Decimal128(..) => make_string_from_decimal(column, row), | ||
DataType::Timestamp(unit, _) if *unit == TimeUnit::Second => { | ||
make_string_datetime!(array::TimestampSecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Second => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( | ||
Review comment: It might be nice to combine the make_string_datetime_with_tz and make_string_datetime macros together?
Reply: It would also be awesome to avoid having to make a string at all (like return a |
||
array::TimestampSecondArray, | ||
tz_string, | ||
column, | ||
row | ||
), | ||
None => make_string_datetime!(array::TimestampSecondArray, column, row), | ||
} | ||
} | ||
DataType::Timestamp(unit, _) if *unit == TimeUnit::Millisecond => { | ||
make_string_datetime!(array::TimestampMillisecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Millisecond => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( | ||
array::TimestampMillisecondArray, | ||
tz_string, | ||
column, | ||
row | ||
), | ||
None => { | ||
make_string_datetime!(array::TimestampMillisecondArray, column, row) | ||
} | ||
} | ||
} | ||
DataType::Timestamp(unit, _) if *unit == TimeUnit::Microsecond => { | ||
make_string_datetime!(array::TimestampMicrosecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Microsecond => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( | ||
array::TimestampMicrosecondArray, | ||
tz_string, | ||
column, | ||
row | ||
), | ||
None => { | ||
make_string_datetime!(array::TimestampMicrosecondArray, column, row) | ||
} | ||
} | ||
} | ||
DataType::Timestamp(unit, _) if *unit == TimeUnit::Nanosecond => { | ||
make_string_datetime!(array::TimestampNanosecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Nanosecond => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( | ||
array::TimestampNanosecondArray, | ||
tz_string, | ||
column, | ||
row | ||
), | ||
None => { | ||
make_string_datetime!(array::TimestampNanosecondArray, column, row) | ||
} | ||
} | ||
} | ||
DataType::Date32 => make_string_date!(array::Date32Array, column, row), | ||
DataType::Date64 => make_string_date!(array::Date64Array, column, row), | ||
|
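To make the review points above concrete (parse the time zone once per array rather than once per row, and fall back to the naive datetime when the zone string cannot be parsed), here is a minimal std-only sketch. It is not the PR's code: `FixedOffset`, `parse_fixed_offset`, `to_rfc3339` and `format_column` are hypothetical names, only `+HH:MM` / `-HH:MM` offsets are handled (named zones such as `Asia/Taipei` would need chrono-tz), and the calendar conversion is the usual days-to-civil arithmetic rather than anything taken from arrow or chrono.

```rust
/// Hypothetical stand-in for the PR's `Tz`: a fixed UTC offset in seconds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FixedOffset(i64);

/// Parse "+HH:MM" or "-HH:MM"; anything else (e.g. "08:00", "Unknown") is rejected.
fn parse_fixed_offset(s: &str) -> Option<FixedOffset> {
    let b = s.as_bytes();
    if b.len() != 6 || b[3] != b':' {
        return None;
    }
    let sign: i64 = match b[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let all_digits = [b[1], b[2], b[4], b[5]].iter().all(|c| c.is_ascii_digit());
    if !all_digits {
        return None;
    }
    let hours = i64::from((b[1] - b'0') * 10 + (b[2] - b'0'));
    let minutes = i64::from((b[4] - b'0') * 10 + (b[5] - b'0'));
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(FixedOffset(sign * (hours * 3600 + minutes * 60)))
}

/// Convert days since 1970-01-01 to (year, month, day) in the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format seconds (already shifted to local time) as "YYYY-MM-DDTHH:MM:SS".
/// Assumes years in 0..=9999.
fn naive_datetime(local_secs: i64) -> String {
    let (y, m, d) = civil_from_days(local_secs.div_euclid(86_400));
    let s = local_secs.rem_euclid(86_400);
    format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}", y, m, d, s / 3600, s % 3600 / 60, s % 60)
}

/// Format a Unix timestamp in seconds as RFC 3339 with the given fixed offset.
fn to_rfc3339(unix_secs: i64, offset: FixedOffset) -> String {
    let sign = if offset.0 < 0 { '-' } else { '+' };
    let abs = offset.0.abs();
    format!(
        "{}{}{:02}:{:02}",
        naive_datetime(unix_secs + offset.0),
        sign,
        abs / 3600,
        abs % 3600 / 60
    )
}

/// Format a whole column. The zone string is parsed once, outside the row loop.
fn format_column(values: &[Option<i64>], tz: &str) -> Vec<String> {
    let offset = parse_fixed_offset(tz);
    values
        .iter()
        .map(|v| match (v, offset) {
            // Nulls render as an empty cell, as in the macros above.
            (None, _) => String::new(),
            (Some(secs), Some(off)) => to_rfc3339(*secs, off),
            // Unparseable zone: show the naive datetime plus a note.
            (Some(secs), None) => {
                format!("{} (Unknown Time Zone '{}')", naive_datetime(*secs), tz)
            }
        })
        .collect()
}
```

With the value used in the tests, `format_column(&[Some(11111111), None], "+08:00")` gives `1970-05-09T22:25:11+08:00` followed by an empty string, and passing `"08:00"` gives `1970-05-09T14:25:11 (Unknown Time Zone '08:00')`. Hoisting the parse out of the closure is the only structural difference from the per-row `match` in the macro.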
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -370,13 +370,134 @@ mod tests { | |
}; | ||
} | ||
|
||
/// Generate an array with type $ARRAYTYPE with a numeric value of | ||
/// $VALUE, and compare $EXPECTED_RESULT to the output of | ||
/// formatting that array with `pretty_format_batches` | ||
macro_rules! check_datetime_with_timezone { | ||
Review comment: I think passing an optional timezone to check_datetime and calling On a related note, I don't think either need to be macros and therefore probably shouldn't be.
Reply: As in "make them functions" 👍 |
||
($ARRAYTYPE:ident, $VALUE:expr, $TZ_STRING:expr, $EXPECTED_RESULT:expr) => { | ||
let mut builder = $ARRAYTYPE::builder(10); | ||
builder.append_value($VALUE); | ||
builder.append_null(); | ||
let array = builder.finish(); | ||
let array = array.with_timezone($TZ_STRING); | ||
|
||
let schema = Arc::new(Schema::new(vec![Field::new( | ||
"f", | ||
array.data_type().clone(), | ||
true, | ||
)])); | ||
let batch = RecordBatch::try_new(schema, vec![Arc::new(array)]).unwrap(); | ||
|
||
let table = pretty_format_batches(&[batch]) | ||
.expect("formatting batches") | ||
.to_string(); | ||
|
||
let expected = $EXPECTED_RESULT; | ||
let actual: Vec<&str> = table.lines().collect(); | ||
|
||
assert_eq!(expected, actual, "Actual result:\n\n{:#?}\n\n", actual); | ||
}; | ||
} | ||
|
||
#[test] | ||
#[cfg(feature = "chrono-tz")] | ||
fn test_pretty_format_timestamp_second_with_utc_timezone() { | ||
let expected = vec![ | ||
"+---------------------------+", | ||
"| f |", | ||
"+---------------------------+", | ||
"| 1970-05-09T14:25:11+00:00 |", | ||
"| |", | ||
"+---------------------------+", | ||
]; | ||
check_datetime_with_timezone!( | ||
TimestampSecondArray, | ||
11111111, | ||
"UTC".to_string(), | ||
expected | ||
); | ||
} | ||
|
||
#[test] | ||
#[cfg(feature = "chrono-tz")] | ||
fn test_pretty_format_timestamp_second_with_non_utc_timezone() { | ||
let expected = vec![ | ||
"+---------------------------+", | ||
"| f |", | ||
"+---------------------------+", | ||
"| 1970-05-09T22:25:11+08:00 |", | ||
"| |", | ||
"+---------------------------+", | ||
]; | ||
check_datetime_with_timezone!( | ||
TimestampSecondArray, | ||
11111111, | ||
"Asia/Taipei".to_string(), | ||
expected | ||
); | ||
} | ||
|
||
#[test] | ||
fn test_pretty_format_timestamp_second_with_fixed_offset_timezone() { | ||
let expected = vec![ | ||
"+---------------------------+", | ||
"| f |", | ||
"+---------------------------+", | ||
"| 1970-05-09T22:25:11+08:00 |", | ||
"| |", | ||
"+---------------------------+", | ||
]; | ||
check_datetime_with_timezone!( | ||
TimestampSecondArray, | ||
11111111, | ||
"+08:00".to_string(), | ||
expected | ||
); | ||
} | ||
|
||
#[test] | ||
fn test_pretty_format_timestamp_second_with_incorrect_fixed_offset_timezone() { | ||
let expected = vec![ | ||
"+-------------------------------------------------+", | ||
"| f |", | ||
"+-------------------------------------------------+", | ||
"| 1970-05-09T14:25:11 (Unknown Time Zone '08:00') |", | ||
"| |", | ||
"+-------------------------------------------------+", | ||
]; | ||
check_datetime_with_timezone!( | ||
TimestampSecondArray, | ||
11111111, | ||
"08:00".to_string(), | ||
expected | ||
); | ||
} | ||
|
||
#[test] | ||
fn test_pretty_format_timestamp_second_with_unknown_timezone() { | ||
let expected = vec![ | ||
"+---------------------------------------------------+", | ||
"| f |", | ||
"+---------------------------------------------------+", | ||
"| 1970-05-09T14:25:11 (Unknown Time Zone 'Unknown') |", | ||
"| |", | ||
"+---------------------------------------------------+", | ||
]; | ||
check_datetime_with_timezone!( | ||
TimestampSecondArray, | ||
11111111, | ||
"Unknown".to_string(), | ||
expected | ||
); | ||
} | ||
|
||
#[test] | ||
fn test_pretty_format_timestamp_second() { | ||
let expected = vec![ | ||
"+---------------------+", | ||
"| f |", | ||
"+---------------------+", | ||
"| 1970-05-09 14:25:11 |", | ||
"| 1970-05-09T14:25:11 |", | ||
"| |", | ||
"+---------------------+", | ||
]; | ||
|
@@ -389,7 +510,7 @@ mod tests { | |
"+-------------------------+", | ||
"| f |", | ||
"+-------------------------+", | ||
"| 1970-01-01 03:05:11.111 |", | ||
"| 1970-01-01T03:05:11.111 |", | ||
"| |", | ||
"+-------------------------+", | ||
]; | ||
|
@@ -402,7 +523,7 @@ mod tests { | |
"+----------------------------+", | ||
"| f |", | ||
"+----------------------------+", | ||
"| 1970-01-01 00:00:11.111111 |", | ||
"| 1970-01-01T00:00:11.111111 |", | ||
"| |", | ||
"+----------------------------+", | ||
]; | ||
|
@@ -415,7 +536,7 @@ mod tests { | |
"+-------------------------------+", | ||
"| f |", | ||
"+-------------------------------+", | ||
"| 1970-01-01 00:00:00.011111111 |", | ||
"| 1970-01-01T00:00:00.011111111 |", | ||
"| |", | ||
"+-------------------------------+", | ||
]; | ||
|
Review comment: I think it would be nice to show an example of going from a chrono timezone (like FixedOffset) to an Arrow::tz (though maybe that is already available -- I didn't check).
Reply: @alamb I can't add it here, as the inner data is private now (arrow-rs/arrow-array/src/timezone.rs, lines 249 to 251 in 87ac05b). Since #2909 is to add a timezone abstraction, do we still encourage users to use the chrono API directly? @tustvold, do you have any comments?