Support DictionaryArray in temporal kernels #2623
$iter.into_iter().for_each(|value| {
    if let Some(value) = value {
        match $using(value) {
            Some(dt) => $builder.append_value($convert(dt.$extract_fn())),
            None => $builder.append_null(),
        }
    } else {
        $builder.append_null();
    }
}
})
This change basically rewrites the macro using the `ArrayAccessor` API; otherwise the logic is the same.
Previously the macro could call the `value_as_datetime` or `value_as_datetime_with_tz` APIs on the temporal array, but now we need to call `as_datetime` on the value instead.
cc @sunchao
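To make the difference between the two shapes concrete, here is a minimal sketch using only the standard library. `ToyTimestamps`, `hour_of`, `hours_by_index`, and `hours_by_value` are hypothetical stand-ins invented for illustration; they are not the arrow-rs types or the actual macro expansion.

```rust
/// Toy stand-in for a temporal array: seconds since the epoch, with nulls.
/// (Hypothetical type; the real kernels operate on arrow arrays.)
struct ToyTimestamps(Vec<Option<i64>>);

/// Hypothetical conversion playing the role of `as_datetime` plus an
/// extractor: returns the hour of day, or `None` for values this toy
/// cannot represent (here: negative values).
fn hour_of(seconds: i64) -> Option<i32> {
    if seconds < 0 {
        None
    } else {
        Some(((seconds % 86_400) / 3_600) as i32)
    }
}

impl ToyTimestamps {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn is_null(&self, i: usize) -> bool {
        self.0[i].is_none()
    }
    /// Old shape: the conversion is a method on the array, addressed by index.
    fn value_as_hour(&self, i: usize) -> Option<i32> {
        self.0[i].and_then(hour_of)
    }
}

/// Old shape: loop over indices, check nullness, ask the array to convert.
fn hours_by_index(array: &ToyTimestamps) -> Vec<Option<i32>> {
    (0..array.len())
        .map(|i| if array.is_null(i) { None } else { array.value_as_hour(i) })
        .collect()
}

/// New shape: iterate over optional values and convert each value directly.
fn hours_by_value<I>(iter: I) -> Vec<Option<i32>>
where
    I: IntoIterator<Item = Option<i64>>,
{
    iter.into_iter().map(|value| value.and_then(hour_of)).collect()
}
```

Both functions produce the same output for the same input; the second only needs something that yields `Option` values, which is what lets a dictionary-backed accessor be plugged in.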
Perhaps we could keep the signature the same, but provide a more generic version with a suffix, e.g. `_generic`. Otherwise this is an ergonomic step back for the case of a non-dictionary-encoded array, which is likely by far the common case.
|h| h
)
} else {
// No timezone available. Calling `to_string` on the datatime value simply.
extract_component_from_array!(array, builder, to_string, value_as_datetime, |h| h)
let iter = ArrayIter::new(array);
Why is this `ArrayIter` needed? The macro calls `into_iter`?
Yea
$builder.append_null();
} else {
match $array.$using(i) {
($iter:ident, $builder:ident, $extract_fn:ident, $using:expr, $convert:expr) => {
I have a really hard time following this macro. I wonder if we could replace it with generics, or at the very least simplify it. Something for a future PR, I suspect.
+1. It'd be better to improve the readability here.
Yea, I will find some time to try to simplify it. 😅
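One possible direction for such a simplification, sketched with the standard library only: a generic function that takes the conversion and the extractor as closures instead of macro arguments. `extract_component`, `SimpleTime`, and `to_simple_time` are hypothetical names; this is not what the PR or a later arrow-rs release actually does.

```rust
/// Hypothetical generic replacement for the extraction macro.
/// `to_datetime` plays the role of `$using`, `extract` the role of
/// `$extract_fn` combined with `$convert`.
fn extract_component<V, D, I, F, G>(iter: I, to_datetime: F, extract: G) -> Vec<Option<i32>>
where
    I: IntoIterator<Item = Option<V>>,
    F: Fn(V) -> Option<D>,
    G: Fn(&D) -> i32,
{
    iter.into_iter()
        .map(|value| {
            // Null input or failed conversion both become a null output.
            value.and_then(|v| to_datetime(v)).map(|dt| extract(&dt))
        })
        .collect()
}

/// Toy datetime used only to exercise the sketch.
struct SimpleTime {
    hour: i32,
    minute: i32,
}

/// Toy conversion from seconds since midnight-aligned epoch; negative
/// inputs are treated as unrepresentable in this sketch.
fn to_simple_time(seconds: i64) -> Option<SimpleTime> {
    if seconds < 0 {
        return None;
    }
    Some(SimpleTime {
        hour: ((seconds % 86_400) / 3_600) as i32,
        minute: ((seconds % 3_600) / 60) as i32,
    })
}
```

The trade-off is that type errors then point at ordinary function signatures rather than at macro expansions, at the cost of a few more generic parameters.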
pub fn year<T, A: ArrayAccessor<Item = T::Native>>(array: A) -> Result<Int32Array>
where
    T: ArrowTemporalType + ArrowNumericType,
    T::Native: ArrowNativeType,
I think this bound should be implied by the definition of `ArrowPrimitiveType`.
Removed.
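For readers unfamiliar with why the extra bound is redundant: a bound written on an associated type inside a trait definition is available wherever the trait is used. A minimal sketch with hypothetical traits (`NativeType`, `PrimitiveType`, `Date32` are illustrative, not the arrow-rs definitions):

```rust
/// Hypothetical stand-in for a "native type" marker trait.
trait NativeType: Copy + Into<i64> {}
impl NativeType for i32 {}
impl NativeType for i64 {}

/// Hypothetical stand-in for a primitive type trait. The bound on
/// `Native` is declared once, here.
trait PrimitiveType {
    type Native: NativeType;
}

struct Date32;
impl PrimitiveType for Date32 {
    type Native = i32;
}

/// No `where T::Native: NativeType` clause is needed: the compiler already
/// knows it from the trait definition, so `.into()` resolves via `Into<i64>`.
fn widen<T: PrimitiveType>(value: T::Native) -> i64 {
    value.into()
}
```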
{
    match array.data_type().clone() {
        DataType::Dictionary(_, value_type) => {
            num_days_from_monday_internal::<T, A>(array, value_type.as_ref().clone())
Is this `clone` really necessary? Can't `num_days_from_monday_internal` take `&DataType`?
Yea, `&DataType` is okay. Removed `clone`.
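A small sketch of the borrow-instead-of-clone pattern being discussed, using a toy `DataType` enum defined here for illustration (the real arrow `DataType` has many more variants, and `describe` / `describe_values` are hypothetical helpers):

```rust
/// Toy version of a nested data type; `Dictionary` holds boxed key and
/// value types, similar in spirit to the real enum.
#[derive(Debug, PartialEq)]
enum DataType {
    Int32,
    Date32,
    Timestamp,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Internal helper that only needs to read the type, so it borrows it.
fn describe(dt: &DataType) -> &'static str {
    match dt {
        DataType::Int32 => "int32",
        DataType::Date32 => "date32",
        DataType::Timestamp => "timestamp",
        DataType::Dictionary(..) => "dictionary",
    }
}

/// Matching on a reference binds `value_type` as `&Box<DataType>`, which
/// deref-coerces to `&DataType`; no clone of either level is required.
fn describe_values(dt: &DataType) -> &'static str {
    match dt {
        DataType::Dictionary(_, value_type) => describe(value_type),
        other => describe(other),
    }
}
```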
@@ -516,7 +832,7 @@ mod tests {
let a: PrimitiveArray<Date64Type> =
    vec![Some(1514764800000), None, Some(1550636625000)].into();

let b = hour(&a).unwrap();
let b = hour::<Date64Type, _>(&a).unwrap();
This loss of automatic type hinting is a bit unfortunate; it feels like an ergonomic step back for the common case.
Restored.
Yea, I was hesitating between adding I will add
Looks OK. Do you know whether `arrow-rs` follows the Gregorian calendar or the proleptic Gregorian calendar when handling timestamps prior to 1582-10-15?
}

/// Extracts the hours of a given temporal array as an array of integers
pub fn hour_generic<T, A: ArrayAccessor<Item = T::Native>>(array: A) -> Result<Int32Array>
Why do we need this `hour_generic` method? I feel we just need `hour`.
Please see #2623 (comment). I began with a generic version of `hour` to replace the existing one. But that loses automatic type hinting, as we always need to specify at least `T` when calling the generic `hour`. That seems like a step back ergonomically for the common case.
So currently I provide a generic version and keep the original `hour` signature untouched.
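The inference problem can be reproduced outside arrow with a few toy definitions. In the sketch below, `TemporalType`, `Date64`, `PrimitiveArray`, `Accessor`, and both functions are hypothetical stand-ins that mirror the shape of the signatures in this PR; values are assumed to be milliseconds since the epoch.

```rust
/// Hypothetical stand-in for a temporal type trait.
trait TemporalType {
    type Native: Copy + Into<i64>;
}

/// Toy type whose values are milliseconds since the epoch.
struct Date64;
impl TemporalType for Date64 {
    type Native = i64;
}

/// Toy primitive array.
struct PrimitiveArray<T: TemporalType> {
    values: Vec<Option<T::Native>>,
}

/// Toy accessor trait, loosely mirroring the role of `ArrayAccessor`.
trait Accessor {
    type Item;
    fn to_options(&self) -> Vec<Option<Self::Item>>;
}

impl<'a, T: TemporalType> Accessor for &'a PrimitiveArray<T> {
    type Item = T::Native;
    fn to_options(&self) -> Vec<Option<T::Native>> {
        self.values.clone()
    }
}

/// Generic version: `T` appears only through `T::Native`, so the compiler
/// cannot infer it from `A` and callers must name it explicitly.
fn hour_generic<T, A>(array: A) -> Vec<Option<i32>>
where
    T: TemporalType,
    A: Accessor<Item = T::Native>,
{
    array
        .to_options()
        .into_iter()
        .map(|v| {
            v.map(|n| {
                let ms: i64 = n.into();
                (ms.div_euclid(1_000).rem_euclid(86_400) / 3_600) as i32
            })
        })
        .collect()
}

/// Convenience wrapper: `T` is inferred from the argument type.
fn hour<T: TemporalType>(array: &PrimitiveArray<T>) -> Vec<Option<i32>> {
    hour_generic::<T, _>(array)
}
```

With these definitions `hour(&a)` compiles without annotations, while the generic entry point needs `hour_generic::<Date64, _>(&a)`, which is the ergonomic cost described above.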
arrow/src/compute/kernels/cast.rs
Outdated
builder,
to_string,
|value| as_datetime::<T>(
    <i64 as From<<T as ArrowPrimitiveType>::Native>>::from(value)
nit: we can just use `<i64 as From<_>>::from(value)` and let type inference do the work
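A standalone illustration of that nit, with hypothetical helper names `to_i64_verbose` and `to_i64_inferred`; both spell the same `From` call, the second leaving the source type to inference:

```rust
/// Fully spelled-out source type in the `From` path.
fn to_i64_verbose<N>(value: N) -> i64
where
    i64: From<N>,
{
    <i64 as From<N>>::from(value)
}

/// Same call, with `_` letting the compiler fill in the source type
/// from the type of `value`.
fn to_i64_inferred<N>(value: N) -> i64
where
    i64: From<N>,
{
    <i64 as From<_>>::from(value)
}
```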
@@ -171,335 +172,747 @@ pub fn using_chrono_tz_and_utc_naive_date_time(
    .ok()
}

/// Extracts the hours of a given temporal array as an array of integers
/// Extracts the hours of a given temporal primitive array as an array of integers
nit: add comments explaining what exactly the returned integers are. Are they within the range `[0, 24)`?
Yea, [0, 23].
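As a quick check of that range, here is a sketch that derives the hour of day directly from a millisecond timestamp in UTC. `hour_from_millis` is a hypothetical helper, not the kernel itself (which goes through chrono); Euclidean division keeps the result in `[0, 23]` even for timestamps before the epoch.

```rust
/// Hour of day (UTC) for a timestamp in milliseconds since the Unix epoch.
/// The result is always within 0..=23.
fn hour_from_millis(ms: i64) -> i32 {
    // Floor to whole seconds, then reduce to seconds since midnight.
    let secs_of_day = ms.div_euclid(1_000).rem_euclid(86_400);
    (secs_of_day / 3_600) as i32
}
```

The two non-null values from the test in this PR, `1514764800000` and `1550636625000`, map to hours `0` and `4` respectively.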
}

Ok(b.finish())
}

/// Extracts the quarter of a given temporal array as an array of integers
/// Extracts the quarter of a given temporal primitive array as an array of integers
ditto: what do the returned integers represent?
It is the quarter within the range of [1, 4].
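For reference, the usual mapping from a 1-based month to a quarter in `[1, 4]` can be sketched as follows. `quarter_of_month` is a hypothetical helper for illustration; the kernel derives the quarter from a chrono datetime rather than from a bare month number.

```rust
/// Quarter in 1..=4 for a 1-based month in 1..=12; `None` for anything else.
fn quarter_of_month(month: u32) -> Option<u32> {
    if (1..=12).contains(&month) {
        // Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
        Some((month - 1) / 3 + 1)
    } else {
        None
    }
}
```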
It is not explicitly mentioned in this crate. But since datetime operations in arrow-rs use the chrono crate, which follows ISO 8601, I think it follows the proleptic Gregorian calendar.
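To illustrate what "proleptic" means in practice, the sketch below implements the well-known days-from-civil arithmetic (as published by Howard Hinnant) with the standard library only. It is not chrono's implementation; `days_from_civil` and `civil_from_days` are names chosen here. Gregorian leap-year rules are simply extended backwards, so the day before 1582-10-15 is 1582-10-14 rather than the historical 1582-10-04.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (month 1..=12).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of shifted year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097; // 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}
```

Stepping one day back from 1582-10-15 with these functions yields 1582-10-14, with no ten-day gap.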
LGTM except minor nit on documentation
}

Ok(b.finish())
}

/// Extracts the month of a given temporal array as an array of integers
/// Extracts the month of a given temporal primitive array as an array of integers
nit: are the returned integers 0-based or 1-based? Might add a comment. We should follow methods like `num_days_from_monday`, which have some nice documentation.
It's 1-based. I will add a comment.
Thanks for the review.
Benchmark runs are scheduled for baseline = d73d78f and contender = c8bf1ca. c8bf1ca is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #2622.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?