Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overflow-checking variants of arithmetic scalar dyn kernels #2713

Merged
merged 4 commits into from Sep 14, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Sep 12, 2022

Which issue does this PR close?

Closes #2712.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 12, 2022
@viirya viirya force-pushed the overflow_arithmetic_scalar_dyn branch from 7a10d91 to 54d44d2 Compare September 12, 2022 19:53
@viirya
Copy link
Member Author

viirya commented Sep 12, 2022

cc @sunchao

///
/// This detects overflow and returns an `Err` for that. For an non-overflow-checking variant,
/// use `add_scalar_dyn` instead.
pub fn add_scalar_checked_dyn<T>(array: &dyn Array, scalar: T::Native) -> Result<ArrayRef>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: do we have benchmark to track how much slower add_scalar_checked_dyn is comparing to add_scalar_dyn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is anything like the non-scalar kernels, it is about 10x slower. Aside from the branching costs, it prevents LLVM from vectorising it correctly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I wonder if we should point that out in the doc of this method, in case it's not obvious to the users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it will be much slower. As by default (ansi-mode disabled) in our case, non-checked kernels will be used. So most of time users will use faster one, except they have special need to use checked kernels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm going to add a few lines mentioning that.

@@ -834,12 +834,34 @@ where
/// Add every value in an array by a scalar. If any value in the array is null then the
/// result is also null. The given array must be a `PrimitiveArray` of the type same as
/// the scalar, or a `DictionaryArray` of the value type same as the scalar.
///
/// This doesn't detect overflow. Once overflowing, the result will wrap around.
/// For an overflow-checking variant, use `add_scalar_checked_dyn` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps explain a bit when it will return Err

F: Fn(T::Native) -> Result<T::Native>,
{
downcast_dictionary_array! {
array => try_unary_dict::<_, F, T>(array, op),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm how do we know the dictionary value type matches T?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, there is no type-bound for the dictionary value type. Just do a simple test. At runtime downcast_ref will fail in unary_dict. I will address it in other PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally as the op is provided by users, I suppose that users know dictionary value is same type as the scalar. But it is good to return a meaningful Err instead of runtime panic. I will do it in a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up sounds fine to me. Perhaps we can just check the type here:

    downcast_dictionary_array! {
        array => if array.values().data_type() == &T::DATA_TYPE {
            try_unary_dict::<_, F, T>(array, op)
        } else {
            // throw error
        },
        t => {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, actually I thought to handle it at try_unary_dict. But this fix looks okay as try_unary_dict is currently used here and not public. I may fix at try_unary_dict at another followup.

F: Fn(T::Native) -> Result<T::Native>,
{
downcast_dictionary_array! {
array => try_unary_dict::<_, F, T>(array, op),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up sounds fine to me. Perhaps we can just check the type here:

    downcast_dictionary_array! {
        array => if array.values().data_type() == &T::DATA_TYPE {
            try_unary_dict::<_, F, T>(array, op)
        } else {
            // throw error
        },
        t => {

@viirya viirya merged commit 7594db6 into apache:master Sep 14, 2022
@viirya
Copy link
Member Author

viirya commented Sep 14, 2022

Thanks.

@ursabot
Copy link

ursabot commented Sep 15, 2022

Benchmark runs are scheduled for baseline = 2a0fc77 and contender = 7594db6. 7594db6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add overflow-checking variants of arithmetic scalar dyn kernels
4 participants