Overflow-checking variant of arithmetic scalar kernels #2650

viirya · 2022-09-05T07:12:24Z

Which issue does this PR close?

Closes #2651.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

tustvold · 2022-09-05T10:39:42Z

arrow/src/compute/kernels/arity.rs

+    //      ~60% speedup
+    //  Soundness
+    //      `values` is an iterator with a known size because arrays are sized.
+    let buffer = unsafe { Buffer::from_trusted_len_iter(values.into_iter()) };


try_from_trusted_len_iter would avoid having to collect into a Vec first
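
As a rough illustration of that suggestion, here is a std-only sketch that does not use the arrow-rs `Buffer` API. The function name `try_bytes_from_iter` and the use of a `Vec<u8>` as the byte buffer are assumptions made for the example. It consumes a fallible, exactly-sized iterator in one pass, writes each value straight into the output bytes, and stops at the first error, so no intermediate `Vec` of values is built.

```rust
/// Hypothetical stand-in for a fallible "buffer from iterator" constructor.
/// The output `Vec<u8>` plays the role of the byte buffer.
fn try_bytes_from_iter<I, E>(values: I) -> Result<Vec<u8>, E>
where
    I: ExactSizeIterator<Item = Result<i32, E>>,
{
    // The length is known up front, so the buffer is allocated once.
    let mut buf = Vec::with_capacity(values.len() * std::mem::size_of::<i32>());
    for v in values {
        // `?` returns the first error without visiting the remaining items.
        buf.extend_from_slice(&v?.to_le_bytes());
    }
    Ok(buf)
}
```

A caller would pass the mapped iterator directly, for example `slice.iter().map(|v| v.checked_add(1).ok_or(...))`, rather than collecting it first.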

tustvold · 2022-09-05T10:41:31Z

arrow/src/compute/kernels/arithmetic.rs

+///
+/// This detects overflow and returns an `Err` for that. For a non-overflow-checking variant,
+/// use `divide_scalar` instead.
+pub fn divide_scalar_checked<T>(


I'm not sure of the value of a separate checked scalar kernel for division, given it only elides a single check of the scalar divisor. I would be tempted to leave it out for now, at least until #2647 is resolved

Sure, I thought about it and wondered whether #2647 might be resolved quickly. 😄

Let me remove division first.
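
To make the point about the single divisor check concrete, here is a minimal sketch on plain slices. It is not the arrow-rs kernel; the signature, the `String` error type, and the use of `wrapping_div` are assumptions for illustration. With a scalar divisor, the divide-by-zero test happens once per call rather than once per element.

```rust
/// Hypothetical slice-based stand-in for a scalar division kernel.
fn divide_scalar(values: &[i32], divisor: i32) -> Result<Vec<i32>, String> {
    // The only fallible condition tied to the scalar is checked once, up front.
    if divisor == 0 {
        return Err("Divide by zero".to_string());
    }
    // After the check, the per-element loop cannot divide by zero.
    // `wrapping_div` is used so that `i32::MIN / -1` wraps instead of panicking.
    Ok(values.iter().map(|v| v.wrapping_div(divisor)).collect())
}
```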

liukun4515 · 2022-09-06T03:12:28Z

arrow/src/compute/kernels/arity.rs

@@ -83,6 +84,41 @@ where
    PrimitiveArray::<O>::from(data)
 }

+/// An overflow-checking variant of `unary`.
+pub(crate) fn unary_checked<I, F, O>(


I think this should be try_unary, and it should be a general method for primitive arrays.

Are you suggesting to rename it to try_unary?

My thought is to align with the method unary.
That is the reason for my suggestion in https://github.com/apache/arrow-rs/pull/2650/files#r963221429

I am not sure what you mean by "align with the method unary". The F of unary doesn't return Option or Result.

Based on these comments, let me try to guess what you are suggesting. Are you suggesting that we change the return type of the F function parameter to Result and move the overflow ArrowError::ComputeError into the arithmetic scalar kernels, so that unary_checked is not only for these arithmetic kernels?

I'm okay with the change, but where else do you think unary_checked would be used?

Sorry for the confusing comments on your PR.
In PR #2661, the refactor also needs a function that can convert a PrimitiveArray to a Result<PrimitiveArray>.

I think unary_checked or try_unary can be used like unary wherever the result needs to be a Result.

liukun4515 · 2022-09-06T03:30:33Z

arrow/src/compute/kernels/arity.rs

+    O: ArrowPrimitiveType,
+    F: Fn(I::Native) -> Option<O::Native>,
+    I::Native: ArrowNativeTypeOp,


Suggested change

     O: ArrowPrimitiveType,
-    F: Fn(I::Native) -> Option<O::Native>,
+    F: Fn(I::Native) -> Result<O::Native>,
     I::Native: ArrowNativeTypeOp,

?? There is a reason that the return type of F is Option. It is not arbitrary.

The result of add_checked is an Option, and you use that Option as the result of F.
But in the logic below, you then check the Option:
if it is None, you return an error.
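
The two shapes being discussed in this thread can be compared side by side in a std-only sketch. The slice-based signatures and the `String` error type are assumptions for illustration and are not the arrow-rs definitions. With an `Option`-returning closure the helper owns a single generic error message; with a `Result`-returning closure each caller chooses its own error.

```rust
/// Variant A: the closure returns `Option`, and the helper builds the error.
fn unary_checked<F>(values: &[i32], op: F) -> Result<Vec<i32>, String>
where
    F: Fn(i32) -> Option<i32>,
{
    values
        .iter()
        .map(|v| op(*v).ok_or_else(|| format!("Overflow happened on: {:?}", v)))
        .collect()
}

/// Variant B: the closure returns `Result`, so the error comes from the caller.
fn try_unary<F, E>(values: &[i32], op: F) -> Result<Vec<i32>, E>
where
    F: Fn(i32) -> Result<i32, E>,
{
    // Collecting into `Result` stops at the first `Err`.
    values.iter().map(|v| op(*v)).collect()
}
```

With variant B, an add kernel could pass something like `|v| v.checked_add(scalar).ok_or_else(|| ...)` with a message specific to addition, at the cost of every caller having to construct its own error.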

liukun4515 · 2022-09-06T03:44:28Z

arrow/src/compute/kernels/arity.rs

+    let values = array.values().iter().map(|v| {
+        let result = op(*v);
+        if let Some(r) = result {
+            Ok(r)
+        } else {
+            // Overflow
+            Err(ArrowError::ComputeError(format!(
+                "Overflow happened on: {:?}",
+                *v
+            )))
+        }
+    });


Suggested change

-    let values = array.values().iter().map(|v| {
-        let result = op(*v);
-        if let Some(r) = result {
-            Ok(r)
-        } else {
-            // Overflow
-            Err(ArrowError::ComputeError(format!(
-                "Overflow happened on: {:?}",
-                *v
-            )))
-        }
-    });
+    let values = array.values().iter().map(|v| {
+        op(*v)
+    });

liukun4515 · 2022-09-06T04:01:47Z

arrow/src/compute/kernels/arithmetic.rs

 {
-    Ok(unary(array, |value| value + scalar))
+    unary_checked(array, |value| value.add_checked(scalar))


After this (https://github.com/apache/arrow-rs/pull/2650/files#r963221429),
you can define a specific error message for each of these closures.

tustvold · 2022-09-06T08:51:46Z

arrow/src/compute/kernels/arity.rs

+    F: Fn(I::Native) -> Option<O::Native>,
+    I::Native: ArrowNativeTypeOp,
+{
+    let values = array.values().iter().map(|v| {


I'm not sure this is correct, as it will evaluate for null slots which might have arbitrary values. I think you have to consult the null mask...

#2666 contains a more "correct" version of this

I think the into_primitive_array_data will add the null mask

It will add a null mask yes, but as the operation is fallible it can't be blindly called on slots without first checking those slots aren't null

I think the default bytes or '0' will be inserted into the null slot.

The default value or '0' may cause a failure.

Got it.

👍 @tustvold

The null slot may have any value, not just 0. Consider the case where a null is added to a non-null value of 200: the resulting null slot will now have a value of 200 + whatever was in the null slot. The only guarantee is that it is uninitialized; the actual value is arbitrary.

Hmm, good point. I thought that into_primitive_array_data adds the null buffer back to the result array, like unary does. However, I missed the point that values in null slots might cause a spurious failure when calling op here.
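
The failure mode described in this thread can be reproduced with a toy model. `ToyArray`, its boolean validity mask, and both function names are hypothetical and are not the arrow-rs types or the code from #2666. The naive version applies the fallible `op` to every slot and errors on leftover data in a null slot; the null-aware version consults the mask first and writes a placeholder for nulls.

```rust
/// Toy stand-in for a primitive array: values plus a validity mask
/// (`true` = valid, `false` = null).
#[derive(Debug, PartialEq)]
struct ToyArray {
    values: Vec<u8>,
    valid: Vec<bool>,
}

/// Naive: runs `op` on every slot, including null ones.
fn try_unary_naive<F>(a: &ToyArray, op: F) -> Result<ToyArray, String>
where
    F: Fn(u8) -> Result<u8, String>,
{
    let values = a.values.iter().map(|v| op(*v)).collect::<Result<Vec<_>, _>>()?;
    Ok(ToyArray { values, valid: a.valid.clone() })
}

/// Null-aware: only valid slots are passed to `op`.
fn try_unary_null_aware<F>(a: &ToyArray, op: F) -> Result<ToyArray, String>
where
    F: Fn(u8) -> Result<u8, String>,
{
    let mut values = Vec::with_capacity(a.values.len());
    for (v, is_valid) in a.values.iter().zip(&a.valid) {
        // Null slots get a placeholder; their stored value is never inspected.
        values.push(if *is_valid { op(*v)? } else { 0 });
    }
    Ok(ToyArray { values, valid: a.valid.clone() })
}
```

For an input whose null slot happens to hold 250, adding 10 fails in the naive version even though no valid element overflows, while the null-aware version succeeds.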

viirya · 2022-09-11T01:53:45Z

I will rebase this once #2666 is merged.

viirya · 2022-09-11T20:17:45Z

@tustvold Updated to use try_unary.

viirya · 2022-09-12T07:43:41Z

Thanks.

ursabot · 2022-09-12T07:51:25Z

Benchmark runs are scheduled for baseline = e646ae8 and contender = e1f8ed8. e1f8ed8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Overflow-checking variant of arithmetic scalar kernels

5a48454

github-actions bot added the arrow Changes to the arrow crate label Sep 5, 2022

tustvold reviewed Sep 5, 2022

viirya added 3 commits September 5, 2022 10:42

Remove division scalar change for now.

be51743

Merge remote-tracking branch 'upstream/master' into overflow_arithmet…

bd55844

…ic_scalar

Merge remote-tracking branch 'upstream/master' into overflow_arithmet…

44b9202

…ic_scalar

liukun4515 reviewed Sep 6, 2022

This was referenced Sep 6, 2022

support CastOption for casting numeric #2649

Merged

optimize the numeric_cast_with_error #2661

Merged

liukun4515 reviewed Sep 6, 2022

tustvold reviewed Sep 6, 2022

Merge remote-tracking branch 'upstream/master' into overflow_arithmet…

f4a02a0

…ic_scalar

liukun4515 approved these changes Sep 11, 2022

viirya merged commit e1f8ed8 into apache:master Sep 12, 2022

alamb mentioned this pull request Sep 16, 2022

Add overflow-checking variant of arithmetic scalar kernels #2651

Closed
