Add overflow-checking variants of arithmetic dyn kernels #2740

viirya · 2022-09-15T23:23:21Z

Which issue does this PR close?

Closes #2739.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

viirya · 2022-09-15T23:24:20Z

cc @sunchao

viirya · 2022-09-16T04:32:42Z

The CI failure looks unrelated.

arrow/src/compute/kernels/arithmetic.rs

sunchao · 2022-09-17T00:07:10Z

arrow/src/compute/kernels/arithmetic.rs

+                math_checked_op_dict
+            )
+        }
+        DataType::Date32 => {


nit: I wonder if the handling of Date32 and Date64 can be extracted as a separate function so it can be shared with add_dyn, but feel free to ignore if this is not easy.

Hmm, let me leave it as it is for now. I'm wondering if we should also do checking/non-checking behavior on Date32/Date64.

viirya · 2022-09-17T00:22:41Z

arrow/src/compute/kernels/arithmetic.rs

@@ -522,67 +548,86 @@ macro_rules! typed_dict_math_op {
    }};
 }

-/// Helper function to perform math lambda function on values from two dictionary arrays, this
-/// version does not attempt to use SIMD explicitly (though the compiler may auto vectorize)
-macro_rules! math_dict_op {


This macro is unnecessary and can be inlined. Removed it.

arrow/src/compute/kernels/arithmetic.rs

arrow/src/compute/kernels/arity.rs

viirya · 2022-09-17T07:52:35Z

arrow/src/compute/kernels/arity.rs

-    if a.null_count() == 0 && b.null_count() == 0 {
-        let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l, *r));


This latest optimization causes a bit of trouble for the change from PrimitiveArray to ArrayAccessor as the function parameter type.

The only way I found to reconcile the two is to add a new trait, ArrayValuesAccessor.

I would rewrite this as external iteration, i.e. for i in 0..array.len(). This is easier for the compiler, tends to optimise better, and is simpler to understand. MutableBuffer::push_unchecked will behave equivalently.

The trusted len iterators are not necessary imo, and are probably best avoided in general.

I see. Sounds good. Then I can get rid of ArrayValuesAccessor, which is only needed to access the values.

tustvold

I would move away from using trusted len iter, it's basically a hack and internal iteration will optimise better or the same

FYI @Dandandan
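A minimal sketch of the external-iteration shape suggested above, assuming a hypothetical `Prim` wrapper (not an arrow-rs type) and using `Vec::push` on a pre-reserved `Vec` as a stand-in for `MutableBuffer::push_unchecked`:

```rust
/// Hypothetical stand-in for a primitive array with no nulls (not the arrow-rs API).
pub struct Prim<'a, T> {
    pub values: &'a [T],
}

impl<'a, T: Copy> Prim<'a, T> {
    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Unchecked element access, in the spirit of `value_unchecked`.
    ///
    /// # Safety
    /// The caller must guarantee `idx < self.len()`.
    pub unsafe fn value_unchecked(&self, idx: usize) -> T {
        *self.values.get_unchecked(idx)
    }
}

/// Applies a fallible `op` element-wise with an explicit index loop,
/// pushing into a buffer whose capacity is reserved up front.
pub fn try_binary_no_nulls<A, B, O, E, F>(a: &Prim<A>, b: &Prim<B>, op: F) -> Result<Vec<O>, E>
where
    A: Copy,
    B: Copy,
    F: Fn(A, B) -> Result<O, E>,
{
    assert_eq!(a.len(), b.len(), "arrays must have the same length");
    let len = a.len();
    let mut out = Vec::with_capacity(len);
    for idx in 0..len {
        // SAFETY: idx < len, and both arrays have exactly `len` elements.
        let (l, r) = unsafe { (a.value_unchecked(idx), b.value_unchecked(idx)) };
        // `?` returns early with the first error, e.g. an overflow.
        out.push(op(l, r)?);
    }
    Ok(out)
}
```

Whether this beats the trusted-len iterator depends on how LLVM treats the loop; later in this thread the explicit loop only matched the old performance once it was moved into a separate, non-inlined function.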

tustvold · 2022-09-17T08:16:03Z

arrow/src/array/array.rs

+
+    /// Returns a values accessor [`ArrayValuesAccessor`] for this [`ArrayAccessor`] if
+    /// it supports. Returns [`None`] if it doesn't support accessing values directly.
+    fn get_values_accessor(&self) -> Option<&dyn ArrayValuesAccessor<Item = Self::Item>> {


What is the point of the dyn indirection here?

I don't want to return an Option<&[Self::Item]> directly, as the semantics look weird (i.e. some ArrayAccessors don't provide values). This returns self as an ArrayValuesAccessor so the caller can call values.

But I think I will remove it based on #2740 (comment).
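A sketch of the `get_values_accessor` shape under discussion, with hypothetical trait and type names (`Accessor`, `ValuesAccessor`, `Dense`, `Constant`): the default method returns `None`, and only types that store their values contiguously hand back `self` as a `&dyn ValuesAccessor`.

```rust
/// Hypothetical trait for arrays that can expose a contiguous slice.
pub trait ValuesAccessor {
    type Item;
    fn values(&self) -> &[Self::Item];
}

/// Hypothetical element-wise accessor with an optional slice fast path.
pub trait Accessor {
    type Item: Copy;
    fn len(&self) -> usize;
    fn value(&self, idx: usize) -> Self::Item;

    /// `None` unless the implementor stores its values contiguously.
    fn values_accessor(&self) -> Option<&dyn ValuesAccessor<Item = Self::Item>> {
        None
    }
}

/// Contiguous storage: can return itself as a `ValuesAccessor`.
pub struct Dense(pub Vec<i32>);

impl ValuesAccessor for Dense {
    type Item = i32;
    fn values(&self) -> &[i32] {
        &self.0
    }
}

impl Accessor for Dense {
    type Item = i32;
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value(&self, idx: usize) -> i32 {
        self.0[idx]
    }
    fn values_accessor(&self) -> Option<&dyn ValuesAccessor<Item = i32>> {
        Some(self as &dyn ValuesAccessor<Item = i32>)
    }
}

/// Logical array with no backing slice; keeps the default `None`.
pub struct Constant {
    pub value: i32,
    pub len: usize,
}

impl Accessor for Constant {
    type Item = i32;
    fn len(&self) -> usize {
        self.len
    }
    fn value(&self, _idx: usize) -> i32 {
        self.value
    }
}

/// Sums the elements, taking the slice path when one is offered.
pub fn sum<A: Accessor<Item = i32>>(a: &A) -> i32 {
    match a.values_accessor() {
        Some(v) => v.values().iter().sum(),
        None => (0..a.len()).map(|i| a.value(i)).sum(),
    }
}
```

The extra trait object is what the reviewer questions above; the follow-up in this thread drops it in favour of plain index loops.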

tustvold · 2022-09-17T08:18:21Z

arrow/src/compute/kernels/arity.rs

-    if a.null_count() == 0 && b.null_count() == 0 {
-        let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l, *r));


I would rewrite this as external iteration, i.e. for i in 0..array.len() this is easier for the compiler, tends to optimise better and is simpler to understand. MutableBuffer::push_unchecked will behave equivalently

The trusted len iterators are not necessary imo, and are probably best avoided in general.

tustvold · 2022-09-17T08:21:32Z

arrow/src/compute/kernels/arithmetic.rs

+/// This is similar to `math_op` as it performs given operation between two input primitive arrays.
+/// But the given operation can return `Err` if overflow is detected. For the case, this function
+/// returns an `Err`.
+fn math_checked_op<LT, RT, F>(


This does not seem to be necessary?

It is only there for the type bounds.

After changing math_checked_op to try_binary, I got:

error[E0284]: type annotations needed: cannot satisfy `<_ as native::ArrowPrimitiveType>::Native == i8`
  --> arrow/src/compute/kernels/arithmetic.rs:848:21
    |
848 | try_binary(left, right, |a, b| a.add_checked(b)).map(|a| Arc::new(a) as ArrayRef)
    | ^^^^^^^^^^ cannot satisfy `<_ as native::ArrowPrimitiveType>::Native == i8`

Is it not just a matter of providing the necessary type hint?

It is wrapped in a downcast_primitive_array macro call, so there seems to be no way to add a type hint.

Oh I see, it is because the return type needs to be type hinted. Darn...
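A simplified reproduction of the inference problem, with hypothetical `PrimitiveType`, `PrimArray` and `try_binary` definitions (not the arrow-rs signatures). The output type `O` appears only in the return type and through `O::Native`, so knowing that the closure yields `i8` does not tell the compiler which `O` to pick, and an explicit type argument is needed.

```rust
/// Hypothetical, simplified analogue of a primitive-type trait.
pub trait PrimitiveType {
    type Native: Copy;
}

pub struct Int8Type;

impl PrimitiveType for Int8Type {
    type Native = i8;
}

/// Hypothetical array whose element type is only known via `T::Native`.
pub struct PrimArray<T: PrimitiveType> {
    pub values: Vec<T::Native>,
}

/// `O` is mentioned only in the return type and in `O::Native`, so unless
/// the caller pins it down, rustc has nothing to infer `O` from.
pub fn try_binary<T, O, F>(a: &PrimArray<T>, b: &PrimArray<T>, op: F) -> Result<PrimArray<O>, String>
where
    T: PrimitiveType,
    O: PrimitiveType,
    F: Fn(T::Native, T::Native) -> Result<O::Native, String>,
{
    let values = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(l, r)| op(*l, *r))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(PrimArray { values })
}

/// Checked i8 addition. Without the explicit `::<Int8Type, Int8Type, _>`,
/// `O` is undetermined (any type could have `Native = i8`) and rustc
/// should ask for type annotations.
pub fn add_checked_i8(a: &PrimArray<Int8Type>, b: &PrimArray<Int8Type>) -> Result<Vec<i8>, String> {
    let out = try_binary::<Int8Type, Int8Type, _>(a, b, |l: i8, r: i8| {
        l.checked_add(r).ok_or_else(|| "overflow".to_string())
    })?;
    Ok(out.values)
}
```

Inside a macro that immediately erases the result to `ArrayRef`, there is no place for that annotation, which is why a dedicated helper with explicit bounds was kept.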

tustvold · 2022-09-17T08:26:23Z

arrow/src/compute/kernels/arithmetic.rs

-            )));
-        }
+/// Perform given operation on two `DictionaryArray`s.
+/// Returns an error if the two arrays have different value type


The performance of this will be pretty terrible, and it results in a huge amount of codegen. I'm not entirely sure of the use case tbh... Perhaps it is worth exploring putting this behind a feature flag.

Edit: I wouldn't be surprised if hydrating the dictionary to its values, performing the operation and casting back was faster.

Hm, math_op_dict? The change just inlines the macro math_dict_op into the function math_op_dict. So I suppose you mean math_op_dict itself rather than this change?

Optimizing it can be done in a follow-up; the current diff is already quite big.

Fair enough, the compile times are currently despair inducing and I have a chip on my shoulder about them 😅
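A minimal sketch of the alternative floated in the edit above (hydrate, compute, re-encode), assuming a hypothetical `Dict` type with `usize` keys into an `i32` values vector rather than arrow-rs's `DictionaryArray`. It illustrates the data flow only and makes no performance claim.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded column: `keys[i]` indexes into `values`.
#[derive(Debug, PartialEq)]
pub struct Dict {
    pub keys: Vec<usize>,
    pub values: Vec<i32>,
}

impl Dict {
    /// "Hydrate" the column into one plain value per row.
    pub fn hydrate(&self) -> Vec<i32> {
        self.keys.iter().map(|&k| self.values[k]).collect()
    }

    /// Re-encode plain values as a dictionary, assigning keys in first-seen order.
    pub fn encode(plain: &[i32]) -> Dict {
        let mut index: HashMap<i32, usize> = HashMap::new();
        let mut values = Vec::new();
        let mut keys = Vec::with_capacity(plain.len());
        for &v in plain {
            let key = *index.entry(v).or_insert_with(|| {
                values.push(v);
                values.len() - 1
            });
            keys.push(key);
        }
        Dict { keys, values }
    }
}

/// Checked addition of two dictionary columns: hydrate, add, re-encode.
pub fn add_dict_checked(l: &Dict, r: &Dict) -> Result<Dict, String> {
    let (lv, rv) = (l.hydrate(), r.hydrate());
    if lv.len() != rv.len() {
        return Err("arrays must have the same length".to_string());
    }
    let sums = lv
        .iter()
        .zip(&rv)
        .map(|(a, b)| a.checked_add(*b).ok_or_else(|| "overflow".to_string()))
        .collect::<Result<Vec<i32>, String>>()?;
    Ok(Dict::encode(&sums))
}
```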

viirya · 2022-09-17T17:07:55Z

arrow/src/compute/kernels/arity.rs

+        buffer.append_n_zeroed(len);
+        let slice = buffer.as_slice_mut();
+
+        for idx in 0..len {


Rewrote it with a for loop.

tustvold

Could we run the arithmetic benchmarks possibly?

tustvold · 2022-09-17T17:13:06Z

arrow/src/compute/kernels/arity.rs

-        //      `values` is an iterator with a known size from a PrimitiveArray
-        return Ok(unsafe { build_primitive_array(len, buffer, 0, None) });
+        let mut buffer = BufferBuilder::<O::Native>::new(len);
+        buffer.append_n_zeroed(len);


It would be faster to reserve the correct capacity and use push_unchecked, which avoids zero-initializing the buffer.
As written, I suspect this is a performance regression, as this if block is now effectively identical to the one below.
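The two buffer-filling strategies being compared, sketched with `Vec<i32>` standing in for arrow-rs buffers (function names are hypothetical): the first zero-fills `len` slots and then overwrites them, the second only reserves capacity and pushes each result.

```rust
/// Zero-fill `len` slots, then overwrite each one
/// (the `append_n_zeroed` + mutable-slice pattern).
pub fn zeroed_then_write(a: &[i32], b: &[i32]) -> Result<Vec<i32>, String> {
    if a.len() != b.len() {
        return Err("arrays must have the same length".to_string());
    }
    let mut out = vec![0i32; a.len()];
    for idx in 0..a.len() {
        out[idx] = a[idx].checked_add(b[idx]).ok_or("overflow")?;
    }
    Ok(out)
}

/// Reserve capacity only and push each result, skipping the zero-fill pass
/// (`Vec::push` stands in for `MutableBuffer::push_unchecked`).
pub fn reserve_then_push(a: &[i32], b: &[i32]) -> Result<Vec<i32>, String> {
    if a.len() != b.len() {
        return Err("arrays must have the same length".to_string());
    }
    let mut out = Vec::with_capacity(a.len());
    for idx in 0..a.len() {
        out.push(a[idx].checked_add(b[idx]).ok_or("overflow")?);
    }
    Ok(out)
}
```

Both produce the same result; as reported further down this thread, switching to the push-based loop on its own did not remove the benchmark regression.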

Hmm, if I understand correctly, I did something like:

for idx in 0..len {
    unsafe {
        buffer.push_unchecked(op(a.value_unchecked(idx), b.value_unchecked(idx))?);
    }
}

But I still see a regression.

Shall I roll back to the ArrayValuesAccessor approach? I benchmarked it and the performance stays the same.

I'll have a brief play tomorrow. I would really like to avoid adding further APIs, especially ones that aren't a generalisable abstraction, if we possibly can.

I tried another way to iterate the values from both sides, but the regression is still there.

I think calling value_unchecked to fetch each value is an indirect access pattern compared with values, where we simply process a slice. That makes it hard for LLVM to vectorize the access.

Slice iterators boil down to the same thing, in fact the external iteration is harder for llvm. Something else is going on here.

So this appears to boil down to the whims of LLVM's inlining heuristics. try_from_trusted_len_iter doesn't appear to get fully inlined and therefore gets optimised properly (although I have a really hard time understanding the generated code). Potentially rustc is doing something to help LLVM here.

However, when the loop is defined in the function body of try_binary, LLVM doesn't optimize it properly. If you split it out into a free function with inline(never), it optimises correctly and is actually marginally faster than the iterator.

Annoyingly, using try_from_trusted_len_iter with an iterator constructed via something like (0..a.len()).map(|idx| ...) also doesn't work unless it is split into a free function.

I would therefore suggest splitting the no-null variant into a free function marked inline(never), as this appears to work...

FYI @Dandandan as I wonder if we're running into this elsewhere.

Hmm, interesting...let me split it out to a free function and benchmark again.

Yea, the suggested never inline function works.
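A sketch of the workaround that resolved the regression: the no-null loop lives in its own function marked `#[inline(never)]`, and the nullable path only applies `op` to slots that are valid on both sides. `Column`, `try_binary` and `try_binary_no_nulls` here are hypothetical simplifications over `i32` values, not the arrow-rs code, and whether the attribute helps is compiler-dependent.

```rust
/// Hypothetical nullable column: `valid[i] == false` marks slot `i` as null.
pub struct Column<'a> {
    pub values: &'a [i32],
    pub valid: Option<&'a [bool]>,
}

/// No-null fast path, kept out of line so it is optimised on its own.
#[inline(never)]
fn try_binary_no_nulls<F>(a: &[i32], b: &[i32], op: F) -> Result<Vec<i32>, String>
where
    F: Fn(i32, i32) -> Result<i32, String>,
{
    let mut out = Vec::with_capacity(a.len());
    for idx in 0..a.len() {
        out.push(op(a[idx], b[idx])?);
    }
    Ok(out)
}

/// Returns the values plus an optional validity vector.
pub fn try_binary<F>(a: &Column, b: &Column, op: F) -> Result<(Vec<i32>, Option<Vec<bool>>), String>
where
    F: Fn(i32, i32) -> Result<i32, String>,
{
    let len = a.values.len();
    if b.values.len() != len {
        return Err("arrays must have the same length".to_string());
    }
    if a.valid.is_none() && b.valid.is_none() {
        return Ok((try_binary_no_nulls(a.values, b.values, op)?, None));
    }
    let mut values = Vec::with_capacity(len);
    let mut valid = Vec::with_capacity(len);
    for idx in 0..len {
        let ok = a.valid.map_or(true, |v| v[idx]) && b.valid.map_or(true, |v| v[idx]);
        // Only evaluate `op` on slots valid on both sides; a null slot gets a
        // placeholder 0, so whatever value sits under a null cannot overflow.
        values.push(if ok { op(a.values[idx], b.values[idx])? } else { 0 });
        valid.push(ok);
    }
    Ok((values, Some(valid)))
}
```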

tustvold · 2022-09-17T17:15:24Z

arrow/src/compute/kernels/arithmetic.rs

+/// This is similar to `math_op` as it performs given operation between two input primitive arrays.
+/// But the given operation can return `Err` if overflow is detected. For the case, this function
+/// returns an `Err`.
+fn math_checked_op<LT, RT, F>(


Is it not just a matter of providing the necessary type hint?

tustvold · 2022-09-17T17:16:04Z

arrow/src/compute/kernels/arithmetic.rs

-            )));
-        }
+/// Perform given operation on two `DictionaryArray`s.
+/// Returns an error if the two arrays have different value type


Fair enough, the compile times are currently despair inducing and I have a chip on my shoulder about them 😅

tustvold · 2022-09-17T17:19:02Z

arrow/src/compute/kernels/arithmetic.rs

 {
-    math_dict_op!(left, right, op, PrimitiveArray<T>)
+    let left = left.downcast_dict::<PrimitiveArray<T>>().unwrap();


This possibly needs to check the actual type; currently it will panic.

This is called by a macro which matches the value types of both sides, so they should already match. But let me add a check anyway.
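A minimal sketch of checking the concrete type before downcasting, using `std::any::Any::downcast_ref` as a stand-in for the dictionary downcast (the helper name is hypothetical): a mismatch becomes an `Err` instead of an `unwrap` panic.

```rust
use std::any::{type_name, Any};

/// Downcasts a type-erased value, returning an error instead of panicking
/// when the concrete type is not the expected one.
pub fn downcast_or_err<T: Any>(value: &dyn Any) -> Result<&T, String> {
    value
        .downcast_ref::<T>()
        .ok_or_else(|| format!("expected a value of type {}", type_name::<T>()))
}
```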

Dandandan · 2022-09-17T17:27:17Z

I would move away from using trusted len iter, it's basically a hack and internal iteration will optimise better or the same

FYI @Dandandan

Sure - if we get a similar performance or better I've no problems with that.
I am not sure I would call it a hack though - the Rust standard library uses the same technique.

tustvold · 2022-09-17T17:39:28Z

Agreed, iterators can be made to perform the same, but it requires a significant amount of trickery and is subject to the whims of LLVM. For the purposes of arrow kernels, it's just unnecessary imo.

https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb may also be of interest, though sadly it isn't possible to implement try_for_each currently.

viirya · 2022-09-17T19:24:19Z

Could we run the arithmetic benchmarks possibly?

Yea, the checked kernels with no null values show an obvious regression.

jhorstmann · 2022-09-19T08:35:11Z

arrow/src/compute/kernels/arity.rs

+    O: ArrowPrimitiveType,
+    F: Fn(A::Item, B::Item) -> Result<O::Native>,
+{
+    let mut buffer = MutableBuffer::new(len);


The length parameter for MutableBuffer::new is in bytes, so I think this needs to be multiplied by O::get_byte_width()

Missed it. Thanks for catching it.
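A small illustration of the bytes-versus-elements distinction, using `std::mem::size_of` in place of `get_byte_width` and a plain `Vec<u8>` instead of `MutableBuffer` (helper names are hypothetical):

```rust
use std::mem::size_of;

/// Byte capacity for `len` values of type `T`: the element-count-to-bytes
/// conversion a byte-sized constructor needs.
pub fn byte_capacity<T>(len: usize) -> usize {
    len.checked_mul(size_of::<T>()).expect("capacity overflow")
}

/// Writes `values` into a byte buffer whose capacity is sized in bytes.
pub fn to_le_byte_buffer(values: &[i32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(byte_capacity::<i32>(values.len()));
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}
```

Passing an element count where a byte count is expected would under-reserve by a factor of the element width (4 for `i32`), forcing reallocations or, with unchecked pushes, writes past the reserved capacity.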

tustvold · 2022-09-20T10:25:22Z

arrow/src/compute/kernels/arithmetic.rs

+/// This is similar to `math_op` as it performs given operation between two input primitive arrays.
+/// But the given operation can return `Err` if overflow is detected. For the case, this function
+/// returns an `Err`.
+fn math_checked_op<LT, RT, F>(


Oh I see, it is because the return type needs to be type hinted. Darn...

ursabot · 2022-09-20T10:41:45Z

Benchmark runs are scheduled for baseline = 3bf6eb9 and contender = 9599178. 9599178 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

viirya · 2022-09-20T16:54:07Z

Thanks for review!

viirya added 4 commits September 14, 2022 09:39

Init

0c05bab

More

ac376a1

More

7f570d1

Add tests

97c7094

github-actions bot added the arrow label (Changes to the arrow crate) Sep 15, 2022

viirya added 2 commits September 16, 2022 09:22

Merge remote-tracking branch 'upstream/master' into arithmetic_checke…

90b3b45

…d_dyn_kernels

Fix clippy

9d526c9

sunchao reviewed Sep 17, 2022

Remove macro

04066c0

viirya commented Sep 17, 2022

viirya added 2 commits September 16, 2022 17:41

Update doc

c69c48b

Fix clippy

dc6077f

HaoYang670 reviewed Sep 17, 2022

arrow/src/compute/kernels/arithmetic.rs (outdated, resolved)

arrow/src/compute/kernels/arithmetic.rs (outdated, resolved)

arrow/src/compute/kernels/arity.rs (resolved)

viirya added 3 commits September 16, 2022 18:49

Remove length check

83dcff1

Merge remote-tracking branch 'upstream/master' into arithmetic_checke…

0be33e0

…d_dyn_kernels

Tweak try_binary to coordinate latest optimization

4394ff1

viirya commented Sep 17, 2022

Fix clippy

0f8a5bb

tustvold reviewed Sep 17, 2022

Use for loop

d81924e

viirya commented Sep 17, 2022

tustvold reviewed Sep 17, 2022

Split non-null variant into never inline function

4eb0fe4

Add value type check

38de665

jhorstmann reviewed Sep 19, 2022

Multiply by get_byte_width of output type.

4d06826

tustvold approved these changes Sep 20, 2022

tustvold merged commit 9599178 into apache:master Sep 20, 2022

alamb mentioned this pull request Sep 30, 2022

Add overflow-checking variants of arithmetic dyn kernels #2739

Closed
