Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update try_binary and checked_ops, and remove math_checked_op #2717

Merged
merged 8 commits into from Sep 16, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Sep 13, 2022

Signed-off-by: remzi 13716567376yh@gmail.com

Which issue does this PR close?

Closes #2715.

Rationale for this change

  1. try_binary should not panic on unequaled array length
  2. The checked ops (checked_add, checked_mul ...) is better to return a Result. Because
    • we can return the error directly in try_binary and other try_... kernels
    • We dont need to wrap the ops in the compute kernel, which can allow us to do some simplification on those private functions
  3. Remove math_checked_op because it is the same as try_binary
  4. Add tests for float division

What changes are included in this PR?

Are there any user-facing changes?

delete math_checked_op
update the return type of checked ops

Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 13, 2022
@HaoYang670
Copy link
Contributor Author

Mark this as a draft now because I also need to update the math_checked_divide_op

@HaoYang670 HaoYang670 marked this pull request as draft September 13, 2022 04:52
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 self-assigned this Sep 13, 2022
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as ready for review September 13, 2022 09:59
@HaoYang670 HaoYang670 removed their assignment Sep 13, 2022
@HaoYang670
Copy link
Contributor Author

Hmm, tests failed on SIMD. Let me find a good solution.

Comment on lines +217 to +220
fn div_checked(self, rhs: Self) -> Result<Self> {
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Copy link
Member

@viirya viirya Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh, I don't think this is correct. Your proposal #2719 is only checking division by zero error for integer division. But it doesn't match the behavior of C++ DivideChecked kernel, and Spark divide expression. We have clear distinction between checked and non-checked kernels. The checked kernel should catch such invalid inputs at the runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't match the behavior of C++ DivideChecked kernel

Thank you @viirya, I find the code in C++: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/base_arithmetic_internal.h#L410-L419 and it does the check for float point values.

If we follow the c++ behaviour, maybe we should add some docs such as

Float point Division by zero will return an error in `checked_div` but not panic when using `unchecked_div`

Copy link
Member

@viirya viirya Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

divide_checked doc has mentioned that it returns "ArrowError::DivideByZero if any right side value is zero". divide doc also describes that for floating point types, the result of dividing by zero follows normal floating point rules. Only for other numeric types, division by zero will panic.

I think it already said what you meant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I'll update the issue and the PR

@HaoYang670 HaoYang670 marked this pull request as draft September 14, 2022 01:46
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
/// This is similar to `math_op` as it performs given operation between two input primitive arrays.
/// But the given operation can return `None` if overflow is detected. For the case, this function
/// returns an `Err`.
fn math_checked_op<LT, RT, F>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this function any more because this is the same as try_binary after we updating the try_binary and the type of op

{
if left.len() != right.len() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to remove this function but met some error from the type system. I will file a follow-up ticket to remove this.

@@ -2026,31 +1967,57 @@ mod tests {

#[test]
#[should_panic(expected = "DivideByZero")]
fn test_primitive_array_divide_by_zero_with_checked() {
fn test_int_array_divide_by_zero_with_checked() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the test case to make it clearer.

}

fn mul_wrapping(self, rhs: Self) -> Self {
self * rhs
}

fn div_checked(self, rhs: Self) -> Option<Self> {
Some(self / rhs)
fn div_checked(self, rhs: Self) -> Result<Self> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for float type, we also need to check the DivideByZero error, make this as the default behaviour.

@HaoYang670 HaoYang670 marked this pull request as ready for review September 14, 2022 10:05
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670
Copy link
Contributor Author

@viirya @liukun4515 @tustvold PTAL, thank you.

@tustvold
Copy link
Contributor

Currently running the benchmarks for this, just in case, LLVM is a fickle beast

@tustvold
Copy link
Contributor

The benchmarks regularly fluctuate by +- 25%, this is not all that surprising given we're talking microseconds, but importantly we are not seeing anything greater which would imply a failure to vectorise correctly or similar 👍

@tustvold tustvold merged commit f572ec1 into apache:master Sep 16, 2022
@ursabot
Copy link

ursabot commented Sep 16, 2022

Benchmark runs are scheduled for baseline = 43d912c and contender = f572ec1. f572ec1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

typed_dict_math_op!(
left,
right,
|a, b| a.div_checked(b),
Copy link
Member

@viirya viirya Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err...why you changed it to use div_checked...
divide_dyn is no-overflow checking kernel.

I'm adding checked variant in another PR.

Copy link
Contributor

@tustvold tustvold Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this fixes #2647 perhaps? I have to confess to being a little lost now, I had thought we'd collectively agreed that division didn't make sense to have an unchecked variant as Rust always checks division 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I meant overflow-checking. I think this change doesn't change division by zero behavior of divide_dyn kernel. It returns ArrowError::DivideByZero with/without this PR.

Copy link
Contributor

@tustvold tustvold Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would division overflow?

Copy link
Contributor

@tustvold tustvold Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're fine the definition of div_checked for integral types is

if rhs.is_zero() {
    Err(ArrowError::DivideByZero)
} else {
    self.checked_div(rhs).ok_or_else(|| {
        ArrowError::ComputeError(format!(
            "Overflow happened on: {:?} / {:?}",
            self, rhs
        ))
    })
}

The checked_div is technically redundant as we've already checked that the value isn't zero, i.e. this code could be simplified to

self.checked_div(rhs).ok_or_else(|| {
    ArrowError::DivideByZero
})

The definition for floating points is

if rhs.is_zero() {
    Err(ArrowError::DivideByZero)
} else {
    Ok(self / rhs)
}

Which will not check for overflow, as it isn't meaningful to do so for floats?

TLDR: I don't think overflow checking for division makes sense? The only types that could overlow, i.e. floats, already have a way to represent that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "checked" kernel, I meant overflow checking behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Op, I am wrong, sorry.
The divide_dyn just checked the dividebyzero error, but not the overflow error.

Copy link
Member

@viirya viirya Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I've fixed it in the PR to add checked variant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add another divide ops in the ArrowNativeTypeOp that only checks the dividebyzero error ? For example:

            // check dividebyzero and overflow
            fn div_fully_checked(self, rhs: Self) -> Result<Self> {
                if rhs.is_zero() {
                    Err(ArrowError::DivideByZero)
                } else {
                    self.checked_div(rhs).ok_or_else(|| {
                        ArrowError::ComputeError(format!(
                            "Overflow happened on: {:?} / {:?}",
                            self, rhs
                        ))
                    })
                }
            }
            // only check the dividebyzero error
           fn div_checked_divideByZero(self, rhs: Self) -> Result<Self> {}
            // no check
            fn div_wrapping(self, rhs: Self) -> Self {
                self.wrapping_div(rhs)
            }

Copy link
Member

@viirya viirya Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think divide_dyn is the kernel which only checks division by zero error.

edit: Oh, I misread your comment. But I don't think it is good idea to add too many APIs there. It is pretty easy to do extra division by zero check + wrapping division.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

try_binary should not panic on unequaled array length.
4 participants