Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add arithmetic and comparison kernels from DF + small modifications #2770

Closed

Conversation

kmitchener
Copy link

@kmitchener kmitchener commented Sep 22, 2022

Which issue does this PR close?

Closes #1200.
Closes #2766.

Rationale for this change

Per issues, these kernels belong in this project, not in DF. The goal is to get these into arrow-rs where they belong and then do further iteration to improve Decimal128 support.

What changes are included in this PR?

  • Copies the decimal arithmetic and comparison kernels
  • Small (but useful) changes to the decimal division so that it uses the num crate's BigInt to do division, instead of f64, which aren't suitable for decimal division.

Are there any user-facing changes?

addition of these kernels to the public API

@kmitchener
Copy link
Author

kmitchener commented Sep 22, 2022

I'll note that I seem to have picked the worst time to try to move these kernels here, as things aren't working correctly right now. However, I tested this change with DF and this revision of arrow-rs https://github.com/apache/arrow-rs/tree/5e83ef9cc7e426171f4cb9451fa004c55c7c95be and everything compiled and all tests passed in both DF and arrow-rs.

@liukun4515 for your review. I'll add a similar draft PR for DataFusion so you can see both sides of the move.

Other half of this PR is in apache/datafusion#3590

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 22, 2022
@kmitchener
Copy link
Author

Never mind, everything is working fine. I've moved all the DF decimal kernel code here and copied the tests over into the correct locations. Ready for review.

@kmitchener kmitchener marked this pull request as ready for review September 22, 2022 18:10
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thank you. A couple of thoughts

  • Some doc comments would be nice
  • The overflow semantics of these are a bit confused, the operations themselves are not checked but with_precision_and_scale checks for overflow in the result (Decimal Precision Validation #2387 is likely related)
  • Some benchmarks would be good
  • We should probably look to use the binary and try_binary kernels as the performance will be significantly better (might need Replace DecimalArray with PrimitiveArray #2637 or some other shenanigans)

Ok(())
}

// #[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these tests commented out, I presume because they aren't working as expected?

@tustvold
Copy link
Contributor

I think this is only part of #1200 as it doesn't hook up the kernels in the dyn kernels

) -> Result<Decimal128Array> {
let divide = 10_i128.pow(left.scale() as u32);
let array = arith_decimal_scalar(left, right, |left, right| {
let result = BigInt::from(left) * BigInt::from(right) / divide;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use i128 for this? The performance should be better than a variable width encoding?

@liukun4515
Copy link
Contributor

I will review it tomorrow, please hold this pr.

@kmitchener
Copy link
Author

kmitchener commented Sep 23, 2022

As it is, I've only copied over what was in DataFusion, but I think that's not suitable for arrow-rs's public APIs. Specifically, the math takes rust primitives (i128) with no checks on precision or scale or any indication what the precision or scale might or should be for the scalar.

Also, as a user of arrow-rs, you'd expect to be able to use the same functions to do math on decimals as you do all other number types, but you can't, because they're fundamentally different types here.

Is there a way to make DecimalArray a PrimitiveArray?

@tustvold
Copy link
Contributor

tustvold commented Sep 23, 2022

Is there a way to make DecimalArray a PrimitiveArray?

Yes (see #2637), and this would definitely be the best path forward if it is tractable. I can try to find some time over the weekend to have a play around with this. I have some ideas, but I'm loathe to subject someone else to my potentially half-baked ideas 😅

@liukun4515
Copy link
Contributor

Is there a way to make DecimalArray a PrimitiveArray?

Yes (see #2637), and this would definitely be the best path forward if it is tractable. I can try to find some time over the weekend to have a play around with this. I have some ideas, but I'm loathe to subject someone else to my potentially half-baked ideas 😅

Do we need to hold this pr and implement them by the proposal in #2637?

@tustvold
Copy link
Contributor

My personal preference would to hold off on this if the consensus is to move forward with making DecimalArray a PrimitiveArray. However, this may not be ready in time for this weeks release, so if there is urgency to add this functionality I don't wish to block it.

@kmitchener
Copy link
Author

No, there's no urgency in that way. I'll close this PR in favor of the more complete plan for decimal implementation in the other issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
3 participants