Cast: should get the round result for decimal to a decimal with smaller scale #3139

Merged
merged 2 commits into from Nov 25, 2022

Conversation

liukun4515
Contributor

Which issue does this PR close?

Closes #3137

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 19, 2022
@liukun4515
Contributor Author

liukun4515 commented Nov 19, 2022

For now this only implements the decimal128 to decimal128 case.
If the implementation approach looks good to everyone, I will fill in the other cases and add more test cases.

cc @viirya @tustvold

Contributor

@tustvold tustvold left a comment

I think this should consistently use wrapping or checked add, neg, div, rem, etc... This not only is consistent with other kernels, but avoids differences between release and debug builds
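To illustrate the point about build profiles: with Rust's default settings, the plain `+`, `-`, and unary negation operators panic on overflow in debug builds and wrap in release builds, while the explicit `wrapping_*` and `checked_*` methods behave the same in both. The sketch below is not code from this PR; the helper names `div_rem_wrapping` and `div_rem_checked` are made up for illustration.

```rust
/// Hypothetical helper: quotient and remainder with explicit wrapping semantics.
/// Behaves identically in debug and release builds (still panics if `div` is 0).
fn div_rem_wrapping(v: i128, div: i128) -> (i128, i128) {
    (v.wrapping_div(div), v.wrapping_rem(div))
}

/// Hypothetical helper: quotient and remainder with checked semantics.
/// Returns `None` on overflow or division by zero instead of panicking.
fn div_rem_checked(v: i128, div: i128) -> Option<(i128, i128)> {
    Some((v.checked_div(div)?, v.checked_rem(div)?))
}

fn main() {
    // Ordinary case: 11234 / 10 = 1123 remainder 4.
    assert_eq!(div_rem_wrapping(11234, 10), (1123, 4));
    assert_eq!(div_rem_checked(11234, 10), Some((1123, 4)));

    // i128::MIN / -1 does not fit in i128: wrapping yields MIN, checked yields None.
    assert_eq!(div_rem_wrapping(i128::MIN, -1), (i128::MIN, 0));
    assert_eq!(div_rem_checked(i128::MIN, -1), None);

    // Addition: the explicit methods make the overflow policy visible at the call site.
    assert_eq!(i128::MAX.wrapping_add(1), i128::MIN);
    assert_eq!(i128::MAX.checked_add(1), None);
}
```

Choosing one family of methods throughout a kernel makes the overflow policy explicit rather than dependent on the build profile.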

Comment on lines 1968 to 1969
let d = v / div;
let r = v % div;

Suggested change
- let d = v / div;
- let r = v % div;
+ let d = v.wrapping_div(div);
+ let r = v.wrapping_rem(div);

let d = v / div;
let r = v % div;
if v >= 0 && r >= half {
d + 1

Suggested change
- d + 1
+ d.wrapping_add(1)

if v >= 0 && r >= half {
d + 1
} else if v < 0 && r <= neg_half {
d - 1

Suggested change
- d - 1
+ d.wrapping_sub(1)

@@ -1955,12 +1956,26 @@ fn cast_decimal_to_decimal_safe<const BYTE_WIDTH1: usize, const BYTE_WIDTH2: usi
// For example, input_scale is 4 and output_scale is 3;
// Original value is 11234_i128, and will be cast to 1123_i128.
let div = 10_i128.pow((input_scale - output_scale) as u32);
let half = div / 2;
let neg_half = half.neg();

Suggested change
- let neg_half = half.neg();
+ let neg_half = half.wrapping_neg();

As we've divided by 2 this can't overflow
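Putting the reviewed fragments together, the rounding step for a decimal128 value can be sketched as follows. This is a standalone illustration assembled from the snippets quoted in this review, not the merged arrow-rs code: the function name `reduce_scale_round` and its signature are assumptions, and it only covers the case where the output scale is strictly smaller than the input scale.

```rust
/// Hypothetical sketch: reduce the scale of an unscaled i128 decimal value,
/// rounding half away from zero.
/// Assumes `input_scale > output_scale` and that 10^(difference) fits in i128.
fn reduce_scale_round(v: i128, input_scale: u32, output_scale: u32) -> i128 {
    assert!(input_scale > output_scale, "only scale reduction is handled here");
    let div = 10_i128.pow(input_scale - output_scale);
    let half = div / 2;
    // `half` is at most div / 2, so negating it cannot overflow.
    let neg_half = half.wrapping_neg();
    let d = v.wrapping_div(div);
    let r = v.wrapping_rem(div);
    if v >= 0 && r >= half {
        d.wrapping_add(1) // round up for non-negative values
    } else if v < 0 && r <= neg_half {
        d.wrapping_sub(1) // round away from zero for negative values
    } else {
        d
    }
}

fn main() {
    // Scale 4 -> 3: 1.1234 becomes 1.123, 1.1235 becomes 1.124.
    assert_eq!(reduce_scale_round(11234, 4, 3), 1123);
    assert_eq!(reduce_scale_round(11235, 4, 3), 1124);
    // Negative values round away from zero at the halfway point.
    assert_eq!(reduce_scale_round(-11234, 4, 3), -1123);
    assert_eq!(reduce_scale_round(-11235, 4, 3), -1124);
    // Scale 4 -> 2: 1.1250 becomes 1.13.
    assert_eq!(reduce_scale_round(11250, 4, 2), 113);
}
```

The guard on the scales matters: with equal scales `div` would be 1 and `half` would be 0, so every non-negative value would be incremented.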

Comment on lines 2010 to 2011
// TODO: it's better to implement the neg
let neg_half = half * i256::from_i128(-1);

Suggested change
- // TODO: it's better to implement the neg
- let neg_half = half * i256::from_i128(-1);
+ let neg_half = half.wrapping_neg();

@liukun4515
Contributor Author

liukun4515 commented Nov 22, 2022

> I think this should consistently use wrapping or checked add, neg, div, rem, etc... This not only is consistent with other kernels, but avoids differences between release and debug builds

The changes I have made will not overflow.
It's good to keep the behaviour consistent between debug and release builds.

@tustvold
Contributor

Do you intend to switch to explicitly using wrapping / checked operations to ensure consistent behaviour across debug and release, and to be consistent with the other kernels?

@liukun4515
Contributor Author

> Do you intend to switch to explicitly using wrapping / checked operations to ensure consistent behaviour across debug and release, and to be consistent with the other kernels?

@tustvold

Sorry for the late reply, I forgot to push the changes.

Contributor

@tustvold tustvold left a comment

I think there is a logical conflict with one of the tests for negative scales

@tustvold tustvold merged commit 187bf61 into apache:master Nov 25, 2022
@ursabot

ursabot commented Nov 25, 2022

Benchmark runs are scheduled for baseline = 2c86895 and contender = 187bf61. 187bf61 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@liukun4515
Contributor Author

> I think there is a logical conflict with one of the tests for negative scales

Hi @tustvold, can you give an example that shows the conflict?

From #3152, I learned that negative scale is supported in Arrow.
Before this, I did not know how negative scale is used.

@liukun4515 liukun4515 deleted the decimal_round_#3137 branch November 26, 2022 00:52
@liukun4515
Contributor Author

liukun4515 commented Nov 26, 2022

Maybe I understand your point from this commit 2abbf89.

But I need time to find out how other systems handle negative scale when casting.

@liukun4515
Contributor Author

Take a decimal(10,-1) whose 128-bit integer is 123, so the string form of the value is 1230. If we cast it to decimal(10,-2), what should the 128-bit integer of the result be? @tustvold @viirya

@tustvold
Contributor

123

@liukun4515
Contributor Author

> 123

I am confused by this. If the data type is decimal(10,-2) and the 128-bit integer is 123, it represents the value 12300, so the value has changed after casting.

I think the 128-bit integer should be 12 after casting to decimal(10,-2).

From the doc: https://arrow.apache.org/docs/python/generated/pyarrow.decimal128.html#pyarrow-decimal128

decimal128(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 nor 1234500.
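To make the encoding in the quoted documentation concrete, the sketch below renders an unscaled integer and a scale as a plain decimal string. The function `decimal_to_string` is a hypothetical helper written for this illustration and is not an Arrow API; it assumes the represented value is the unscaled integer multiplied by 10 to the power of minus the scale.

```rust
/// Hypothetical helper: render `unscaled * 10^(-scale)` as a plain decimal string.
fn decimal_to_string(unscaled: i128, scale: i8) -> String {
    if scale <= 0 {
        // Zero or negative scale: append |scale| zeros to the integer.
        if unscaled == 0 {
            return "0".to_string();
        }
        let zeros = "0".repeat(scale.unsigned_abs() as usize);
        return format!("{unscaled}{zeros}");
    }
    // Positive scale: place the decimal point `scale` digits from the right.
    let scale = scale as usize;
    let sign = if unscaled < 0 { "-" } else { "" };
    let digits = unscaled.unsigned_abs().to_string();
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = format!("{digits:0>width$}", width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{sign}{int_part}.{frac_part}")
}

fn main() {
    // The example from the pyarrow documentation.
    assert_eq!(decimal_to_string(12345, -3), "12345000");
    // The values discussed in this thread.
    assert_eq!(decimal_to_string(123, -1), "1230");
    assert_eq!(decimal_to_string(123, -2), "12300");
    // A positive scale for comparison.
    assert_eq!(decimal_to_string(11234, 4), "1.1234");
}
```

Under this reading, the integer 123 stands for 1230 at scale -1 and for 12300 at scale -2, which is the mismatch raised in the comment above.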

@tustvold
Contributor

Apologies, I misread your example. If the integer value were 1230, casting would yield an integer value of 123, with the same string value. Casting an integer value of 123, with a corresponding string value of 1230, I would expect to result in an error, although #3203 suggests something isn't quite right here yet.
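The two cases in this reply can be sketched as an exact rescale that refuses to lose digits. This is only an illustration of the expectation stated above, not the behaviour of the arrow-rs cast kernel (which, per #3203, may differ); `rescale_exact` is a hypothetical name, and returning `None` stands in for the error.

```rust
/// Hypothetical helper: re-encode `unscaled` from `from_scale` to `to_scale`
/// without changing the represented value.
/// Returns `None` if the value is not exactly representable or on overflow.
fn rescale_exact(unscaled: i128, from_scale: i8, to_scale: i8) -> Option<i128> {
    let diff = i32::from(to_scale) - i32::from(from_scale);
    let factor = 10_i128.checked_pow(diff.unsigned_abs())?;
    if diff >= 0 {
        // Larger scale: multiply, failing on overflow.
        unscaled.checked_mul(factor)
    } else if unscaled.checked_rem(factor)? == 0 {
        // Smaller scale: divide only when no digits would be lost.
        unscaled.checked_div(factor)
    } else {
        None
    }
}

fn main() {
    // Integer 1230 at scale -1 is 12300; at scale -2 that is integer 123.
    assert_eq!(rescale_exact(1230, -1, -2), Some(123));
    // Integer 123 at scale -1 is 1230, which scale -2 cannot represent exactly.
    assert_eq!(rescale_exact(123, -1, -2), None);
    // Going the other way is always exact, barring overflow.
    assert_eq!(rescale_exact(123, -2, -1), Some(1230));
}
```

A rounding cast, as proposed earlier in the thread, would instead map the integer 123 at scale -1 to 12 at scale -2 rather than failing.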

Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should be the rounding vs truncation when cast decimal to smaller scale
4 participants