support decimal scalar value #1394

liukun4515 · 2021-12-03T02:29:21Z

Which issue does this PR close?

Closes #1393

From #122: to support the decimal data type in DataFusion, we should first add a decimal scalar value.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

liukun4515 · 2021-12-03T04:01:57Z

#[allow(clippy::box_vec)]
I am confused about the failed CI.

liukun4515 · 2021-12-03T04:02:40Z

@Dandandan @alamb @houqp PTAL

alamb · 2021-12-03T15:28:54Z

I am confused about the failed CI

This is likely due to a new Rust version being released with stricter clippy lint rules.

Thankfully @xudong963 has a PR up to get it working again: #1395

alamb

This is looking good @liukun4515 -- thank you. I had a few suggestions, but overall this looks just about perfect.

alamb · 2021-12-03T15:31:02Z

datafusion/src/scalar.rs

@@ -43,6 +43,8 @@ pub enum ScalarValue {
    Float32(Option<f32>),
    /// 64bit float
    Float64(Option<f64>),
+    /// 128bit decimal, using the i128 to represent the decimal
+    Decimal128(Option<i128>, Option<usize>, Option<usize>),


🤔 it seems like we always need the precision and scale (to know what type the Decimal is).

Thus, perhaps this could be changed to be

Suggested change

Decimal128(Option<i128>, Option<usize>, Option<usize>),

Decimal128(Option<i128>, usize, usize),

Nice point.
I almost forgot this.
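For illustration, here is a minimal sketch of the suggested shape. It uses a trimmed-down stand-in for `ScalarValue` with only two variants, plus a hypothetical `decimal_precision_scale` helper that is not part of DataFusion. It shows that with non-optional precision and scale, even a NULL decimal still carries its full type.

```rust
/// Trimmed-down stand-in for DataFusion's `ScalarValue`; only two variants are shown.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    /// 64-bit float (NULL is `None`)
    Float64(Option<f64>),
    /// 128-bit decimal: (value, precision, scale). Only the value may be NULL;
    /// precision and scale are always known, so a NULL decimal still has a type.
    Decimal128(Option<i128>, usize, usize),
}

impl ScalarValue {
    /// Hypothetical helper: returns `(precision, scale)` for any decimal
    /// scalar, including NULL ones, and `None` for other variants.
    pub fn decimal_precision_scale(&self) -> Option<(usize, usize)> {
        match self {
            ScalarValue::Decimal128(_, precision, scale) => Some((*precision, *scale)),
            _ => None,
        }
    }
}
```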

alamb · 2021-12-03T15:35:48Z

datafusion/src/scalar.rs

+        scale: usize,
+    ) -> Result<Self> {
+        // make sure the precision and scale is valid
+        // TODO const the max precision and min scale


The arrow spec doesn't seem to say anything about min/max valid scales: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L181-L190

I poked around arrow-rs a little and didn't find any limits on precision or scale either.

If we use 128 bits to represent a decimal value, the max precision is 38.
The scale is always greater than or equal to 0 and less than or equal to the precision.
In arrow-rs, a decimal is represented by an i128 together with a precision and a scale.
So we can add MAX_PRECISION = 38 in DataFusion.
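A minimal sketch of the validation discussed here follows. It assumes a constant named `MAX_PRECISION_FOR_DECIMAL128` and a helper named `validate_decimal128`; both names are hypothetical, since the thread only proposes `MAX_PRECISION = 38`. Rejecting a precision of 0 is an extra assumption not stated above. The limit of 38 follows from `i128::MAX` being a 39-digit number, so every 38-digit value fits but not every 39-digit one does.

```rust
/// Largest precision a 128-bit decimal can always hold: i128::MAX is a
/// 39-digit number, so every 38-digit value fits but not every 39-digit one.
pub const MAX_PRECISION_FOR_DECIMAL128: usize = 38;

/// Hypothetical helper: check that (precision, scale) is valid for Decimal128.
/// `scale >= 0` is already guaranteed by the `usize` type.
pub fn validate_decimal128(precision: usize, scale: usize) -> Result<(), String> {
    // Assumption: a precision of 0 is rejected (not stated in the thread).
    if precision == 0 || precision > MAX_PRECISION_FOR_DECIMAL128 {
        return Err(format!(
            "precision {} is out of range 1..={}",
            precision, MAX_PRECISION_FOR_DECIMAL128
        ));
    }
    // The scale can never exceed the total number of digits.
    if scale > precision {
        return Err(format!(
            "scale {} is greater than precision {}",
            scale, precision
        ));
    }
    Ok(())
}
```

Because the scale is a `usize`, only the upper bounds need explicit checks.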

Just to make sure I understand this correctly:
The spec doesn't say anything about the precision/scale of DECIMAL, but we're implementing DECIMAL128 here, so 38 should be the max precision.
After apache/arrow-rs#131 is implemented, we can implement a DECIMAL256 type in DataFusion, and its max precision will be 76.

Yes, Decimal256 has not been implemented in arrow-rs yet, and arrow-go does not implement it either.
Only arrow-java and arrow-c++ have implemented the 256-bit width.
@capkurmagati

We can make this a const in another pull request.

alamb · 2021-12-03T15:40:42Z

datafusion/src/scalar.rs

@@ -100,6 +102,12 @@ impl PartialEq for ScalarValue {
        // any newly added enum variant will require editing this list
        // or else face a compile error
        match (self, other) {
+            (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => {
+                v1.eq(v2) && p1.eq(p2) && s1.eq(s2)
+                // TODO how to handle this case: decimal(123,10,1) with decimal(1230,10,2)


Int8(Some(100)) is not equal to Int64(Some(100)) with this implementation either, so I think treating decimal(123,10,1) and decimal(1230,10,2) as different is consistent with the rest of this comparison.

Yep.
To compare two values of different data types, first convert them to the same data type and then compare them.
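As a sketch of "convert, then compare" for the decimal(123,10,1) vs decimal(1230,10,2) case, the code below assumes a hypothetical `rescale` helper that only widens the scale, multiplying by a power of 10 with overflow checks. Only values and scales are passed; precision is left out for brevity.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: rescale a raw i128 decimal value from `from_scale`
/// to `to_scale`. Returns None on overflow, or when asked to reduce the scale
/// (which could drop fractional digits).
pub fn rescale(value: i128, from_scale: usize, to_scale: usize) -> Option<i128> {
    let diff = to_scale.checked_sub(from_scale)?;
    let factor = 10i128.checked_pow(u32::try_from(diff).ok()?)?;
    value.checked_mul(factor)
}

/// Hypothetical helper: compare two decimal values with possibly different
/// scales by first converting both to the larger scale.
pub fn decimal_eq(v1: i128, s1: usize, v2: i128, s2: usize) -> Option<bool> {
    let target = s1.max(s2);
    Some(rescale(v1, s1, target)? == rescale(v2, s2, target)?)
}
```

With this, 12.3 (123 at scale 1) and 12.30 (1230 at scale 2) compare equal once both are at scale 2, while `ScalarValue`'s own `PartialEq` keeps treating them as different scalars.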

alamb · 2021-12-03T15:41:58Z

datafusion/src/scalar.rs

@@ -171,6 +179,17 @@ impl PartialOrd for ScalarValue {
        // any newly added enum variant will require editing this list
        // or else face a compile error
        match (self, other) {
+            // TODO decimal type, we just compare the values which have the same precision and scale.


I think the code in this PR is correct: two decimals can be compared if they have the same precision and scale, and they can't be compared (returns None) if their precision or scale differs.
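Here is a minimal sketch of the comparison rule described above, on a two-variant stand-in enum rather than the real `ScalarValue`. Decimals are ordered by their raw `i128` value only when precision and scale match, and every other pairing of variants returns `None`. Because it never returns `Some(Equal)` for values that the derived `PartialEq` considers different, the two impls stay consistent.

```rust
use std::cmp::Ordering;

/// Two-variant stand-in for `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Decimal128(Option<i128>, usize, usize),
}

impl PartialOrd for ScalarValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        use ScalarValue::*;
        match (self, other) {
            (Int64(v1), Int64(v2)) => v1.partial_cmp(v2),
            // Only decimals with identical precision and scale are comparable.
            (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => {
                if p1 == p2 && s1 == s2 {
                    v1.partial_cmp(v2)
                } else {
                    None
                }
            }
            // Different variants are never comparable.
            _ => None,
        }
    }
}
```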

alamb · 2021-12-03T15:42:29Z

datafusion/src/scalar.rs

+                p.hash(state);
+                s.hash(state)


I think we could probably skip hashing the precision and scale - and just hash the value

If two decimal values have the same value but a different precision or scale, hashing only the value gives them the same hash.
If we hash all fields of the decimal, that does not happen.
It is better to hash all fields of the decimal.

That is a good point 👍
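This sketch hashes all three decimal fields on a two-variant stand-in enum. Feeding `std::mem::discriminant` into the hasher first is a choice made for this sketch, not something the PR states. `hash_of` is a hypothetical convenience helper built on the standard library's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Two-variant stand-in for `ScalarValue`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Decimal128(Option<i128>, usize, usize),
}

impl Hash for ScalarValue {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Include the variant so different variants hash differently.
        std::mem::discriminant(self).hash(state);
        match self {
            ScalarValue::Int64(v) => v.hash(state),
            ScalarValue::Decimal128(v, p, s) => {
                // Hash the value, precision and scale.
                v.hash(state);
                p.hash(state);
                s.hash(state)
            }
        }
    }
}

/// Hypothetical helper: hash any value with the standard library hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Equal scalars still hash equally, as `Hash` requires. The precision and scale only help to spread out values that share the same raw `i128`.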

alamb · 2021-12-03T15:42:56Z

datafusion/src/scalar.rs

+            ScalarValue::Decimal128(_, _, _) => {
+                // TODO add the default precision and scale for this case
+                // DataType::Decimal(38, 0)
+                panic!("The Decimal Scalar value with invalid precision or scale.");


I think if you changed Decimal as suggested above to always have precision and scale, this case would no longer be possible.
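To make the point concrete, here is a sketch with stand-in `DataType` and `ScalarValue` enums. They have only a couple of variants each and are not the arrow-rs or DataFusion definitions. Once precision and scale are plain `usize` fields, mapping a scalar to its data type is a total function and the `panic!` branch disappears.

```rust
/// Stand-in for Arrow's `DataType` with only the variants used here.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Float64,
    Decimal(usize, usize),
}

/// Stand-in for `ScalarValue` with non-optional precision and scale.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Float64(Option<f64>),
    Decimal128(Option<i128>, usize, usize),
}

impl ScalarValue {
    /// Every variant maps to a data type; no panic branch is needed.
    pub fn get_datatype(&self) -> DataType {
        match self {
            ScalarValue::Float64(_) => DataType::Float64,
            // Precision and scale are always present, even for NULL values.
            ScalarValue::Decimal128(_, precision, scale) => DataType::Decimal(*precision, *scale),
        }
    }
}
```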

alamb · 2021-12-03T15:55:13Z

datafusion/src/scalar.rs

+                ScalarValue::Decimal128(v1, _, _) => v1,
+                _ => unreachable!(),
+            })
+            .collect::<Vec<Option<i128>>>();


Using DecimalBuilder in this way is awkward. I wonder if we could add a function to create Decimal values from an array of i128 (likely it would go in arrow-rs). Something like

let arr = DecimalArray::from_iter_and_scale(array.into_iter(), precision, scale)

Yes, we can add the from_iter_and_scale function to arrow-rs later and replace this code in a follow-up pull request.
Do you agree? @alamb

related pr: apache/arrow-rs#1009
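Until such a constructor exists, the extraction step can at least avoid `unreachable!()`. The sketch below uses a hypothetical `decimal_values` helper over a stand-in enum. It collects the raw `Option<i128>` values and returns an error if any scalar is not a decimal with the expected precision and scale. The resulting `Vec` could then be fed to whichever array builder is used.

```rust
/// Two-variant stand-in for `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Decimal128(Option<i128>, usize, usize),
}

/// Hypothetical helper: pull the raw values (NULL stays `None`) out of decimal
/// scalars that must all have the given precision and scale.
pub fn decimal_values(
    scalars: &[ScalarValue],
    precision: usize,
    scale: usize,
) -> Result<Vec<Option<i128>>, String> {
    scalars
        .iter()
        .map(|sv| match sv {
            ScalarValue::Decimal128(v, p, s) if *p == precision && *s == scale => Ok(*v),
            // Report mismatches as errors instead of `unreachable!()`.
            other => Err(format!(
                "expected Decimal128 with precision {} and scale {}, got {:?}",
                precision, scale, other
            )),
        })
        .collect()
}
```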

alamb · 2021-12-03T15:56:35Z

datafusion/src/scalar.rs

+        scale: &usize,
+    ) -> ScalarValue {
+        let array = array.as_any().downcast_ref::<DecimalArray>().unwrap();
+        // TODO add checker: the precision and scale are same with array


This would be a good assert! check to add, though I don't think it would ever fail unless there is a bug.
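Here is a sketch of that check. It assumes a hypothetical `DecimalColumn` struct in place of the real `DecimalArray`, holding column-level precision and scale plus optional `i128` slots, and a hypothetical `decimal_scalar_at` function. The `assert_eq!` panics on a mismatch, which matches the view that this should only fail because of a bug.

```rust
/// Single-variant stand-in for `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Decimal128(Option<i128>, usize, usize),
}

/// Hypothetical stand-in for a decimal array: precision and scale belong to
/// the whole column, and each slot is an optional i128.
pub struct DecimalColumn {
    pub values: Vec<Option<i128>>,
    pub precision: usize,
    pub scale: usize,
}

/// Build a scalar from one slot, asserting that the requested precision and
/// scale match the column's.
pub fn decimal_scalar_at(
    col: &DecimalColumn,
    index: usize,
    precision: usize,
    scale: usize,
) -> ScalarValue {
    assert_eq!(
        (col.precision, col.scale),
        (precision, scale),
        "decimal precision/scale do not match the array"
    );
    ScalarValue::Decimal128(col.values[index], precision, scale)
}
```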


liukun4515 · 2021-12-05T06:20:36Z

@alamb I have addressed all comments.
If there are no other comments, please help merge this PR.
I will open other pull requests for decimal support, for example aggregation on the decimal data type.

alamb

This looks great -- thank you @liukun4515

alamb · 2021-12-06T23:48:53Z

datafusion/src/scalar.rs

@@ -43,6 +43,8 @@ pub enum ScalarValue {
    Float32(Option<f32>),
    /// 64bit float
    Float64(Option<f64>),
+    /// 128bit decimal, using the i128 to represent the decimal
+    Decimal128(Option<i128>, usize, usize),


alamb · 2021-12-06T23:49:18Z

datafusion/src/scalar.rs

+                p.hash(state);
+                s.hash(state)


That is a good point 👍

github-actions bot added the datafusion label (Changes in the datafusion crate) Dec 3, 2021

liukun4515 force-pushed the only_support_decimal_scalar branch from ce0150f to 9aa8f11 December 3, 2021 02:37

liukun4515 marked this pull request as ready for review December 3, 2021 03:31

liukun4515 force-pushed the only_support_decimal_scalar branch from 9aa8f11 to 0c3506d December 3, 2021 03:31

alamb reviewed Dec 3, 2021


liukun4515 mentioned this pull request Dec 5, 2021

Add creator from Iterator of i128 to get the decimalarray apache/arrow-rs#1009

Closed

liukun4515 added 2 commits December 5, 2021 10:49

support decimal scalar value

76c224c

Merge remote-tracking branch 'upstream/master' into only_support_decimal_scalar

bbc0a8f

liukun4515 force-pushed the only_support_decimal_scalar branch from 0c3506d to bbc0a8f December 5, 2021 03:00

liukun4515 requested a review from alamb December 6, 2021 13:56

alamb approved these changes Dec 6, 2021


alamb merged commit 7f24a79 into apache:master Dec 6, 2021

liukun4515 mentioned this pull request Apr 11, 2022

Implement DECIMAL type #122

Open
