Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Simplify decimal (#2440) #2477

Merged
merged 7 commits into from Aug 18, 2022
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Aug 17, 2022

Which issue does this PR close?

Closes #2440

Rationale for this change

Whilst reviewing #2439 I felt something was a bit off, this was my quick attempt to simplify things. If we are going to do this, we should probably get it in before the release on Thursday.

What changes are included in this PR?

Rather than using const generic, we use named type structs. This is the same approach we use elsewhere in the codebase, with Int32Type, etc...

Are there any user-facing changes?

Yes, this changes the low-level details of how decimal arrays are handled

@tustvold tustvold requested review from viirya and liukun4515 and removed request for viirya August 17, 2022 10:46
@tustvold tustvold mentioned this pull request Aug 17, 2022
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 17, 2022
@tustvold tustvold requested review from viirya and alamb August 17, 2022 10:47
`DecimalIter` iterates `Decimal128` values as i128 values. \
This is kept mostly for back-compatibility purpose. Suggests to use `Decimal128Array.iter()` \
that returns `Decimal128Iter`.")]
pub struct DecimalIter<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to reuse this name, as BasicDecimal is confusing. It isn't vital we remove it, but I think it is better to not be stuck on a stale name

}

impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> {
#[allow(clippy::type_complexity)]
const MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE: (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the main thing that felt a bit off, we constrain the permitted constants here, but then also seal the trait in #2439. The whole thing just felt a little esoteric and hard to follow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing I find is that, the rust compiler does the constant evaluation lazily. For example,

BasicDecimal<1>::try_new(...);

will cause a compiler error: "invalid byte length" because the constants we defined in the impl BasicDecimal are used in the try_new function. However, hackers can run

BasicDecimal<1>::new(...);

without neither compiler error nor runtime error, because no constant is used in this method, so the compiler will not evaluate these constant.

In a word, a hacker can successfully use an invalid decimal type as long as they never touch the constants defined in the impl BasicDecimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is part of what makes me feel that perhaps using const generics in this way doesn't provide the best UX

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the truth is that, the rust compiler compiles all generic types lazily.
It will only compile the types and methods that are used to avoid large binary output.

@HaoYang670
Copy link
Contributor

Personally, I think what makes implementing Decimal elegantly a hard thing is that the behavior of Decimal is in the middle of trait bound and generic type:

Trait (Different structure, same behaviour).
/\
 |
\/
Decimal128/256  (very similar structure, but not similar behaviour)  
/\
 |
\/
Generic Type (silmilar structure, similar behaviour)

So far, a consensus is that whatever the choice is, we have to implement some part of Decimal128 and Decimal256 separately (although macro can help to simplify this). (For example, the validation function) As currently, Decimal128 is binding to i128 and Decimal256 is binding to [u8; 32] or BigInt.

@tustvold
Copy link
Contributor Author

tustvold commented Aug 17, 2022

we have to implement some part of Decimal128 and Decimal256 separately

Perhaps, although the lower you push the differences the more code can be shared. To ground this concretely in what this PR does, the validation logic could be placed on DecimalType, with everything else generic. Or to put it another way, this PR doesn't lose any flexibility over using const generics, but is simpler and more easily extensible.

As currently, Decimal128 is binding to i128 and Decimal256 is binding to [u8; 32] or BigInt.

This is actually a perfect example of why concrete types are a better fit imo than const generics, as the concrete binding type could be expressed as an associated type. This PR doesn't currently do this, as it doesn't need to, but we could easily

@HaoYang670
Copy link
Contributor

Thank you, @tustvold, I am learning your implementation carefully 😁.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR makes a of sense to me, but I haven't spent a lot of time in the DecimalArray area and I don't really have a strong usecase personally or professionally for this code.

Thus I would defer to @viirya @liukun4515 and @HaoYang670 who have worked in this area more recently or have more directly usecases

@@ -68,24 +71,24 @@ use crate::util::decimal::{BasicDecimal, Decimal256};
/// assert_eq!(6, decimal_array.scale());
/// ```
///
pub type Decimal128Array = BasicDecimalArray<16>;
pub type Decimal128Array = DecimalArray<Decimal128Type>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly a more consistent with the other Array types

@liukun4515
Copy link
Contributor

I will review this today later

@@ -455,6 +459,68 @@ impl Date64Type {
}
}

mod private {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, use this method to seal the decimal data type.
Other type can't implement the DecimalType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a suggestion or a statement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just statement not suggestion

pub use self::array_decimal::Decimal128Array;
pub use self::array_decimal::Decimal256Array;
pub use self::array_decimal::DecimalArray;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is the generic representation of the array, akin to PrimitiveArray

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
It is a great PR.
Thanks, @tustvold

@liukun4515
Copy link
Contributor

If this merged, I can go on this #2357 work

Copy link
Contributor

@HaoYang670 HaoYang670 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I couldn't find even a better way to implement Decimal, but you have done it @tustvold. Amazing work.

Just a nit.
Maybe we could add more docs to explain the relationship between Decimal, DecimalType, Decimal128Type, Decimal256Type and NativeDecimalType, so that other developers can take less time to understand the code.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a more consistent way to align with existing ones like PrimitiveArray. Nice move.

@tustvold tustvold merged commit 15f42b2 into apache:master Aug 18, 2022
@tustvold
Copy link
Contributor Author

Thank you all for reviewing 😄

@ursabot
Copy link

ursabot commented Aug 18, 2022

Benchmark runs are scheduled for baseline = e60eef3 and contender = 15f42b2. 15f42b2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

There is no compiler error when using an invalid Decimal type.
6 participants