Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Decimal128 to DataType::is_numeric and clean the numeric casting code #2621

Closed
wants to merge 2 commits into from

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Sep 1, 2022

Which issue does this PR close?

Closes #2611.

Rationale for this change

We try to match the is_numeric function in the Datafusion (aka, adding Float16 and Decimal128). However, as Float16 is not supported by some upstream libraries (such as lexical_core::FromLexical), we can't add it now. (This is not a big deal because Float16 is a minor use case). Decimal128 could be added successfully.

What changes are included in this PR?

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 1, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally sure about this, as I'm not entirely sure how these methods are being used. But Decimal is distinct from Float32, Int16, etc... in that it isn't a PrimitiveArray. I'm not sure if this is or isn't an issue

@HaoYang670
Copy link
Contributor Author

HaoYang670 commented Sep 2, 2022

Unfortunately, I don't find a definition of which types are numeric in the Arrow format: https://github.com/apache/arrow/blob/master/format/Schema.fbs

We have a trait ArrowNumbericType in arrow-rs, but there is no doc to describe what is it and why we need it. (Maybe @jorgecarleitao could provide more background knowledge)

In datafusion, the definition of is_numeric is here: https://github.com/apache/arrow-datafusion/blob/master/datafusion/expr/src/binary_rule.rs#L392-L414. It covers the types in DataType::is_numeric and Float16 and Decimal128.

This PR is trying to provide a consistent definition (whatever the definition is) of DataType::is_numeric and is_numeric in the Datafusion, so that we could remove the is_numeric in Datafusion and developers will not get confused when choosing between the two.

@tustvold
Copy link
Contributor

tustvold commented Sep 2, 2022

so that we could remove the is_numeric in Datafusion

In the context of DataFusion what is it used to determine?

@HaoYang670
Copy link
Contributor Author

@liukun4515
Copy link
Contributor

so that we could remove the is_numeric in Datafusion

In the context of DataFusion what is it used to determine?

Datafusion and arrow-rs are two different system, in the datafusion some cases or operations are not supported now, but that operations maybe supported in the arrow-rs.

We just can consider the arrow-rs, and make the consistent in the arrow-rs.
@HaoYang670 @tustvold

@HaoYang670
Copy link
Contributor Author

Datafusion and arrow-rs are two different system, in the datafusion some cases or operations are not supported now, but that operations maybe supported in the arrow-rs.

Thank you, @liukun4515. At least, I think we should rename the is_numeric in the Datafusion, so that developers would not misuse the two functions.

@kmitchener
Copy link

FWIW, I think this PR is correct and it would be nice to get in the next release. DataFusion should depend on these functions and remove it's own similarly-named functions. I'd be happy to do a follow-on issue for DF to clean that up.

In DF, these functions are only used for determining type to type coercion rules, but then it uses arrow-rs's cast functions to do the actual conversion .. shouldn't the coercion rules also be in arrow-rs which also manages the types, the type implementations, and the casting? There's a can_cast_types() which is either what we need already, or a similar function like can_cast_types_without_loss().

@tustvold
Copy link
Contributor

tustvold commented Sep 23, 2022

I think my major reservation with this as it stands is that at least currently the decimal support in arrow-rs is extremely limited, I think once we have all the operations for decimals as other "numerics" I would be more happy with this change. Otherwise I think it is confusing to label decimals as numeric types, when they aren't properly supported?

@alamb
Copy link
Contributor

alamb commented Oct 3, 2022

What ever happed with this PR? Is there consensus on next steps?

@alamb alamb marked this pull request as draft October 12, 2022 19:36
@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

Converting to draft until we decide on next steps

@tustvold
Copy link
Contributor

Closed by #3121

@tustvold tustvold closed this Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataType::is_numeric should match the is_numeric function in Datafusion.
5 participants