Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Dyn Scalar Kernels #2837

Closed
tustvold opened this issue Oct 6, 2022 · 6 comments · Fixed by #4732
Closed

Inconsistent Dyn Scalar Kernels #2837

tustvold opened this issue Oct 6, 2022 · 6 comments · Fixed by #4732
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Oct 6, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The _dyn_scalar comparison kernels accept a scalar value implementing num::ToPrimitive and then downcast the array, coercing the the scalar to the appropriate type. There are then further dyn_utf8_scalar and dyn_binary_scalar kernels to handle non-primitive arrays, these only handle arrays or dictionaries of arrays of the corresponding type

The _scalar_dyn arithmetic kernels are instead explicitly typed on ArrowNumericType, which is ArrowPrimitiveType with some SIMD gubbins, and accept the corresponding T::Native as a scalar argument. They then downcast to the expected PrimitiveArray or a DictionaryArray containing the corresponding PrimitiveArray

Not only is the naming inconsistent, the primitive scalar comparison kernels will perform coercion of primitive scalars, whereas the arithmetic kernels and other comparison kernels will not.

Describe the solution you'd like

I think the approach of the arithmetic kernels is the least surprising, and as an added bonus is significantly simpler to implement.

I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels, and deprecating the old [eq | lt_eq | ...]_dyn_scalar kernels, before removing them in a future release.

Describe alternatives you've considered

Additional context

The current use of ToPrimitive will likely complicate adding comparison support for decimal array (#2637) as ToPrimitive doesn't have a to_i256 method.

This may also help reduce compile times for the comparison kernels #2365 #1858

Thoughts @alamb @viirya

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Oct 6, 2022
@viirya
Copy link
Member

viirya commented Oct 6, 2022

The _scalar_dyn arithmetic kernels are instead explicitly typed on ArrowNumericType
I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels,

I think that one thing is after following _scalar_dyn arithmetic kernels such _dyn_primitive_scalar kernels will need type bound when calling. As the scalar is T:Native, and the compiler cannot infer T from the scalar type.

I think it is minor so just mention it here.

@alamb
Copy link
Contributor

alamb commented Oct 7, 2022

Renaming the kernels sounds good to me 👍

I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels, and deprecating the old [eq | lt_eq | ...]_dyn_scalar kernels, before removing them in a future release.

Would it be better to simply not have the dyn_primitive_scalar kernels and instead use docstrings or something else to show how to the kernels with primitives?

Here is where @matthewmturner and I cooked up the original API: #1074 which also gives some background on why we didn't go with the T::Native approach

@alamb
Copy link
Contributor

alamb commented Oct 7, 2022

@tustvold
Copy link
Contributor Author

tustvold commented Oct 7, 2022

Would it be better to simply not have the dyn_primitive_scalar kernels and instead use docstrings or something else to show how to the kernels with primitives?

The issue isn't documentation, but that the kernels are inconsistent in their behaviour with respect to the non-scalar kernels, and the arithmetic scalar kenels. If I try to add an Int32Array to a Float32Array I will get an error, however, currently if I try to add a f32 to an Int32Array it will coerce the float to an integer and add it. This at best rather surprising, at worst a subtle source of strange bugs.

Further the encoding using ToPrimitive cannot be generalised to types other than those supported by ToPrimitive, i.e. Rust built-in types. This effectively blocks implementing these kernels for i256, i.e. Decimal256.

which also gives some background on why we didn't go with the T::Native approach

The key comment appears to be #1074 (comment). Is there any possibility I might be able to shift your feeling on this matter? Perhaps we could have a synchronous chat? Looking at DataFusion, ScalarValue is already concretely typed to match the ArrowPrimitiveType and not ArrowNativeType, i.e. ScalarValue::TimestampMillisecond instead of ScalarValue::I32(_), and so this change shouldn't materially impact its complexity - it already knows what the concrete type should be

@alamb
Copy link
Contributor

alamb commented Oct 7, 2022

If #1074 (comment) causes issues for the rest of the implementation, I don't feel strongly about it

@tustvold
Copy link
Contributor Author

tustvold commented Oct 7, 2022

Alternative proposal in #2842 - FWIW this would allow removing a lot of scalar dispatch logic from DataFusion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants