Inconsistent Dyn Scalar Kernels #2837

tustvold · 2022-10-06T16:22:10Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The _dyn_scalar comparison kernels accept a scalar value implementing num::ToPrimitive and then downcast the array, coercing the scalar to the appropriate type. There are then further dyn_utf8_scalar and dyn_binary_scalar kernels to handle non-primitive arrays; these only handle arrays, or dictionaries of arrays, of the corresponding type.

The _scalar_dyn arithmetic kernels are instead explicitly typed on ArrowNumericType, which is ArrowPrimitiveType with some SIMD gubbins, and accept the corresponding T::Native as a scalar argument. They then downcast to the expected PrimitiveArray or a DictionaryArray containing the corresponding PrimitiveArray

Not only is the naming inconsistent, the primitive scalar comparison kernels will perform coercion of primitive scalars, whereas the arithmetic kernels and other comparison kernels will not.
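To make the difference concrete, here is a minimal sketch of the two styles using only the standard library. It is not the arrow-rs implementation: `ToI32` is a hypothetical stand-in for `num::ToPrimitive`, a `Vec<i32>` behind `&dyn Any` stands in for a dynamically typed array, and both function names are invented for illustration.

```rust
use std::any::Any;

// Hypothetical stand-in for `num::ToPrimitive`, reduced to the one method used here.
trait ToI32 {
    fn to_i32(&self) -> Option<i32>;
}

impl ToI32 for i32 {
    fn to_i32(&self) -> Option<i32> {
        Some(*self)
    }
}

impl ToI32 for f32 {
    fn to_i32(&self) -> Option<i32> {
        // Truncate toward zero when the value fits in an i32, otherwise fail.
        if self.is_finite() && *self >= -2147483648.0 && *self < 2147483648.0 {
            Some(*self as i32)
        } else {
            None
        }
    }
}

// Coercing style: any scalar convertible to i32 is accepted and silently converted.
fn eq_scalar_coercing<S: ToI32>(array: &dyn Any, scalar: S) -> Result<Vec<bool>, String> {
    let values = array
        .downcast_ref::<Vec<i32>>()
        .ok_or("expected an i32 array")?;
    let scalar = scalar.to_i32().ok_or("scalar does not fit in i32")?;
    Ok(values.iter().map(|v| *v == scalar).collect())
}

// Typed style: the scalar type must match the array's element type exactly.
fn eq_scalar_typed<T: PartialEq + 'static>(
    array: &dyn Any,
    scalar: T,
) -> Result<Vec<bool>, String> {
    let values = array
        .downcast_ref::<Vec<T>>()
        .ok_or("array type does not match scalar type")?;
    Ok(values.iter().map(|v| *v == scalar).collect())
}
```

With an array of `[1, 2, 3]`, the coercing function accepts `2.7_f32`, truncates it to `2`, and reports a match on the second element, whereas the typed function returns an error for the same call.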

Describe the solution you'd like

I think the approach of the arithmetic kernels is the least surprising, and as an added bonus is significantly simpler to implement.

I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels, and deprecating the old [eq | lt_eq | ...]_dyn_scalar kernels, before removing them in a future release.

Describe alternatives you've considered

Additional context

The current use of ToPrimitive will likely complicate adding comparison support for decimal array (#2637) as ToPrimitive doesn't have a to_i256 method.

This may also help reduce compile times for the comparison kernels #2365 #1858

Thoughts @alamb @viirya

viirya · 2022-10-06T17:54:50Z

The _scalar_dyn arithmetic kernels are instead explicitly typed on ArrowNumericType
I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels,

I think one consequence of following the _scalar_dyn arithmetic kernels is that such _dyn_primitive_scalar kernels will need an explicit type annotation at the call site: the scalar is T::Native, and the compiler cannot infer T from the scalar type alone.

I think it is minor so just mention it here.
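The inference point can be sketched as follows. Everything here is a hypothetical model rather than the arrow-rs types: `PrimitiveType`, `Int32Type`, `Date32Type`, and `PrimitiveArray` are simplified stand-ins, and `eq_dyn_primitive_scalar` only mirrors the proposed name. Two marker types share the native type `i32`, so an `i32` scalar alone cannot tell the compiler which `T` is meant.

```rust
use std::any::Any;

// Hypothetical stand-in for ArrowPrimitiveType: a marker type with a native representation.
trait PrimitiveType: 'static {
    type Native: PartialEq + Copy + 'static;
}

struct Int32Type;
struct Date32Type;

// Both logical types are backed by the same native type.
impl PrimitiveType for Int32Type {
    type Native = i32;
}
impl PrimitiveType for Date32Type {
    type Native = i32;
}

// Simplified stand-in for a typed primitive array.
struct PrimitiveArray<T: PrimitiveType> {
    values: Vec<T::Native>,
}

// The scalar is `T::Native`, so `T` must be named explicitly by the caller.
fn eq_dyn_primitive_scalar<T: PrimitiveType>(
    array: &dyn Any,
    scalar: T::Native,
) -> Option<Vec<bool>> {
    // Returns None when the array is not a PrimitiveArray<T>.
    let array = array.downcast_ref::<PrimitiveArray<T>>()?;
    Some(array.values.iter().map(|v| *v == scalar).collect())
}
```

A call then has to be written as `eq_dyn_primitive_scalar::<Date32Type>(&dates, 2)`. Writing `eq_dyn_primitive_scalar(&dates, 2)` does not compile in this model, because the argument types do not determine `T`.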

alamb · 2022-10-07T10:04:20Z

Renaming the kernels sounds good to me 👍

I would therefore like to propose adding new [eq | lt_eq | ...]_dyn_primitive_scalar comparison kernels, and deprecating the old [eq | lt_eq | ...]_dyn_scalar kernels, before removing them in a future release.

Would it be better to simply not have the dyn_primitive_scalar kernels and instead use docstrings or something else to show how to use the kernels with primitives?

Here is where @matthewmturner and I cooked up the original API: #1074 which also gives some background on why we didn't go with the T::Native approach

alamb · 2022-10-07T10:04:44Z

Other related PRs: https://github.com/apache/arrow-rs/pulls?q=is%3Apr+author%3Amatthewmturner+is%3Aclosed

tustvold · 2022-10-07T10:30:31Z

Would it be better to simply not have the dyn_primitive_scalar kernels and instead use docstrings or something else to show how to use the kernels with primitives?

The issue isn't documentation, but that the kernels are inconsistent in their behaviour with respect to the non-scalar kernels and the arithmetic scalar kernels. If I try to add an Int32Array to a Float32Array I will get an error; however, currently if I try to add a f32 to an Int32Array it will coerce the float to an integer and add it. This is at best rather surprising, and at worst a subtle source of strange bugs.

Further the encoding using ToPrimitive cannot be generalised to types other than those supported by ToPrimitive, i.e. Rust built-in types. This effectively blocks implementing these kernels for i256, i.e. Decimal256.
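A rough sketch of why this blocks 256-bit values, under stated assumptions: `I256` is a hypothetical two-word integer, `ToPrimitiveLike` is a reduced stand-in for `num::ToPrimitive` whose widest conversion target is `i128`, and both kernel names are invented for illustration.

```rust
// Hypothetical 256-bit signed integer, stored as a high and a low 128-bit word.
#[derive(Debug, Clone, Copy, PartialEq)]
struct I256 {
    high: i128,
    low: u128,
}

impl I256 {
    // Sign-extend a 128-bit value into 256 bits.
    fn from_i128(v: i128) -> Self {
        I256 {
            high: if v < 0 { -1 } else { 0 },
            low: v as u128,
        }
    }
}

// Reduced stand-in for `num::ToPrimitive`: there is no `to_i256` method.
trait ToPrimitiveLike {
    fn to_i64(&self) -> Option<i64>;
    fn to_i128(&self) -> Option<i128> {
        self.to_i64().map(i128::from)
    }
}

impl ToPrimitiveLike for i64 {
    fn to_i64(&self) -> Option<i64> {
        Some(*self)
    }
}

// Coercing style: the scalar can only arrive through `to_i128`, so any value
// outside the i128 range cannot be expressed through this interface.
fn eq_i256_scalar_coercing<S: ToPrimitiveLike>(values: &[I256], scalar: S) -> Option<Vec<bool>> {
    let scalar = I256::from_i128(scalar.to_i128()?);
    Some(values.iter().map(|v| *v == scalar).collect())
}

// Typed style: the scalar is already the native type, so no conversion is needed.
fn eq_i256_scalar_typed(values: &[I256], scalar: I256) -> Vec<bool> {
    values.iter().map(|v| *v == scalar).collect()
}
```

In this model a scalar such as `I256 { high: 1, low: 0 }` can be compared through the typed function, but there is no way to hand it to the coercing function without first narrowing it to at most 128 bits.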

which also gives some background on why we didn't go with the T::Native approach

The key comment appears to be #1074 (comment). Is there any possibility I might be able to shift your feeling on this matter? Perhaps we could have a synchronous chat? Looking at DataFusion, ScalarValue is already concretely typed to match the ArrowPrimitiveType and not ArrowNativeType, i.e. ScalarValue::TimestampMillisecond instead of ScalarValue::I32(_), and so this change shouldn't materially impact its complexity - it already knows what the concrete type should be

alamb · 2022-10-07T10:38:17Z

If #1074 (comment) causes issues for the rest of the implementation, I don't feel strongly about it

tustvold · 2022-10-07T11:41:07Z

Alternative proposal in #2842 - FWIW this would allow removing a lot of scalar dispatch logic from DataFusion

tustvold added the enhancement label Oct 6, 2022

tustvold mentioned this issue Oct 6, 2022

Add NaN handling in dyn scalar comparison kernels #2830

Merged

tustvold mentioned this issue Oct 7, 2022

RFC: Encode Scalars as dyn Any in Scalar dyn Kernels #2842

Closed

tustvold mentioned this issue Apr 13, 2023

feat: Support dyn_compare_scalar for Decimal256 #4084

Merged

tustvold mentioned this issue Jun 1, 2023

Combine _utf8 and _binary kernels #4334

Closed

tustvold mentioned this issue Jun 9, 2023

Add Scalar/Datum abstraction (#1047) #4393

Merged

tustvold mentioned this issue Aug 24, 2023

Datum based like kernels (#4595) #4732

Merged

tustvold closed this as completed in #4732 Aug 25, 2023

tustvold mentioned this issue Sep 5, 2023

DynScalar abstraction (something that makes it easy to create scalar Datums) #4781

Closed
