Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Encode Scalars as dyn Any in Scalar dyn Kernels #2842

Closed
tustvold opened this issue Oct 7, 2022 · 4 comments
Closed

RFC: Encode Scalars as dyn Any in Scalar dyn Kernels #2842

tustvold opened this issue Oct 7, 2022 · 4 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Oct 7, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As expanded upon in #2837 currently the encoding of scalars in dyn kernels is inconsistent. The arithmetic kernels must be explicitly type hinted with the ArrowPrimitiveType, whilst the comparison kernels coerce the scalar type to the desired type.

Taking a step back, users just want to give a kernel an array and a scalar of the correct type and have it work, with all the dispatch logic handled for it.

Describe the solution you'd like

I would like to suggest encoding scalars as & dyn std::any::Any, for example,

pub fn eq_scalar_dyn(
    left: &dyn Array,
    right: &dyn std::any::Any,
) -> Result<BooleanArray> {
    downcast_primitive_array! {
        left => match right.downcast_ref() {
            Some(r) => eq_scalar(left, *r),
            None => Err(ArrowError::ComputeError(format!("Failed to downcast scalar for {}", left.data_type())))
        },
        DataType::Boolean => match right.downcast_ref() {
            Some(r) => eq_bool_scalar(as_boolean_array(left), *r),
            None => Err(ArrowError::ComputeError(format!("Failed to downcast scalar for {}", left.data_type())))
        }
        DataType::Utf8 => match right.downcast_ref() {
            Some(r) => eq_utf8_scalar(as_string_array(left), *r),
            None => Err(ArrowError::ComputeError(format!("Failed to downcast scalar for {}", left.data_type())))
        }
        DataType::Dictionary(_, _) => downcast_dictionary_array! {
            left => {
                let dict_compare = eq_scalar_dyn(left.values().as_ref(), right)?;
                unpack_dict_comparison(left, dict_compare)
            }
            _ => unreachable!()
        }
        t => Err(ArrowError::ComputeError(format!("Unsupported datatype for eq_dyn_scalar_any {}", t)))
    }
}

Users can then write

eq_scalar_dyn(array, &false)?;
eq_scalar_dyn(array, &3_i32)?;
eq_scalar_dyn(array, &5.2_f32)?;
eq_scalar_dyn(array, &"hello")?;

This has the following advantages:

  • The dyn kernels don't require any type hinting
  • The kernels don't perform any implicit type coercion

The only major downside is that users will need to be careful to correctly hint scalar types, i.e. &1_i64.

Describe alternatives you've considered

We could not do this

Additional context

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Oct 7, 2022
@alamb
Copy link
Contributor

alamb commented Oct 7, 2022

I think this sounds like a great idea. I didn't realize Any was implemented for the basic types

@viirya
Copy link
Member

viirya commented Oct 7, 2022

Hm, will this affect performance?

@tustvold
Copy link
Contributor Author

tustvold commented Oct 7, 2022

I wouldn't expect it to have a material performance impact. It will perform a downcast of the scalar value once when it downcasts the array, and then proceed as normal

@tustvold
Copy link
Contributor Author

tustvold commented Apr 1, 2023

One wrinkle the comes up here is that we need a way to encode the data type of the scalar, e.g. to allow adding both interval and duration scalars of varying types to temporal types such as dates and timestamps. We may need to introduce a Scalar trait 🤔

@tustvold tustvold closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

3 participants