Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Binary Dictionary Operations Optional #4386

Closed
tustvold opened this issue Nov 27, 2022 · 1 comment · Fixed by #4999
Closed

Make Binary Dictionary Operations Optional #4386

tustvold opened this issue Nov 27, 2022 · 1 comment · Fixed by #4999
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@tustvold
Copy link
Contributor

tustvold commented Nov 27, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The dyn_cmp_dict and dyn_arith_dict features of arrow enable binary operations involving dictionary arrays and other dictionary arrays or scalar arrays.

They are, however, extremely expensive both from a compilation time, and code size perspective. As the kernels must be generated for the combinatorial explosion of all dictionary key and value types. See apache/arrow-rs#2596 and apache/arrow-rs#2760.

They are also exceedingly rare in practice, as almost all queries instead use the scalar variant, i.e. add scalar value to dictionary, compare dictionary against scalar value, etc...

Describe the solution you'd like

I would like a feature flag, e.g. binary_dict_op, that is not enabled by default, and enables the arrow features. The three or so tests that happen to need this, can then be gated on this feature flag or possibly updated to not need it.

Describe alternatives you've considered

We could not do this

Additional context

The feature was initially enabled as a dev dependency in #3363 by @avantgardnerio

This was then updated as physical-expr dev dependency in #4163 by @isidentical

It was then enabled as a non-dev dependency in #4168 by @retikulum possibly unintentionally?

@tustvold tustvold added the enhancement New feature or request label Nov 27, 2022
@retikulum
Copy link
Contributor

Hi. I added this on purpose (but without knowing it is extremely expensive) to pass test_dictionary_type_to_array_coersion test case. The following error was generated before enabling it:

Error: ArrowError(CastError("Comparing array of type Dictionary(Int32, Utf8) with array of type Dictionary(Int32, Utf8) requires \"dyn_cmp_dict\" feature"))

It seems great to me for enabling the feature flag rather than enabling it by default. Thanks for noticing me.

@tustvold tustvold added help wanted Extra attention is needed good first issue Good for newcomers labels Nov 30, 2022
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jan 20, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jan 20, 2023
tustvold added a commit that referenced this issue Jan 24, 2023
* Add dictionary_expresions feature (#4386)

* Toml format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants