Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use take for dictionary like comparisons #3313

Merged
merged 4 commits into from Dec 9, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Dec 9, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

Other comparison kernels perform an operation on the underlying child array, before taking the result based on the dictionary indexes. Especially for more expensive operations like those involving strings, this can yield significant returns. As an added bonus it results in less codegen

eq dictionary[10] string[4])
                        time:   [358.35 µs 358.43 µs 358.53 µs]
                        change: [-0.9058% -0.7716% -0.5611%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

eq_dyn_utf8_scalar dictionary[10] string[4])
                        time:   [53.265 µs 53.284 µs 53.306 µs]
                        change: [-12.029% -11.781% -11.537%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])
                        time:   [113.34 µs 113.36 µs 113.39 µs]
                        change: [-5.2402% -4.9959% -4.7432%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

like_utf8_scalar_dyn dictionary[10] string[4])
                        time:   [53.307 µs 53.326 µs 53.347 µs]
                        change: [-80.849% -80.811% -80.757%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

ilike_utf8_scalar_dyn dictionary[10] string[4])
                        time:   [53.972 µs 53.990 µs 54.013 µs]
                        change: [-97.799% -97.796% -97.790%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold requested a review from viirya December 9, 2022 17:18
@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 9, 2022
@@ -214,7 +215,10 @@ pub fn like_utf8_scalar_dyn(
DataType::Dictionary(_, _) => {
downcast_dictionary_array!(
left => {
like_dict_scalar(left, right)
let dict_comparison = like_utf8_scalar_dyn(left.values().as_ref(), right)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication is slightly unfortunate, but will hopefully get cleaned up as part of #3296

@@ -42,6 +42,7 @@ arrow-buffer = { version = "28.0.0", path = "../arrow-buffer" }
arrow-data = { version = "28.0.0", path = "../arrow-data" }
arrow-schema = { version = "28.0.0", path = "../arrow-schema" }
arrow-array = { version = "28.0.0", path = "../arrow-array" }
arrow-select = { version = "28.0.0", path = "../arrow-select" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat unfortunate, but arrow-select is fairly foundational so I'm not too worried

@tustvold tustvold changed the title Use take for like comparisons Use take for dictionary like comparisons Dec 9, 2022
/// [`StringArray`]/[`LargeStringArray`] and a scalar.
///
/// See the documentation on [`like_utf8`] for more details.
fn like_dict_scalar<K: ArrowPrimitiveType>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tustvold tustvold merged commit f078aed into apache:master Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants