refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303

askoa · 2022-12-08T22:46:56Z

Which issue does this PR close?

Part of #3296

tustvold

Might be worth just double checking the benchmarks for LLVM derp, but I'd be very surprised if this has any impact.

Thank you 😀

askoa · 2022-12-08T23:12:20Z

Might be worth just double checking the benchmarks for LLVM derp, but I'd be very surprised if this has any impact.

Do benchmarks run in PR checks? Or should I run it locally?

tustvold · 2022-12-08T23:13:26Z

Locally, I'm also happy to run them tomorrow morning

askoa · 2022-12-08T23:28:54Z

Locally, I'm also happy to run them tomorrow morning

That would be great!

tustvold · 2022-12-09T11:06:14Z

$ git checkout master
$ RUSTFLAGS="-C target-cpu=native" cargo bench --features="test_utils,dyn_cmp_dict" --bench comparison_kernels -- --save-baseline master ".*ilike"
...
$ git checkout like-duplication
$ RUSTFLAGS="-C target-cpu=native" cargo bench --features="test_utils,dyn_cmp_dict" --bench comparison_kernels -- --baseline master ".*ilike"

Resulted in

ilike_utf8 scalar equals
                        time:   [2.3886 ms 2.3895 ms 2.3904 ms]
                        change: [+6.2307% +6.2869% +6.3448%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

ilike_utf8 scalar contains
                        time:   [4.3580 ms 4.3595 ms 4.3610 ms]
                        change: [+5.0782% +5.1287% +5.1773%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

ilike_utf8 scalar ends with
                        time:   [2.3695 ms 2.3705 ms 2.3715 ms]
                        change: [+0.9025% +0.9541% +1.0075%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

ilike_utf8 scalar starts with
                        time:   [2.4795 ms 2.4811 ms 2.4827 ms]
                        change: [+5.6493% +5.7240% +5.8024%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking ilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar complex
                        time:   [1.8226 ms 1.8233 ms 1.8240 ms]
                        change: [-4.7478% -4.5718% -4.3951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

nilike_utf8 scalar equals
                        time:   [2.4453 ms 2.4466 ms 2.4479 ms]
                        change: [+1.8160% +1.8795% +1.9439%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

nilike_utf8 scalar contains
                        time:   [4.4117 ms 4.4137 ms 4.4159 ms]
                        change: [+3.6819% +3.7352% +3.7913%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

nilike_utf8 scalar ends with
                        time:   [2.4475 ms 2.4486 ms 2.4498 ms]
                        change: [+2.9289% +2.9881% +3.0444%] (p = 0.00 < 0.05)
                        Performance has regressed.

nilike_utf8 scalar starts with
                        time:   [2.4008 ms 2.4014 ms 2.4021 ms]
                        change: [+0.7831% +0.8124% +0.8463%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Benchmarking nilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar complex
                        time:   [1.8052 ms 1.8061 ms 1.8071 ms]
                        change: [-6.1191% -5.9827% -5.7985%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

So this does represent a performance regression, I will investigate why

tustvold · 2022-12-09T11:24:14Z

Switching to use BooleanArray::from_unary made the regression go away, as well as simplifying the code 🎉

tustvold · 2022-12-09T11:26:56Z

arrow-string/src/like.rs

    } else if right.starts_with('%')
        && right.ends_with('%')
+        && !right.ends_with("\\%")


This appears to have been an omission from #2743

merge functions ilike_scalar and nilike_scalar

80ff584

github-actions bot added the arrow Changes to the arrow crate label Dec 8, 2022

tustvold approved these changes Dec 8, 2022

View reviewed changes

viirya approved these changes Dec 8, 2022

View reviewed changes

Use from_unary

82a0218

tustvold reviewed Dec 9, 2022

View reviewed changes

tustvold merged commit d11da24 into apache:master Dec 9, 2022

askoa deleted the like-duplication branch December 9, 2022 22:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303

refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303

askoa commented Dec 8, 2022

tustvold left a comment

askoa commented Dec 8, 2022

tustvold commented Dec 8, 2022

askoa commented Dec 8, 2022

tustvold commented Dec 9, 2022

tustvold commented Dec 9, 2022

tustvold Dec 9, 2022

refactor: Merge similar functions ilike_scalar and nilike_scalar #3303

refactor: Merge similar functions ilike_scalar and nilike_scalar #3303

Conversation

askoa commented Dec 8, 2022

Which issue does this PR close?

tustvold left a comment

Choose a reason for hiding this comment

askoa commented Dec 8, 2022

tustvold commented Dec 8, 2022

askoa commented Dec 8, 2022

tustvold commented Dec 9, 2022

tustvold commented Dec 9, 2022

tustvold Dec 9, 2022

Choose a reason for hiding this comment

refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303

refactor: Merge similar functions `ilike_scalar` and `nilike_scalar` #3303