Improve regex related kernels by up to 85% #3192

Merged
merged 1 commit into from Nov 25, 2022


psvri
Contributor

@psvri psvri commented Nov 25, 2022

Which issue does this PR close?

NA

Rationale for this change

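Speeds up the regex-backed kernels substantially: runtime drops by roughly 78% to 86% on the "complex" like/ilike benchmarks and by 58% to 82% on the regexp benchmarks (see the results in the comment below).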

What changes are included in this PR?

The regex crate was compiled without the `perf` feature flag, so the kernels were missing most of the crate's performance optimizations.

This seems to be an unintended side effect of #1876.
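
For readers unfamiliar with Cargo features, the kind of dependency entry involved looks roughly like the fragment below. This is a sketch, not the exact diff: the version number and the feature list other than `perf` are assumptions. The point is that once `default-features = false` is set, `perf` (a default feature of the regex crate) is no longer enabled unless it is listed explicitly.

```toml
[dependencies]
# Hypothetical entry: the version and the "std"/"unicode" features are assumptions.
# With default features disabled, "perf" must be named explicitly, otherwise the
# regex crate is built without its performance optimizations.
regex = { version = "1", default-features = false, features = ["std", "unicode", "perf"] }
```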

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 25, 2022
@psvri
Contributor Author

psvri commented Nov 25, 2022

Like kernel improvements

like_utf8 scalar equals time:   [379.59 µs 379.62 µs 379.66 µs]
                      change: [-0.2632% -0.1372% -0.0478%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe

like_utf8 scalar contains
                      time:   [1.9998 ms 2.0014 ms 2.0031 ms]
                      change: [+0.1456% +0.2614% +0.3704%] (p = 0.00 < 0.05)
                      Change within noise threshold.

like_utf8 scalar ends with
                      time:   [358.22 µs 358.30 µs 358.40 µs]
                      change: [+0.0299% +0.0737% +0.1209%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
4 (4.00%) high mild
7 (7.00%) high severe

like_utf8 scalar starts with
                      time:   [379.94 µs 380.12 µs 380.34 µs]
                      change: [+0.0506% +0.1122% +0.1742%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

Benchmarking like_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
like_utf8 scalar complex
                      time:   [1.2768 ms 1.2770 ms 1.2772 ms]
                      change: [-85.872% -85.868% -85.865%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe

nlike_utf8 scalar equals
                      time:   [380.18 µs 380.32 µs 380.51 µs]
                      change: [-0.0281% +0.0227% +0.0706%] (p = 0.39 > 0.05)
                      No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe

nlike_utf8 scalar contains
                      time:   [2.0066 ms 2.0083 ms 2.0099 ms]
                      change: [+0.6815% +0.7869% +0.8975%] (p = 0.00 < 0.05)
                      Change within noise threshold.

nlike_utf8 scalar ends with
                      time:   [379.45 µs 379.65 µs 379.91 µs]
                      change: [-0.1908% -0.0220% +0.1077%] (p = 0.81 > 0.05)
                      No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
3 (3.00%) high mild
9 (9.00%) high severe

nlike_utf8 scalar starts with
                      time:   [379.84 µs 380.10 µs 380.46 µs]
                      change: [+0.0558% +0.1208% +0.1876%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe

Benchmarking nlike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
nlike_utf8 scalar complex
                      time:   [1.2763 ms 1.2765 ms 1.2768 ms]
                      change: [-85.881% -85.878% -85.874%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

ilike_utf8 scalar equals
                      time:   [2.8738 ms 2.8751 ms 2.8762 ms]
                      change: [+0.0368% +0.0849% +0.1312%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe

ilike_utf8 scalar contains
                      time:   [4.4475 ms 4.4509 ms 4.4543 ms]
                      change: [-1.9928% -1.9116% -1.8345%] (p = 0.00 < 0.05)
                      Performance has improved.

ilike_utf8 scalar ends with
                      time:   [2.8618 ms 2.8633 ms 2.8645 ms]
                      change: [-0.1859% -0.1240% -0.0760%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) low severe
1 (1.00%) low mild

ilike_utf8 scalar starts with
                      time:   [2.8435 ms 2.8444 ms 2.8454 ms]
                      change: [-0.2926% -0.2414% -0.1961%] (p = 0.00 < 0.05)
                      Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

ilike_utf8 scalar complex
                      time:   [2.4073 ms 2.4078 ms 2.4084 ms]
                      change: [-78.295% -78.290% -78.284%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

nilike_utf8 scalar equals
                      time:   [2.9062 ms 2.9072 ms 2.9083 ms]
                      change: [-1.7614% -1.7197% -1.6777%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
3 (3.00%) low mild
8 (8.00%) high mild
2 (2.00%) high severe

nilike_utf8 scalar contains
                      time:   [4.4728 ms 4.4755 ms 4.4784 ms]
                      change: [-2.0144% -1.9452% -1.8747%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

nilike_utf8 scalar ends with
                      time:   [2.9045 ms 2.9055 ms 2.9066 ms]
                      change: [-0.0610% -0.0224% +0.0171%] (p = 0.29 > 0.05)
                      No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high severe

nilike_utf8 scalar starts with
                      time:   [2.8983 ms 2.9002 ms 2.9025 ms]
                      change: [-0.0080% +0.0594% +0.1436%] (p = 0.16 > 0.05)
                      No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) high mild
5 (5.00%) high severe

nilike_utf8 scalar complex
                      time:   [2.4668 ms 2.4672 ms 2.4677 ms]
                      change: [-77.898% -77.892% -77.885%] (p = 0.00 < 0.05)
                      Performance has improved.

Regex improvements

regexp_matches_utf8 scalar starts with
                      time:   [1.3229 ms 1.3238 ms 1.3247 ms]
                      change: [-58.274% -58.244% -58.215%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Benchmarking regexp_matches_utf8 scalar ends with: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
regexp_matches_utf8 scalar ends with
                      time:   [1.3368 ms 1.3372 ms 1.3376 ms]
                      change: [-81.771% -81.762% -81.754%] (p = 0.00 < 0.05)
                      Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
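
To read the `change` figures above as speedup factors: Criterion reports the relative change in time, so a change of c percent means new_time = old_time * (1 + c/100), and the speedup is the reciprocal of that factor. The helper below is a small sketch of that arithmetic; the function name `speedup_from_change` is made up for illustration, and the inputs are the median change estimates copied from the output above.

```rust
/// Convert a Criterion "change" percentage (negative means faster)
/// into a speedup factor (old_time / new_time).
fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change/100), so old/new is the reciprocal.
    1.0 / (1.0 + change_percent / 100.0)
}

fn main() {
    // Median change estimates taken from the benchmark output above.
    let results = [
        ("like_utf8 scalar complex", -85.868),
        ("nlike_utf8 scalar complex", -85.878),
        ("ilike_utf8 scalar complex", -78.290),
        ("nilike_utf8 scalar complex", -77.892),
    ];
    for (name, change) in results {
        println!("{}: {:.2}x faster", name, speedup_from_change(change));
    }
}
```

By this measure, an 85% reduction in time corresponds to roughly a 7x speedup, and a 78% reduction to roughly 4.6x.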

@Dandandan
Contributor

Nice catch @psvri !

@tustvold tustvold merged commit 14e6212 into apache:master Nov 25, 2022
@psvri psvri deleted the regex-improvements branch November 25, 2022 16:54
@ursabot

ursabot commented Nov 25, 2022

Benchmark runs are scheduled for baseline = cbe5af0 and contender = 14e6212. 14e6212 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
