s2: Improve matching #484

Merged
merged 3 commits into from Feb 1, 2022
@klauspost klauspost commented Feb 1, 2022

Improve end-of-buffer speed.

Add a `GOAMD64_v3` version with a small improvement to matching. For now it has to be enabled with a build tag.

For now this is guarded by build tags to avoid duplicating all the code.

```
benchmark                                                                 old ns/op      new ns/op      delta
BenchmarkTwainEncode1e1/default-32                                        8.32           8.28           -0.49%
BenchmarkTwainEncode1e1/better-32                                         8.36           8.32           -0.53%
BenchmarkTwainEncode1e1/best-32                                           8.31           8.32           +0.02%
BenchmarkTwainEncode1e1/snappy-default-32                                 8.34           8.32           -0.17%
BenchmarkTwainEncode1e1/snappy-better-32                                  8.31           8.31           +0.00%
BenchmarkTwainEncode1e1/snappy-best-32                                    8.31           8.29           -0.20%
BenchmarkTwainEncode1e1/snappy-ref-noasm-32                               7.61           7.62           +0.22%
BenchmarkTwainEncode1e2/default-32                                        94.4           93.8           -0.70%
BenchmarkTwainEncode1e2/better-32                                         273            269            -1.36%
BenchmarkTwainEncode1e2/best-32                                           76827          75007          -2.37%
BenchmarkTwainEncode1e2/snappy-default-32                                 94.7           93.6           -1.17%
BenchmarkTwainEncode1e2/snappy-better-32                                  273            268            -1.58%
BenchmarkTwainEncode1e2/snappy-best-32                                    72735          72867          +0.18%
BenchmarkTwainEncode1e2/snappy-ref-noasm-32                               471            469            -0.25%
BenchmarkTwainEncode1e3/default-32                                        872            867            -0.62%
BenchmarkTwainEncode1e3/better-32                                         2416           2403           -0.54%
BenchmarkTwainEncode1e3/best-32                                           128772         128589         -0.14%
BenchmarkTwainEncode1e3/snappy-default-32                                 869            862            -0.84%
BenchmarkTwainEncode1e3/snappy-better-32                                  2415           2402           -0.54%
BenchmarkTwainEncode1e3/snappy-best-32                                    94544          92615          -2.04%
BenchmarkTwainEncode1e3/snappy-ref-noasm-32                               2317           2328           +0.47%
BenchmarkTwainEncode1e4/default-32                                        10080          9862           -2.16%
BenchmarkTwainEncode1e4/better-32                                         24173          23778          -1.63%
BenchmarkTwainEncode1e4/best-32                                           638221         632676         -0.87%
BenchmarkTwainEncode1e4/snappy-default-32                                 10038          9900           -1.37%
BenchmarkTwainEncode1e4/snappy-better-32                                  24088          23655          -1.80%
BenchmarkTwainEncode1e4/snappy-best-32                                    336750         334551         -0.65%
BenchmarkTwainEncode1e4/snappy-ref-noasm-32                               25050          24941          -0.44%
BenchmarkTwainEncode1e5/default-32                                        208338         204080         -2.04%
BenchmarkTwainEncode1e5/better-32                                         400069         382699         -4.34%
BenchmarkTwainEncode1e5/best-32                                           5249363        5374492        +2.38%
BenchmarkTwainEncode1e5/snappy-default-32                                 207783         200382         -3.56%
BenchmarkTwainEncode1e5/snappy-better-32                                  388589         378026         -2.72%
BenchmarkTwainEncode1e5/snappy-best-32                                    2889378        2781338        -3.74%
BenchmarkTwainEncode1e5/snappy-ref-noasm-32                               487332         484808         -0.52%
BenchmarkTwainEncode1e6/default-32                                        2305542        2251826        -2.33%
BenchmarkTwainEncode1e6/better-32                                         4023332        3904791        -2.95%
BenchmarkTwainEncode1e6/best-32                                           53576955       54518800       +1.76%
BenchmarkTwainEncode1e6/snappy-default-32                                 2300992        2179567        -5.28%
BenchmarkTwainEncode1e6/snappy-better-32                                  3938222        3879487        -1.49%
BenchmarkTwainEncode1e6/snappy-best-32                                    30057235       30808837       +2.50%
BenchmarkTwainEncode1e6/snappy-ref-noasm-32                               4890432        4866709        -0.49%
BenchmarkTwainEncode1e7/default-32                                        23717990       22395276       -5.58%
BenchmarkTwainEncode1e7/better-32                                         42845300       42508469       -0.79%
BenchmarkTwainEncode1e7/best-32                                           1113607500     1111374800     -0.20%
BenchmarkTwainEncode1e7/snappy-default-32                                 23335686       22315622       -4.37%
BenchmarkTwainEncode1e7/snappy-better-32                                  42227550       41652074       -1.36%
BenchmarkTwainEncode1e7/snappy-best-32                                    410723367      421980100      +2.74%
BenchmarkTwainEncode1e7/snappy-ref-noasm-32                               51418814       51197981       -0.43%
```
@klauspost klauspost merged commit a1a9cfc into master Feb 1, 2022
@klauspost klauspost deleted the s2-update-matching branch February 1, 2022 15:22