
ruleguard/textmatch: an abstraction on top of regexp for performance #281

Merged: 1 commit into master from quasilyte/textmatch on Oct 14, 2021


quasilyte
Owner

`textmatch.Compile()` takes a regexp pattern and tries to recognize
it, returning a matcher that can match input strings faster
than a real `*regexp.Regexp` would. If it can't recognize the pattern,
it falls back to a normal `*regexp.Regexp`.

Right now we only optimize the simplest patterns, but it's a
first step to prove that we can still use regexps in ruleguard
rules and avoid big performance losses.

```
name                       old time/op    new time/op    delta
Match/^\p{Lu}_0-8             153ns ± 4%      11ns ± 1%  -92.81%  (p=0.008 n=5+5)
Match/^\p{Lu}_1-8             140ns ± 2%      11ns ± 0%  -92.13%  (p=0.008 n=5+5)
Match/^\p{Ll}_0-8             152ns ± 1%      11ns ± 1%  -92.77%  (p=0.008 n=5+5)
Match/^\p{Ll}_1-8             140ns ± 2%      11ns ± 3%  -92.04%  (p=0.008 n=5+5)
Match/foo$_0-8                174ns ± 1%      13ns ± 1%  -92.26%  (p=0.008 n=5+5)
Match/foo$_1-8               83.4ns ± 2%    13.4ns ± 6%  -83.96%  (p=0.008 n=5+5)
Match/^foo_0-8                135ns ± 0%      10ns ± 1%  -92.33%  (p=0.016 n=4+5)
Match/^foo_1-8                108ns ± 4%      11ns ± 4%  -89.78%  (p=0.008 n=5+5)
Match/simpleIdent_0-8         243ns ± 2%      18ns ± 1%  -92.51%  (p=0.008 n=5+5)
Match/simpleIdent_1-8        92.7ns ± 1%    26.5ns ± 1%  -71.43%  (p=0.008 n=5+5)
Match/.*simpleIdent.*_0-8    1.59µs ± 2%    0.02µs ± 1%  -98.86%  (p=0.008 n=5+5)
Match/.*simpleIdent.*_1-8    1.70µs ± 1%    0.03µs ± 1%  -98.46%  (p=0.008 n=5+5)
Match/simpleIdent_0#01-8      237ns ± 1%      14ns ± 1%  -94.03%  (p=0.008 n=5+5)
Match/simpleIdent_1#01-8      247ns ± 1%      24ns ± 3%  -90.42%  (p=0.008 n=5+5)
[Geo mean]                    211ns           15ns       -93.00%
```
quasilyte merged commit 7b21d77 into master on Oct 14, 2021
quasilyte deleted the quasilyte/textmatch branch on Oct 14, 2021 at 23:01