Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regex dependency optional #385

Merged
merged 3 commits into from
Feb 4, 2021
Merged

Conversation

Kixunil
Copy link
Contributor

@Kixunil Kixunil commented Jan 13, 2021

regex was used only in four trivial cases that could be implemented
more simply, either naively or using memchr, without losing performance.
As such the dependency needlessly increases build time, size of binary
and attack surface.

This change makes regex optional and defaults to naive/memchr
implementations. This improves performance a bit. The dependency
could've been removed entirely but was kept in case regression is
discovered on another platform and to make comparing the performance
easier. It can be removed in the future if the code is proven to be
reliable.

Benchmarked on x86_64. Thumbs up for providing convenient benches. :)

`regex` was used only in four trivial cases that could be implemented
more simply, either naively or using memchr, without losing performance.
As such the dependency needlessly increases build time, size of binary
and attack surface.

This change makes `regex` optional and defaults to `naive`/`memchr`
implementations. This *improves* performance a bit. The dependency
could've been removed entirely but was kept in case regression is
discovered on another platform and to make comparing the performance
easier. It can be removed in the future if the code is proven to be
reliable.

Signed-off-by: Martin Habovstiak <martin.habovstiak@gmail.com>
@mxinden
Copy link
Contributor

mxinden commented Jan 14, 2021

Oh, I wasn't aware of memchr, thanks for the pointer. From a first look this seems reasonable to me, but I will do a proper review in the next couple of days.

In the meantime, would you mind posting the difference of the text_encode benchmark before and after here? critcmp might proof useful here.

@Kixunil
Copy link
Contributor Author

Kixunil commented Jan 14, 2021

Great! Forgot to mention: memchr is also a dependency of regex so this doesn't replace dependencies, just reduces. :)

Attempted to use critcmp but hit an error

$ cargo +stable bench --features=regex --bench text_encoder -- --save-baseline regex
$ cargo +stable bench --bench text_encoder -- --save-baseline noregex

Outputs:

regex

Benchmarking text_encoder_without_escaping: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.1s, or reduce sample count to 40.
text_encoder_without_escaping                                                                            
                        time:   [92.321 ms 98.873 ms 106.25 ms]
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

Benchmarking text_encoder_with_escaping: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 26.4s, or reduce sample count to 10.
text_encoder_with_escaping                                                                            
                        time:   [134.25 ms 140.53 ms 147.40 ms]
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

noregex

Benchmarking text_encoder_without_escaping: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, or reduce sample count to 50.
text_encoder_without_escaping                                                                            
                        time:   [84.216 ms 90.620 ms 97.664 ms]
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe

Benchmarking text_encoder_with_escaping: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 26.6s, or reduce sample count to 10.
text_encoder_with_escaping                                                                            
                        time:   [125.06 ms 131.48 ms 138.58 ms]
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

BTW I noticed even bigger difference in desc before. Can check again if you want.

Copy link
Contributor

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes regex optional and defaults to naive/memchr implementations. This improves performance a bit. The dependency could've been removed entirely but was kept in case regression is discovered on another platform and to make comparing the performance easier. It can be removed in the future if the code is proven to be reliable.

I would advocate for removing regex entirely. In case it does turn out to be less performant than using its parent crate (regex) on some platforms, it is not exposed on the API surface and thus can be switched back without a breaking change if we don't find the actual cause.

Benchmarks look great to me.

$ critcmp regex no_regex
group                            no_regex                               regex
-----                            --------                               -----
text_encoder_with_escaping       1.00     88.4±2.01ms        ? B/sec    1.08     95.7±1.79ms        ? B/sec
text_encoder_without_escaping    1.00     59.4±1.22ms        ? B/sec    1.18     70.2±1.98ms        ? B/sec

$ cat /proc/cpuinfo | grep "model name" | head -1
model name      : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

Attempted to use critcmp but hit an error

Haven't seen the issue before even though I am running the same version.

$ critcmp --version
critcmp 0.1.4

$ cargo tree | grep criterion
├── criterion v0.3.3
│   ├── criterion-plot v0.4.3

src/desc.rs Outdated Show resolved Hide resolved
src/desc.rs Outdated Show resolved Hide resolved
src/desc.rs Outdated Show resolved Hide resolved
@mxinden
Copy link
Contributor

mxinden commented Feb 1, 2021

Sorry for the long delays here @Kixunil. Again, I would be in favor of removing the regex crate all together. Would you mind removing it in this pull request? Given that there haven't been any objections thus far, we can then go ahead and merge this pull request.

@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 1, 2021

Sure, I wanted to do it sooner, just got distracted by ton of other stuff.

@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 2, 2021

Done, also added two more comments that seemed like a good idea.

@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 2, 2021

Hmm, something must have confused DCO. I added the line into the commit yet it fails.

@lucab
Copy link
Member

lucab commented Feb 2, 2021

@Kixunil the bot needs to see the sign-off on every single commit in the PR. To that extent, it looks like the first commit is fine while the remaining ones are missing that.

Co-authored-by: Max Inden <mail@max-inden.de>

Signed-off-by: Martin Habovstiak <martin.habovstiak@gmail.com>
This removes `regex` as suggested in the PR.

Signed-off-by: Martin Habovstiak <martin.habovstiak@gmail.com>
@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 2, 2021

@lucab ah, I thought Co-authored-by serves as replacement. Fixed.

@mxinden mxinden merged commit 021d07b into tikv:master Feb 4, 2021
@mxinden
Copy link
Contributor

mxinden commented Feb 4, 2021

Thanks @Kixunil for bearing with us!

@Kixunil Kixunil deleted the optional_regex branch February 4, 2021 17:41
@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 4, 2021

Thank you too! Any specific plans for the date of next release?

@mxinden
Copy link
Contributor

mxinden commented Feb 4, 2021

Nothing planned as far as I know, happy to do one though, if it can be of some help. I still need to review #387 which seems worth including. What do you think?

@Kixunil
Copy link
Contributor Author

Kixunil commented Feb 4, 2021

Definitely worth including!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants