Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #8327] Add new cop Style/SelectByRegexp. #10082

Merged
merged 3 commits into from Sep 28, 2021

Conversation

dvandersluis
Copy link
Member

@dvandersluis dvandersluis commented Sep 14, 2021

This cop suggests using array.grep(regexp) instead of array.select { |x| x.match?(regexp) } and array.grep_v(regexp) instead of array.reject { |x| x.match?(regexp) } on ruby >= 3.0. Previously grep and grep_v were less performant so this is not recommended, but it was fixed in https://bugs.ruby-lang.org/issues/17030.

I re-ran the benchmark from #8327 with some more possibilities:

select vs grep benchmark
Warming up --------------------------------------
select { String.match?(Regexp })
                       173.463k i/100ms
select { Regexp.match?(String })
                       175.141k i/100ms
select { String =~ Regexp }
                        70.109k i/100ms
select { Regexp =~ String }
                        69.561k i/100ms
                grep   178.889k i/100ms
Calculating -------------------------------------
select { String.match?(Regexp })
                          1.739M (± 0.7%) i/s -      8.847M in   5.086550s
select { Regexp.match?(String })
                          1.764M (± 0.9%) i/s -      8.932M in   5.063852s
select { String =~ Regexp }
                        682.806k (± 4.9%) i/s -      3.435M in   5.044940s
select { Regexp =~ String }
                        674.551k (± 6.9%) i/s -      3.408M in   5.085230s
                grep      1.753M (± 6.3%) i/s -      8.766M in   5.023221s

Comparison:
select { Regexp.match?(String }):  1764051.3 i/s
                            grep:  1753027.1 i/s - same-ish: difference falls within error
select { String.match?(Regexp }):  1739303.8 i/s - same-ish: difference falls within error
select { String =~ Regexp }:       682806.4 i/s - 2.58x  (± 0.00) slower
select { Regexp =~ String }:       674550.9 i/s - 2.62x  (± 0.00) slower

Since performance of grep is the same or better than all 4 options, they are all covered by this cop. I did not explicitly handle negative comparisons (ie. select { |x| x !~ REGEXP } => grep_v(REGEXP)) as Style/InverseMethods can update it and then this cop will take over from there.

I also benchmarked reject vs grep_v, and while grep_v is on average slightly slower, I think it's still ok.

reject vs grep_v benchmark
Warming up --------------------------------------
reject { String.match?(Regexp) }
                       178.577k i/100ms
reject { Regexp.match?(String) }
                       177.626k i/100ms
reject { String =~ Regexp }
                        70.001k i/100ms
reject { Regexp =~ String }
                        44.087k i/100ms
              grep_v   148.860k i/100ms
Calculating -------------------------------------
reject { String.match?(Regexp) }
                          1.743M (± 4.0%) i/s -      8.750M in   5.029890s
reject { Regexp.match?(String) }
                          1.784M (± 5.2%) i/s -      9.059M in   5.094364s
reject { String =~ Regexp }
                        702.016k (± 5.6%) i/s -      3.500M in   5.003366s
reject { Regexp =~ String }
                        644.082k (± 8.8%) i/s -      3.218M in   5.040798s
              grep_v      1.624M (±11.7%) i/s -      8.187M in   5.115310s

Comparison:
reject { Regexp.match?(String) }:  1783688.7 i/s
reject { String.match?(Regexp) }:  1742531.8 i/s - same-ish: difference falls within error
                          grep_v:  1624448.0 i/s - same-ish: difference falls within error
reject { String =~ Regexp }:       702016.2 i/s - 2.54x  (± 0.00) slower
reject { Regexp =~ String }:       644081.6 i/s - 2.77x  (± 0.00) slower

Fixes #8327.


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@bbatsov
Copy link
Collaborator

bbatsov commented Sep 15, 2021

I agree that the proposal results in more readable code. Might be a good idea to mention this in the style guide as well. I have to admit I keep forgetting about grep and grep_v all the time.

@dvandersluis
Copy link
Member Author

@marcandre requested this a long time ago and I came across it in my trek through the old issues 😁

I opened rubocop/ruby-style-guide#884.

@marcandre
Copy link
Contributor

PR looks good 👍. Does it detect x.select { &1.match? /xxx/ }?

@dvandersluis
Copy link
Member Author

@marcandre do you mean _1 instead of &1 (ie. numblocks)? If so, I keep forgetting they exist, and I'll update it for them.

@marcandre
Copy link
Contributor

@marcandre do you mean _1 instead of &1 (ie. numblocks)? If so, I keep forgetting they exist, and I'll update it for them.

Ah ah ah, yes, that's what I meant. It sows I'm doing a lot of elixir these days 😅

@marcandre
Copy link
Contributor

Also, should we consider enabling it for all versions of Ruby? It's a Style cop, not a Performance one...

@dvandersluis
Copy link
Member Author

Up to @bbatsov but, despite being Style I think we shouldn't recommend something we know will be slower, and likely the average user would have no idea. It'd be cool if we could have the cop enabled or disabled by default based on target ruby version but then overridable...

@dvandersluis
Copy link
Member Author

dvandersluis commented Sep 15, 2021

I've updated this cop to be able to handle numblocks; however, it requires rubocop/rubocop-ast#209 to be released first to make send_node.block_node work for numblocks, so the numblock tests will fail in the meantime.

@bbatsov
Copy link
Collaborator

bbatsov commented Sep 16, 2021

I think it's fine to suggest this on all supported Ruby versions with some disclaimer in the docs. It's OK for style cops to suggest something slower, but more readable.

@bbatsov
Copy link
Collaborator

bbatsov commented Sep 27, 2021

@dvandersluis Let's try to wrap this up this week, so we can include in the next RuboCop release.

@dvandersluis
Copy link
Member Author

Gladly, but I need rubocop/rubocop-ast#209 to be released in order to do so.

@dvandersluis
Copy link
Member Author

@bbatsov rubocop-ast 1.12.0 was released, and this PR was updated to use it, and remove the target ruby version check. 🚀

@bbatsov bbatsov merged commit eed36a1 into rubocop:master Sep 28, 2021
@bbatsov
Copy link
Collaborator

bbatsov commented Sep 28, 2021

Excellent!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prefer grep/grep_v
3 participants