Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #8708] Fix bad regexp recognition in Lint/OutOfRangeRegexpRef when there are multiple regexps #8844

Merged
merged 2 commits into from Oct 26, 2020

Conversation

dvandersluis
Copy link
Member

Lint/OutOfRangeRegexpRef was using the wrong regexp to determine how many capturing groups there are, in a few cases, that are now fixed. I also updated the error message to be a bit more straightforward because I found it hard to understand, but please let me know if you'd like me to reword or revert.

Also I have no idea what the plural of regexp should be. 😅

Fixes #8708.


Before submitting the PR make sure the following are checked:

  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

@marcandre
Copy link
Contributor

Update: I started reviewing, love the message changes 👍
It would be quite nice to have a after_send callback of some kind, instead of messing around with ignore things... I'm checking a few things on the Commissionner side.

@dvandersluis
Copy link
Member Author

@marcandre thanks! Let me know if you want me to change something.

@marcandre
Copy link
Contributor

marcandre commented Oct 8, 2020

Could you check if it also has this issue? #8708

@dvandersluis
Copy link
Member Author

I'm not sure what you mean? This PR is for that issue haha 😅

@dvandersluis
Copy link
Member Author

Oh I assume you meant #8862 / #8863 ? That's a different cop so there shouldn't be a conflict.

@dvandersluis
Copy link
Member Author

Updated to fix the changelog now that 0.93 is out.

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I assume you meant #8862 / #8863 ? That's a different cop so there shouldn't be a conflict.

Ooops, yes, my double mistake 🤦‍♂️, need coffee apparently

lib/rubocop/cop/lint/out_of_range_regexp_ref.rb Outdated Show resolved Hide resolved
@marcandre
Copy link
Contributor

So I checked about adding on_after_xxx and it's sadly in a hotspot, it would cost a 5% slowdown which is pretty sad 😿
I could add it only just for send, wouldn't be so bad, or I could introduce some API to add a callback on the fly, but that feels a bit ugly...
So I'm tempted to just leave this as is (after the conflict is resolved...) even though the logic is quite convoluted just because we don't currently have a way do processing after a particular node's children are processed...

@dvandersluis
Copy link
Member Author

@marcandre fixed the conflict, happy to make other changes if you want, just let me know!

@dvandersluis dvandersluis force-pushed the issue/8708 branch 2 times, most recently from 15e88e3 to 243222b Compare October 9, 2020 17:14
marcandre added a commit that referenced this pull request Oct 10, 2020
…boCop, at around 8% of overall processing.

This is because we currently build a new `Commissionner` for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus `O(files * cops * types)` where `types` is the number of node types encountered in a typical file.
Testing on `files == 200` typical RuboCop files, I calculated `types ~22.5`. Currently `cops == 422`.

This new algorithm is the same `O` but `types` is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an *overall gain of ~6.6%*

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add `on_after_send` etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on *adding* callbacks it responds to *at runtime* is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods `on_send`, etc., even though they would be overriden by the extending modules), or overriding `Cop::Base#callbacks_needed` (although I marked the api as private for now).

How I tested performance:
```
$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

```
marcandre added a commit that referenced this pull request Oct 10, 2020
…boCop, at around 8% of overall processing.

This is because we currently build a new `Commissionner` for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus `O(files * cops * types)` where `types` is the number of node types encountered in a typical file.
Testing on `files == 200` typical RuboCop files, I calculated `types ~22.5`. Currently `cops == 422`.

This new algorithm is the same `O` but `types` is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an *overall gain of ~6.6%*

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add `on_after_send` etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on *adding* callbacks it responds to *at runtime* is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods `on_send`, etc., even though they would be overriden by the extending modules), or overriding `Cop::Base#callbacks_needed` (although I marked the api as private for now).

How I tested performance:
```
$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

```
@marcandre
Copy link
Contributor

Update: I opened #8889; I'm waiting a day or two to see if the latest bug fix release is stable and I'll merge that, which would give you the on_after_send that would simplify the processing I believe.

marcandre added a commit that referenced this pull request Oct 16, 2020
…boCop, at around 8% of overall processing.

This is because we currently build a new `Commissionner` for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus `O(files * cops * types)` where `types` is the number of node types encountered in a typical file.
Testing on `files == 200` typical RuboCop files, I calculated `types ~22.5`. Currently `cops == 422`.

This new algorithm is the same `O` but `types` is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an *overall gain of ~6.6%*

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add `on_after_send` etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on *adding* callbacks it responds to *at runtime* is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods `on_send`, etc., even though they would be overriden by the extending modules), or overriding `Cop::Base#callbacks_needed` (although I marked the api as private for now).

How I tested performance:
```
$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

```
mergify bot pushed a commit that referenced this pull request Oct 16, 2020
…boCop, at around 8% of overall processing.

This is because we currently build a new `Commissionner` for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus `O(files * cops * types)` where `types` is the number of node types encountered in a typical file.
Testing on `files == 200` typical RuboCop files, I calculated `types ~22.5`. Currently `cops == 422`.

This new algorithm is the same `O` but `types` is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an *overall gain of ~6.6%*

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add `on_after_send` etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on *adding* callbacks it responds to *at runtime* is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods `on_send`, etc., even though they would be overriden by the extending modules), or overriding `Cop::Base#callbacks_needed` (although I marked the api as private for now).

How I tested performance:
```
$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

```
@marcandre
Copy link
Contributor

after_send is now available...

@dvandersluis
Copy link
Member Author

@marcandre cool thanks, I’ll probably Be able to update this tomorrow!

@marcandre
Copy link
Contributor

marcandre commented Oct 20, 2020 via email

@dvandersluis
Copy link
Member Author

@marcandre updated to use after_send, thanks that really simplified it!

CHANGELOG.md Outdated Show resolved Hide resolved
@dvandersluis
Copy link
Member Author

@bbatsov made all the requested changes, thanks!

@bbatsov
Copy link
Collaborator

bbatsov commented Oct 25, 2020

Can you also address the changelog conflict?

@marcandre
Copy link
Contributor

Can you also address the changelog conflict?

You can use rake changelog:fix and al. in the future 😄

@dvandersluis
Copy link
Member Author

dvandersluis commented Oct 26, 2020

@bbatsov @marcandre moved the changelog item in changelog/

@marcandre marcandre merged commit 3ebc79f into rubocop:master Oct 26, 2020
@marcandre
Copy link
Contributor

Outstanding work @dvandersluis 👍
Thanks!

@dvandersluis dvandersluis deleted the issue/8708 branch January 18, 2021 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lint/OutOfRangeRegexpRef picks wrong regex
3 participants