Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Commissioner callbacks #8882

Merged
merged 4 commits into from Oct 16, 2020
Merged

Optimize Commissioner callbacks #8882

merged 4 commits into from Oct 16, 2020

Conversation

marcandre
Copy link
Contributor

Callback building was the single most time consuming method in all RuboCop, at around 8% of overall processing.

This is because we currently build a new Commissionner for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus O(files * cops * types) where types is the number of node types encountered in a typical file.
Testing on files == 200 typical RuboCop files, I calculated types ~22.5. Currently cops == 422.

This new algorithm is the same O but types is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an overall gain of ~6.6%

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add on_after_send etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on adding callbacks it responds to at runtime is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods on_send, etc., even though they would be overriden by the extending modules), or overriding Cop::Base#callbacks_needed (although I marked the api as private for now).

How I tested performance:

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

@marcandre marcandre force-pushed the optimize_callbacks branch 2 times, most recently from 6f3c833 to 79eabca Compare October 10, 2020 05:52
@bbatsov
Copy link
Collaborator

bbatsov commented Oct 10, 2020

Great work!

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods on_send, etc., even though they would be overriden by the extending modules), or overriding Cop::Base#callbacks_needed (although I marked the api as private for now).

I think it's fine to just assume cops shouldn't do this. As you've mentioned yourself - only cop does this, so no need to overthink it.

@marcandre
Copy link
Contributor Author

I should probably wait until after we make the next bugfix release to merge this

@bbatsov
Copy link
Collaborator

bbatsov commented Oct 10, 2020

Agreed.

marcandre added a commit that referenced this pull request Oct 13, 2020
I doubled check that this has basically no performance impact (less than ~0.1% overall) after #8882
@bbatsov
Copy link
Collaborator

bbatsov commented Oct 15, 2020

You can update the changelog and merge this.

@marcandre marcandre changed the base branch from dont_over_extend to master October 16, 2020 02:55
@marcandre
Copy link
Contributor Author

@bbatsov talking about ChangeLog conflicts, I've been through all stages of adding changelog entries, merging them (with an existing entry) and release in rubocop-ast... May I copy that over here too or you would rather wait?

…boCop, at around 8% of overall processing.

This is because we currently build a new `Commissionner` for every file (could be optimized but not easy as config can change) and a commissioner must know which cops need responding to which node types.

Previous method was thus `O(files * cops * types)` where `types` is the number of node types encountered in a typical file.
Testing on `files == 200` typical RuboCop files, I calculated `types ~22.5`. Currently `cops == 422`.

This new algorithm is the same `O` but `types` is instead the number of types that a cop responds to (on average). For the cops that we run that is ~1.8.

The optimized cache building is almost 6x faster, for an *overall gain of ~6.6%*

It also has the advantage that adding new callbacks has zero impact on the cache building. Note: I'd like to add `on_after_send` etc (see #8844).

The only downside is that the list of callbacks is (by default) cached per Cop class.
This means that any Cop that somehow relies on *adding* callbacks it responds to *at runtime* is incompatible.

There's a single cop that does this (see #8881). Solutions include: not doing that (as in the PR), or only modifying the callbacks (in this case it could have been done by adding empty methods `on_send`, etc., even though they would be overriden by the extending modules), or overriding `Cop::Base#callbacks_needed` (although I marked the api as private for now).

How I tested performance:
```
$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#cops_callbacks_for'
  samples:  1039 self (7.5%)  /   1104 total (8.0%)

$ stackprof tmp/stackprof.dump --text --method 'RuboCop::Cop::Commissioner#initialize_callbacks'
  samples:    25 self (0.2%)  /    180 total (1.4%)

```
@mergify mergify bot merged commit 13df023 into master Oct 16, 2020
@marcandre marcandre deleted the optimize_callbacks branch October 16, 2020 03:13
@bbatsov
Copy link
Collaborator

bbatsov commented Oct 16, 2020

May I copy that over here too or you would rather wait?

You can copy it over if it seems robust to you at this point. Just don't forget to update the contribution docs as well, otherwise. Maybe we should also add some note to the top of the changelog that it's preferrable to not update it directly.

marcandre added a commit to marcandre/rubocop that referenced this pull request Oct 18, 2020
I doubled check that this has basically no performance impact (less than ~0.1% overall) after rubocop#8882
marcandre added a commit that referenced this pull request Oct 20, 2020
I doubled check that this has basically no performance impact (less than ~0.1% overall) after #8882
mergify bot pushed a commit that referenced this pull request Oct 20, 2020
I doubled check that this has basically no performance impact (less than ~0.1% overall) after #8882
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants