Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving auto correction's API #7868

Merged
merged 1 commit into from Jun 22, 2020
Merged

Conversation

marcandre
Copy link
Contributor

@marcandre marcandre commented Apr 12, 2020

I am writing my first cop and I feel the current autocorrect API could be improved.

This PR is just a demo and not meant for merging. A few cops were refactored using the proposed API.

When to specify corrections

My main observation is that the very best moment to do an autocorrection is when the offense is detected. Currently the autocorrection and the offense detection are (officially) decoupled. When an offense is detected, we know exactly what it is (location, type, etc.) yet we are unable to make the correction there and then.

When we are later asked to make the autocorrection, we are passed the offending "node". All the state that we might have created to analyze the offense has to be rebuilt at that point, and if we detect different style of offenses, we even have to deduct again which offense this is and correct accordingly.

Current workarounds

Some cops simply redo the detection work twice

Some cops resort to "cheats" for the current state, like AccessModifierIndentation temporarily storing @column_delta. This works only provided that each autocorrect calls are made after each add_offense. Also, a naive approach wouldn't work reliably:

def auto_correct(node)
  lambda do |corrector|
    corrector.insert_after(node, ' ' * @column_delta) # Would be the last value for all corrections
  end
end

Overly complex de-dupping

The processing currently has two de-duping being done.

Firstly, extra offenses on the same range are ignored. I can see how this could be useful for certain cops.

Secondly, if offenses are on different ranges but the same node, the autocorrector is called only for the first one. The only builtin cop that uses this one is SpaceInsideReferenceBrackets: it will add one or two offenses on the opening '[' and/or closing ']', and autocorrect will correct the node (detecting again which of the opening or closing brackets need to be corrected) and relies on the dedupping not to double-correct if both the opening and closing brackets were wrong. Clearly, this is an example where this de-dupping isn't needed but instead the correction should be specifiable when detected...

That dedupping is the only reason I can see why the API of add_offense officially asks for a Node and optionally a location. There's no good reason why an offenses would be attached to a node rather than a range. Indeed a few pass a range instead (like EmptyLineAfterMagicComment) and cheat by repeating location: <same location>.

I propose that the newer API simply asks for a range (or node/comment and uses its loc.expression) and dedupes only on the range.

Alternative to returning lambdas / false

Finally I personally dislike returning lambdas. Ruby makes it trivial to capture blocks which feels to me a much more natural way to proceed.

It also wasn't clear at first that autocorrect can also return false when no correction can be made. This complicates processing a bit... Before you want to autocorrect, you have to check that you can autocorrect.
Moreover, there is also redundancy and it permits a cop writer to return a lambda that nevertheless wouldn't make any correction.

This trouble should be avoided and deduced automatically. Technically the issue is that we would need a way to build the corrections, check if there are any, and only later merge them.

In this "demo" I use a hack. I simply run the block twice, first on a fake corrector that simply records if it is called and the usual processing will call it later on Parser's TreeRewriter. The plan would be not to use a hack but I would instead add the merging functionality into TreeRewriter (I am its author btw). Before I spend time doing it though, I'd like to see if an agreement can reached on the add_offense API.

Proposed API - simple case

I would propose an improved (and fully compatible) API.

# Before:
def on_something(node)
  # ...
  add_offense(node)
end

def auto_correct(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

# After:
include Autocorrector

def on_something(node)
  # ...
  add_offense(node) do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

Post processing

One complication is that add_offense already yields (unless the offense is disabled) for some rare cases where post processing is needed. I noticed that 5% of the builtin cops use this post processing vs 45% that do autocorrections; most with post processing also have autocorrection. There are many solutions to this, but the simple case is simply to use the same block:

# Before:
def on_something(node)
  # ...
  add_offense(node) do
    # ... do only if not disabled
  end
end

def auto_correct(node)
  return false unless can_correct_this_one?(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
    # ...
  end
end

# After:
include Autocorrector

def on_something(node)
   # ...
  add_offense(node) do |corrector|
    # ... do only if not disabled

    if can_correct_this_one?(node)
      corrector.wrap(node, '(', ')')
      # ...
    end
end

Compatiblity

What I'm proposing should be compatible with existing cops.

The new API is opt-in for cops that decide to include Autocorrector. I'm proposing that the location: key isn't supported in the new API (pass the range directly instead of the node in those case).

It's not clear to what extent compatibility is required, in particular with Cop#corrections array. I wrote a CorrectionsProxy so that corrections could still be used. Actually the rest of the code still uses it for now (Team, ...), this seemed like a good way to test compatiblity. I intend on cleaning things up though so that the core doesn't use Cop#corrections...

@marcandre marcandre changed the title Autocorrector: Proof of concept Improving auto correction's API Apr 12, 2020
@marcandre
Copy link
Contributor Author

marcandre commented Apr 15, 2020

The new parser has the required changes to the TreeRewriter (couldn't resist implementing them). I'll update this PR but the API proposal remains the same.

PS: I think I have an even cleaner proposal. Will reopen when updated.

@marcandre marcandre closed this Apr 15, 2020
@bbatsov
Copy link
Collaborator

bbatsov commented Apr 16, 2020

Sounds good to me! Sorry for the radio silence, but I didn't find the time to take a careful look at your proposal. Looking forward to the improved version!

@marcandre marcandre reopened this Apr 20, 2020
@marcandre
Copy link
Contributor Author

Updated the code and tweaked the API. I removed the syntax sugar part as it conflicts with the strict rules of RuboCop (no multiple block chaining). I believe it's not actually necessary and both the corrections and the code-to-execute-only-if-enabled can be in the same block. The revised API is thus simpler.

I never spelled out the downside of my proposal: it's potentially slower. For example, if RuboCop is run without the --autocorrect flag, the block is still call so the calculations for the autocorrections will still be run. My guess is that the time spent to calculated the corrections should typically be negligible compared to parsing and finding the offenses, so I don't think it will make a noticeable difference.

@owst owst mentioned this pull request Apr 25, 2020
8 tasks
@bbatsov
Copy link
Collaborator

bbatsov commented May 10, 2020

I'll try to find some time soon to review your proposal more carefully. In the mean time it'd be great if you ran some benchmarks regarding the performance impact.

@marcandre
Copy link
Contributor Author

benchmarks regarding the performance impact.

Did you have anything in mind?

I probably should not have mentioned the theoretical performance impact, because this PR affects only the correction part, which is typically negligible to the inspection part.

I think the performance impact will be undetectable:
case
when no offenses? then no speed difference
when many offenses for cops that can't autocorrect? then no difference
when many offenses for autocorrection cops and running with auto-correction? then no speed difference
when many offenses for autocorrection cops and running with auto-correction and cops don't spend an inordinate amount of time figuring out what the correction would be? then no speed difference
else
maybe a slowdown
end

I ran the test suite before and after the changes and got similar speeds (1m12 before vs 1m7 after, no I don't think it's noticeably faster either)

@marcandre
Copy link
Contributor Author

Question remaining to be answered:

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

@marcandre
Copy link
Contributor Author

Actually 1.0 is coming up real soon, which is great timing. I propose to include my CorrectionsProxy solution with deprecation warnings, in case any code out there uses the array directly.

I will try to finalize this PR (although the proposed API will remain unchanged) and think it would make a great addition to the 1.0 release.

@bbatsov
Copy link
Collaborator

bbatsov commented May 12, 2020

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

I don't think so. I can't even remember why we introduced this. Maybe @koic or @jonas054 do.

I went over your proposal more carefully and I like it quite a lot overall. I guess we'll gradually be shifting the auto-corrections to the new API as times goes on. The problem with big codebases is that after some point in time it gets really hard to keep the uniform - e.g. we never migrated all the code to the NodePattern, after it was introduced, we never adopted everywhere all the node extensions and so on.

@marcandre
Copy link
Contributor Author

Well, the corrections array was needed to hold the corrections because they might not be applied afterall (cops can exclude other cops). In an ideal world though it would have been part of a CopRunner or of the Team class, not of the Cop class that is meant to be extended.

It's going to be fun de refactor the existing cops to use the new API 😄; I'll focus first on cleaning up the Team.

@marcandre
Copy link
Contributor Author

marcandre commented May 17, 2020

I've made some progress on this. What I've done is modified cop/cop to reflect the proposed API, so anyone looking at that file for documentation would see the "modern" API / implementation. The compatibility with the legacy behavior is secluded in two legacy/..._support.rb files.

I have not yet changed the Team/Commissionner implementation, nor most of the specs, as a check that the behavior for the internal legacy API is correctly handled (e.g. Cop#corrections).

When I'm satisfied with this, the last step would be to have those use the modern API and add warning to the internal legacy API (e.g. Cop#corrections, but not the add_offense with the legacy API).

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises. Compare the specs for the analysis part and for the autocorrection part. Note that both tests say "records Team#errors" but that's not what the second spec actually tests! BTW, we are are not talking about ClobberringErrors here, but any other StandardError.

Does anyone know the reason for this difference in behavior? Is there a potential issue in having the same behavior (i.e. raise only if the :raise setting is on)?

@bbatsov
Copy link
Collaborator

bbatsov commented May 17, 2020

Likely that's just some oversight.

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises.

Probably we can simplify things, as I don't see much reason for having different approaches.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct I guess everyone just learns the ropes from other cops. :-)

@marcandre
Copy link
Contributor Author

Probably we can simplify things, as I don't see much reason for having different approaches.

Great, thanks for the answer.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct

Right. I'll update and improve the docs.

I guess everyone just learns the ropes from other cops. :-)

Or trying things out to see if it passes or not, like this cop passing [[location], location] instead of a Node!

@marcandre
Copy link
Contributor Author

@kittiphophum18 your comment appears to be a quote of my entire message (I removed it for brevity)

koic added a commit to koic/rubocop-minitest that referenced this pull request Apr 5, 2021
Follow rubocop/rubocop#7868.

This PR uses `Cop::Base` API for `Minitest` department's cops.
koic added a commit to koic/rubocop-github that referenced this pull request Apr 29, 2021
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use
new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in
> RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html

This new API requires RuboCop 0.87 or higher.
https://github.com/rubocop/rubocop/blob/master/CHANGELOG.md#0870-2020-07-06

Denpendency version of `s.add_dependency "rubocop"` does not need to be
updated in the gemspec if github#86 has been merged first.
koic added a commit to koic/rubocop that referenced this pull request Jun 12, 2021
Follow rubocop#7868.

This PR uses `Cop::Base` API for `Layout/RedundantLineBreak`.
bbatsov pushed a commit that referenced this pull request Jun 13, 2021
Follow #7868.

This PR uses `Cop::Base` API for `Layout/RedundantLineBreak`.
koic added a commit to koic/rubocop-thread_safety that referenced this pull request May 18, 2022
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html

This new API requires RuboCop 0.87 or higher.
https://github.com/rubocop/rubocop/blob/master/CHANGELOG.md#0870-2020-07-06

That dependent version update will be resolved in rubocop#7, so there are no updates in this PR.
koic added a commit to koic/rubocop-thread_safety that referenced this pull request May 18, 2022
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html

This new API requires RuboCop 0.87 or higher.
https://github.com/rubocop/rubocop/blob/master/CHANGELOG.md#0870-2020-07-06

That dependent version update will be resolved in rubocop#7, so there are no updates in this PR.
koic added a commit to koic/rubocop-github that referenced this pull request Jun 27, 2022
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use
new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in
> RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html

This new API requires RuboCop 0.87 or higher.
https://github.com/rubocop/rubocop/blob/master/CHANGELOG.md#0870-2020-07-06

Denpendency version of `s.add_dependency "rubocop"` does not need to be
updated in the gemspec if github#86 has been merged first.
renawatson68 added a commit to renawatson68/performance-develop-rubyonrails that referenced this pull request Sep 23, 2022
This PR fixes the following build error.

```console
Failures:

  1) RuboCop::Cop::Performance::ChainArrayAllocation Methods that
     require an argument first
     Failure/Error: expect(cop.messages.empty?).to be(false)

       expected false
            got true
       # ./spec/rubocop/cop/performance/chain_array_allocation_spec.rb:39:in
       `block (3 levels) in <top (required)>'
```

https://circleci.com/gh/rubocop-hq/rubocop-performance/1775

rubocop/rubocop#7868 has detected this wrong
test case and this PR fixed it.
renawatson68 added a commit to renawatson68/performance-develop-rubyonrails that referenced this pull request Sep 23, 2022
Follow rubocop/rubocop#7868.

This PR uses `Cop::Base` API for `Performance` department's cops.
richardstewart0213 added a commit to richardstewart0213/performance-build-Performance-optimization-analysis- that referenced this pull request Nov 4, 2022
This PR fixes the following build error.

```console
Failures:

  1) RuboCop::Cop::Performance::ChainArrayAllocation Methods that
     require an argument first
     Failure/Error: expect(cop.messages.empty?).to be(false)

       expected false
            got true
       # ./spec/rubocop/cop/performance/chain_array_allocation_spec.rb:39:in
       `block (3 levels) in <top (required)>'
```

https://circleci.com/gh/rubocop-hq/rubocop-performance/1775

rubocop/rubocop#7868 has detected this wrong
test case and this PR fixed it.
richardstewart0213 added a commit to richardstewart0213/performance-build-Performance-optimization-analysis- that referenced this pull request Nov 4, 2022
Follow rubocop/rubocop#7868.

This PR uses `Cop::Base` API for `Performance` department's cops.
pirj pushed a commit to rubocop/rubocop-capybara that referenced this pull request Dec 29, 2022
The autocorrection API was changed in Rubocop v0.87.0 (pull request rubocop/rubocop#7868).

Here, I change the superclass of `RuboCop::Cop::RSpec::Cop` from
`::RuboCop::Cop::Cop` to `::RuboCop::Cop::Base`.

This has a few consequences:

- Our `#message` methods get called with a different argument than
  before. It *can* be customized by defining a `#callback_argument`
  method, but I think it is clearer to avoid callbacks altogether.
- Our `#autocorrect` methods don't get called anymore. Instead, we
  extend `Autocorrector`, and the `corrector` is being yielded when
  calling `#add_offense`, so the code is mostly moved in there. For some
  cases, this means that some code can be removed, which is nice. For
  some cases, it means that the methods get too long, or the code
  complexity gets too high, and in those cases I chose to just call out
  to an `#autocorrect` method anyway, but of course passing the
  `corrector` and any usable local variables along.

This also means we bump the dependency of RuboCop quite a bit, from
'>= 0.68.1' to '>= 0.87.0'.
koic added a commit to koic/rubocop-i18n that referenced this pull request Feb 9, 2023
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use
new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in
> RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html
deivid-rodriguez pushed a commit to rubygems/rubygems that referenced this pull request Feb 10, 2023
Follow rubocop/rubocop#7868.

The legacy `RuboCop::Cop::Cop` API is soft deprecated and
this PR makes custom cop inherit `RuboCop::Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API
> will be removed in RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html

I've confirmed that the following behavior is compatible:

```console
$ bundle exec util/rubocop -r ./util/cops/deprecations --only Rubygems/Deprecations
```
koic added a commit to koic/rubocop-i18n that referenced this pull request Feb 11, 2023
Follow rubocop/rubocop#7868.

The legacy `Cop::Cop` API is soft deprecated and this PR use
new `Cop::Base` API instead.

> maintain any RuboCop extensions, as the legacy API will be removed in
> RuboCop 2.0.

https://metaredux.com/posts/2020/10/21/rubocop-1-0.html
ydah pushed a commit to rubocop/rubocop-factory_bot that referenced this pull request Apr 13, 2023
The autocorrection API was changed in Rubocop v0.87.0 (pull request rubocop/rubocop#7868).

Here, I change the superclass of `RuboCop::Cop::RSpec::Cop` from
`::RuboCop::Cop::Cop` to `::RuboCop::Cop::Base`.

This has a few consequences:

- Our `#message` methods get called with a different argument than
  before. It *can* be customized by defining a `#callback_argument`
  method, but I think it is clearer to avoid callbacks altogether.
- Our `#autocorrect` methods don't get called anymore. Instead, we
  extend `Autocorrector`, and the `corrector` is being yielded when
  calling `#add_offense`, so the code is mostly moved in there. For some
  cases, this means that some code can be removed, which is nice. For
  some cases, it means that the methods get too long, or the code
  complexity gets too high, and in those cases I chose to just call out
  to an `#autocorrect` method anyway, but of course passing the
  `corrector` and any usable local variables along.

This also means we bump the dependency of RuboCop quite a bit, from
'>= 0.68.1' to '>= 0.87.0'.
MarttiCheng added a commit to MarttiCheng/Rubocop-Performance that referenced this pull request Sep 28, 2023
This PR fixes the following build error.

```console
Failures:

  1) RuboCop::Cop::Performance::ChainArrayAllocation Methods that
     require an argument first
     Failure/Error: expect(cop.messages.empty?).to be(false)

       expected false
            got true
       # ./spec/rubocop/cop/performance/chain_array_allocation_spec.rb:39:in
       `block (3 levels) in <top (required)>'
```

https://circleci.com/gh/rubocop-hq/rubocop-performance/1775

rubocop/rubocop#7868 has detected this wrong
test case and this PR fixed it.
MarttiCheng added a commit to MarttiCheng/Rubocop-Performance that referenced this pull request Sep 28, 2023
Follow rubocop/rubocop#7868.

This PR uses `Cop::Base` API for `Performance` department's cops.
SerhiiMisiura added a commit to SerhiiMisiura/Rubocop-Performance that referenced this pull request Oct 5, 2023
This PR fixes the following build error.

```console
Failures:

  1) RuboCop::Cop::Performance::ChainArrayAllocation Methods that
     require an argument first
     Failure/Error: expect(cop.messages.empty?).to be(false)

       expected false
            got true
       # ./spec/rubocop/cop/performance/chain_array_allocation_spec.rb:39:in
       `block (3 levels) in <top (required)>'
```

https://circleci.com/gh/rubocop-hq/rubocop-performance/1775

rubocop/rubocop#7868 has detected this wrong
test case and this PR fixed it.
SerhiiMisiura added a commit to SerhiiMisiura/Rubocop-Performance that referenced this pull request Oct 5, 2023
Follow rubocop/rubocop#7868.

This PR uses `Cop::Base` API for `Performance` department's cops.
ydah pushed a commit to rubocop/rubocop-rspec_rails that referenced this pull request Mar 27, 2024
The autocorrection API was changed in Rubocop v0.87.0 (pull request rubocop/rubocop#7868).

Here, I change the superclass of `RuboCop::Cop::RSpec::Cop` from
`::RuboCop::Cop::Cop` to `::RuboCop::Cop::Base`.

This has a few consequences:

- Our `#message` methods get called with a different argument than
  before. It *can* be customized by defining a `#callback_argument`
  method, but I think it is clearer to avoid callbacks altogether.
- Our `#autocorrect` methods don't get called anymore. Instead, we
  extend `Autocorrector`, and the `corrector` is being yielded when
  calling `#add_offense`, so the code is mostly moved in there. For some
  cases, this means that some code can be removed, which is nice. For
  some cases, it means that the methods get too long, or the code
  complexity gets too high, and in those cases I chose to just call out
  to an `#autocorrect` method anyway, but of course passing the
  `corrector` and any usable local variables along.

This also means we bump the dependency of RuboCop quite a bit, from
'>= 0.68.1' to '>= 0.87.0'.
ydah pushed a commit to rubocop/rubocop-rspec_rails that referenced this pull request Mar 27, 2024
The autocorrection API was changed in Rubocop v0.87.0 (pull request rubocop/rubocop#7868).

Here, I change the superclass of `RuboCop::Cop::RSpec::Cop` from
`::RuboCop::Cop::Cop` to `::RuboCop::Cop::Base`.

This has a few consequences:

- Our `#message` methods get called with a different argument than
  before. It *can* be customized by defining a `#callback_argument`
  method, but I think it is clearer to avoid callbacks altogether.
- Our `#autocorrect` methods don't get called anymore. Instead, we
  extend `Autocorrector`, and the `corrector` is being yielded when
  calling `#add_offense`, so the code is mostly moved in there. For some
  cases, this means that some code can be removed, which is nice. For
  some cases, it means that the methods get too long, or the code
  complexity gets too high, and in those cases I chose to just call out
  to an `#autocorrect` method anyway, but of course passing the
  `corrector` and any usable local variables along.

This also means we bump the dependency of RuboCop quite a bit, from
'>= 0.68.1' to '>= 0.87.0'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants