Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Lint/OutOfRangeRefInRegexp cop #7755 #8407

Merged

Conversation

sonalinavlakhe
Copy link
Contributor

@sonalinavlakhe sonalinavlakhe commented Jul 27, 2020

This cop looks for out of range referencing for Regexp, as while capturing groups out of range reference always returns nil.

  /(foo)bar/ =~ 'foobar'

  #   bad - always returns nil
  #   puts $2 # => nil

  #   good
  #   puts $1 # => foo

Issue: #7755

Before submitting the PR make sure the following are checked:

  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea :-)

I think this cop can only marked as unsafe though, as I don't think there's a way to be sure of the diagnostic.

lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
spec/rubocop/cop/lint/out_of_range_ref_in_regexp_spec.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
lib/rubocop/cop/lint/out_of_range_ref_in_regexp.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much done. Must be rebased to resolve conflict in Changelog.

Naming: OutOfRangeRefInRegexp => OutOfRangeRegexpRef ?

@@ -1,4 +1,4 @@
= Installation
= Installation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably didn't mean to change this

@@ -1620,6 +1620,12 @@ Lint/OrderedMagicComments:
Enabled: true
VersionAdded: '0.53'

Lint/OutOfRangeRefInRegexp:
Description: 'Checks for out of range reference for Regep because it always returns nil.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regep => Regexp

@sonalinavlakhe sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from d9c62a4 to a3497ff Compare July 30, 2020 16:40
@sonalinavlakhe
Copy link
Contributor Author

@marcandre, I have resolved all the suggested changes, please have a look and let me know.

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a tough reviewer, I hope you don't mind 😅.

There should be a test for encountering $4 before any regexp. You'll want to do an initialization in on_new_investigation

MSG = 'Do not use out of range reference for the Regexp.'

def on_regexp(node)
@valid_ref = cop_config['Count']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand cop_config['Count'], maybe you simply meant nil?

Comment on lines 52 to 53
named_capture += 1 if e.instance_of?(Regexp::Expression::Group::Named)
numbered_capture += 1 if e.instance_of?(Regexp::Expression::Group::Capture)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to rely on e.type instead of the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can do it but we have to check with the token instead.

tree.each_expression do |e|
  named_capture += 1 if e.token == :named
  numbered_capture += 1 if e.token == :capture
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither seem great to me, but that might be the gem's fault. Could you use e.capturing?, and then e.respond_to?(:name), say?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can check with e.capturing? but again we cant check further for e.respond_to?(:name).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcandre, e.capturing? will raise an error
undefined method capturing? for #<Regexp::Expression::CharacterType::Word:0x00007fb466af9cf0>
for all non-group type expressions so better we to use e.type?(:group) then e.respond_to?(:name)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.type?(:group) && e.respond_to?(:name) 👍, I think it's much better than relying on the class, even if longer...

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a tough reviewer, and I don't see everything from the get go, I hope you don't mind 😅

You should add a test if $4 is encountered before any regexp. You'll need define on_new_investigation too.

@sonalinavlakhe
Copy link
Contributor Author

Never mind @marcandre, for me, it's a huge learning opportunity 😄

What I understood from the below comment
" you should add a test if $4 is encountered before any regexp. You'll need to define on_new_investigation too"
that you must be referring below example

$4
/(foo)(bar)/ =~ "foobar"

Do I need to initialize @valid_ref to nil in the on_new_investigation method? if so then I am not able to understand the use case for the on_new_investigation method because we are not creating any offense if we got the reference before any regexp.
Please suggest.

@marcandre
Copy link
Contributor

Do I need to initialize @valid_ref to nil in the on_new_investigation method?

Yes

if so then I am not able to understand the use case for the on_new_investigation method because we are not creating any offense if we got the reference before any regexp.

First, shouldn't we create an offense then?
Second: in warning mode you would get a 'uninitialized instance variable' I think. Also running the cop on a second source would still hold the result form the previous one.

@sonalinavlakhe
Copy link
Contributor Author

First, shouldn't we create an offense then?

Please suggest we should consider this case to raise offense.

Second:` in warning mode you would get a 'uninitialized instance variable' I think. Also running the cop on a second source would still hold the result form the previous one.

I got this one. Thanks 👍

@marcandre
Copy link
Contributor

Please suggest we should consider this case to raise offense.

Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, in that it can not be non nil, so maybe we should initialize @valid_ref to 0, no?

@sonalinavlakhe
Copy link
Contributor Author

sonalinavlakhe commented Aug 1, 2020

Sorry, I don't understand... If I'm not mistaken, def foo; $4; end should raise an offense, in that it can not be non nil, so maybe we should initialize @valid_ref to 0, no?

Yes, you are right. only for regular expression with non-literals, I have to make @valid_ref set to nil, so we will have a check to not raise an offense.

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! The implementation looks really good to me 🎉

You're still missing a test for $3 without a regexp though...

Comment on lines 83 to 128
it 'does not register an offence when containing a ivar' do
expect_no_offenses(<<~'RUBY')
@var = '(\d+)'
/(?<foo>#{@var}*)/ =~ "12"
puts $1
puts $3
RUBY
end

it 'does not register an offence when containing a cvar' do
expect_no_offenses(<<~'RUBY')
@@var = '(\d+)'
/(?<foo>#{@@var}*)/ =~ "12"
puts $1
puts $4
RUBY
end

it 'does not register an offence when containing a gvar' do
expect_no_offenses(<<~'RUBY')
$var = '(\d+)'
/(?<foo>#{$var}*)/ =~ "12"
puts $1
puts $2
RUBY
end

it 'does not register an offence when containing a method' do
expect_no_offenses(<<~'RUBY')
def do_something
'(\d+)'
end
/(?<foo>#{do_something}*)/ =~ "12"
puts $1
puts $4
RUBY
end

it 'does not register an offence when containing a constant' do
expect_no_offenses(<<~'RUBY')
CONST = "12"
/(?<foo>#{CONST}*)/ =~ "12"
puts $1
puts $3
RUBY
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these tests can be removed, we only need to test one "dynamic" case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @marcandre but actually these specs are self-explanatory and devs who are reading these will get to understand the code better is what I think, please give an example of how this can be improved better with dynamic test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant to say is that the previous test (with a variable) is sufficient; a good implementation (like we have now) can not pass the variable test and fail the instance variable, the method call, etc... or vice versa. They all test the same idea:is the regexp dynamic or not. It's not about what is inside the #{...}, just that there is a #{...} or not. Otherwise to be "complete", we'd have to add all possibilities, for example method call with block, or with rescue, or literal, or ...

it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
expect_no_offenses(<<~'RUBY')
/data = ({"words":.+}}}[^}]*})/m
RUBY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well add $3 or something. Hopefully the Regexp parser will handle this one day and that test will fail, but right now it would always pass.

Copy link
Contributor

@marcandre marcandre Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added $1 in last commit to fix this. please refer this

@sonalinavlakhe
Copy link
Contributor Author

@marcandre, Please review the updated changes. let me know all good or we can still improve 😄

Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe it or not, I found something else 🤣

module Lint
# This cops looks for out of range referencing for Regexp, as while capturing groups out of
# out of range reference always returns nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that the doc file is incomplete, because you have a missing # here


# @example
# /(foo)bar/ =~ 'foobar'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here...


# # bad - always returns nil
# puts $2 # => nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here. Might be a cop idea! 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, hope I fixed everything now 🤣 🤞

@sonalinavlakhe sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from 9d91d81 to c33997b Compare August 3, 2020 07:00
Copy link
Contributor

@marcandre marcandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last tweaks, I promise 😅

it 'does not register offense when using a Regexp cannot be processed by regexp_parser gem' do
expect_no_offenses(<<~'RUBY')
/data = ({"words":.+}}}[^}]*})/m
RUBY
Copy link
Contributor

@marcandre marcandre Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

module RuboCop
module Cop
module Lint
# This cops looks for out of range referencing for Regexp, as while capturing groups out of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a native speaker, but the "as while" sounds strange to me. How about?

"This cops looks for references of Regexp captures that are out of range and thus always returns nil."

@sonalinavlakhe
Copy link
Contributor Author

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

I have added $1 in the last commit to fix this. please refer this

@marcandre
Copy link
Contributor

There's still no reason to raise an offense here once parsing is fixed, right? That's why I wrote to use $3 or something.

I have added $1 in the last commit to fix this.

Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...

@sonalinavlakhe
Copy link
Contributor Author

Right, but the "unparseable" regexp has one captured group, so $1 is not out-of-range...

Sorry, I missed that one, when I tried to use other than $1 i.e $2 or $3 this test is failing. I checked more on this and conclude that regex_parser now able to parse this type of expression.
I checked with a series of examples - #8083 and ammar/regexp_parser#15. regex_parser parsing those without throwing an exception

irb(main):001:0> require 'regexp_parser'
=> true
irb(main):002:0> Regexp::Parser.parse('data = ({"words":.+}}}[^}]*})')
=> #<Regexp::Expression::Root:0x00007fa07a9c5c08 @type=:expression, @token=:root, @text="", @ts=0, @level=nil, @set_level=nil, @conditional_level=nil, @nesting_level=0, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b857bb8 @type=:literal, @token=:literal, @text="data = ", @ts=0, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}>, #<Regexp::Expression::Group::Capture:0x00007fa07b857b90 @type=:group, @token=:capture, @text="(", @ts=7, @level=0, @set_level=0, @conditional_level=0, @nesting_level=1, @quantifier=nil, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8579d8 @type=:literal, @token=:literal, @text="{\"words\":", @ts=8, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterType::Any:0x00007fa07b8579b0 @type=:meta, @token=:dot, @text=".", @ts=17, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857938 @token=:one_or_more, @text="+", @mode=:greedy, @min=1, @max=-1>, @options={}>, #<Regexp::Expression::Literal:0x00007fa07b8578e8 @type=:literal, @token=:literal, @text="}}}", @ts=19, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>, #<Regexp::Expression::CharacterSet:0x00007fa07b8578c0 @negative=true, @closed=true, @type=:set, @token=:character, @text="[", @ts=22, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=#<Regexp::Expression::Quantifier:0x00007fa07b857640 @token=:zero_or_more, @text="*", @mode=:greedy, @min=0, @max=-1>, @options={}, @expressions=[#<Regexp::Expression::Literal:0x00007fa07b8577d0 @type=:literal, @token=:literal, @text="}", @ts=24, @level=1, @set_level=1, @conditional_level=0, @nesting_level=3, @quantifier=nil, @options={}>]>, #<Regexp::Expression::Literal:0x00007fa07b8575f0 @type=:literal, @token=:literal, @text="}", @ts=27, @level=1, @set_level=0, @conditional_level=0, @nesting_level=2, @quantifier=nil, @options={}>], @number=1, @number_at_level=1>]> 

Please suggest should we remove the rescue block then?

rescue Regexp::Scanner::ScannerError
  return
end

as Errors like this (https://github.com/ammar/regexp_parser/blob/master/spec/scanner/errors_spec.rb) are handled by Lint/Syntax cop.
#app/controllers/accounts_controller.rb

/\p{foobar}/
$1
Inspecting 1 file
E

Offenses:

app/controllers/accounts_controller.rb:7:4: E: Lint/Syntax: invalid character property name {foobar}: /\p{foobar}/
(Using Ruby 2.5 parser; configure using TargetRubyVersion parameter, under AllCops)
   /\p{foobar}/
   ^^^^^^^^^^^^

1 file inspected, 1 offense detected

@marcandre
Copy link
Contributor

That's good news :-)

Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)

@sonalinavlakhe
Copy link
Contributor Author

That's good news :-)

Right, I see in their Changelog the issue has been changed in v1.7.1, so we could bump our gemspec requirement to that version and remove the rescue (and the test). If ever there's such an exception, it won't be the end of the world (just a notice there was an issue when processing the file) and a bug report can be filed with the gem.

It also means that the other rescue in the code could be removed and that the associated specs are incorrect (as they should fail with the regexp parser 1.7.1)

@marcandre Well, I got this ok I have to remove the rescue block from code and the test from the out_of_range_regexp_ref_spec.rb.
Any other action item needed to fix this?

@marcandre
Copy link
Contributor

Any other action item needed to fix this?

No, rest can be done separately.

You'll have to rebase to fix the Changelog conflict though

@sonalinavlakhe sonalinavlakhe force-pushed the feature/out_of_range_ref_in_regexp branch from 8c36fe8 to b5e412a Compare August 5, 2020 15:18
@sonalinavlakhe
Copy link
Contributor Author

@marcandre, I have done rebase and all the changes.

@marcandre marcandre merged commit a678aba into rubocop:master Aug 5, 2020
@marcandre
Copy link
Contributor

🎉 Thank you @sonalinavlakhe for this PR and your quick turn around to my multiple review comments 👍

Congratulations on your first Cop 🍾 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants