Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use regexp_parser to improve Style/RedundantRegexp... cops #8625

Merged
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
c474274
Use `regexp_parser` to improve `Style/RedundantRegexp...` cops
owst Aug 31, 2020
4bbf4cb
fixup! Use `regexp_parser` to improve `Style/RedundantRegexp...` cops
owst Aug 31, 2020
1fd5bf6
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 5, 2020
1608eb7
fixup! Merge remote-tracking branch 'upstream/master' into use_regexp…
owst Sep 5, 2020
a591d13
fixup! fixup! Use `regexp_parser` to improve `Style/RedundantRegexp..…
owst Sep 9, 2020
7598a99
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 9, 2020
db7cd65
fixup! Merge remote-tracking branch 'upstream/master' into use_regexp…
owst Sep 9, 2020
c1520e9
fixup! fixup! Merge remote-tracking branch 'upstream/master' into use…
owst Sep 9, 2020
9eac941
fixup! fixup! fixup! Merge remote-tracking branch 'upstream/master' i…
owst Sep 26, 2020
ad3540b
fixup! fixup! fixup! fixup! Merge remote-tracking branch 'upstream/ma…
owst Sep 26, 2020
33caf59
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 26, 2020
477e3c8
fixup! Merge remote-tracking branch 'upstream/master' into use_regexp…
owst Sep 26, 2020
6d3c405
fixup! fixup! Merge remote-tracking branch 'upstream/master' into use…
owst Sep 26, 2020
3b777ac
Test parsed_tree
owst Sep 27, 2020
cb668c1
Handle comments and blank interpolations in regexp parsed_tree
owst Sep 27, 2020
3e35f83
Merge branch 'handle_comments_and_blank_interpolations_in_regexp_pars…
owst Sep 27, 2020
7693728
Handle comments and blank interpolations in regexp parsed_tree
owst Sep 27, 2020
fd8b231
Merge branch 'handle_comments_and_blank_interpolations_in_regexp_pars…
owst Sep 27, 2020
201fc15
fixup! Merge branch 'handle_comments_and_blank_interpolations_in_rege…
owst Sep 27, 2020
df4f6b4
fixup! Handle comments and blank interpolations in regexp parsed_tree
owst Sep 28, 2020
cfec9ed
fixup! fixup! Handle comments and blank interpolations in regexp pars…
owst Sep 28, 2020
770b91e
Add test for #8805
owst Sep 28, 2020
9b3fe89
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 29, 2020
b25fd8e
Merge remote-tracking branch 'upstream/master' into handle_comments_a…
owst Sep 29, 2020
cb8a04b
fixup! Merge remote-tracking branch 'upstream/master' into handle_com…
owst Sep 29, 2020
37b2e89
fixup! Merge remote-tracking branch 'upstream/master' into use_regexp…
owst Sep 29, 2020
064b245
Merge branch 'handle_comments_and_blank_interpolations_in_regexp_pars…
owst Sep 29, 2020
ee5bcd8
fixup! Merge branch 'handle_comments_and_blank_interpolations_in_rege…
owst Sep 29, 2020
85d7f4b
fixup! fixup! Merge branch 'handle_comments_and_blank_interpolations_…
owst Sep 29, 2020
a73a64f
Merge remote-tracking branch 'upstream/master' into handle_comments_a…
owst Sep 29, 2020
48ed439
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 29, 2020
86f6e12
fixup! Merge remote-tracking branch 'upstream/master' into use_regexp…
owst Sep 29, 2020
a6b3cd1
Merge remote-tracking branch 'upstream/master' into handle_comments_a…
owst Sep 30, 2020
9b61289
fixup! Merge remote-tracking branch 'upstream/master' into handle_com…
owst Sep 30, 2020
62007c3
Merge branch 'handle_comments_and_blank_interpolations_in_regexp_pars…
owst Sep 30, 2020
71fe7da
fixup! Merge branch 'handle_comments_and_blank_interpolations_in_rege…
owst Sep 30, 2020
a21a007
Merge remote-tracking branch 'upstream/master' into use_regexp_parser…
owst Sep 30, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -11,6 +11,7 @@
### Changes

* [#8470](https://github.com/rubocop-hq/rubocop/issues/8470): Do not autocorrect `Style/StringConcatenation` when parts of the expression are too complex. ([@dvandersluis][])
* [#8625](https://github.com/rubocop-hq/rubocop/pull/8625): Improve `Style/RedundantRegexpCharacterClass` and `Style/RedundantRegexpEscape` by using `regexp_parser` gem. ([@owst][])

## 0.90.0 (2020-09-01)

Expand Down
62 changes: 40 additions & 22 deletions lib/rubocop/cop/style/redundant_regexp_character_class.rb
Expand Up @@ -22,32 +22,15 @@ module Style
# # good
# r = /[ab]/
class RedundantRegexpCharacterClass < Base
include MatchRange
include RegexpLiteralHelp
extend AutoCorrector

REQUIRES_ESCAPE_OUTSIDE_CHAR_CLASS_CHARS = '.*+?{}()|$'.chars.freeze
MSG_REDUNDANT_CHARACTER_CLASS = 'Redundant single-element character class, ' \
'`%<char_class>s` can be replaced with `%<element>s`.'

PATTERN = /
(
(?<!\\) # No \-prefix (i.e. not escaped)
\[ # Literal [
(?!\#\{) # Not (the start of) an interpolation
(?: # Either...
\\[^b] | # Any escaped character except b (which would change behaviour)
[^.*+?{}()|$] | # or one that doesn't require escaping outside the character class
\\[upP]\{[^}]+\} # or a unicode code-point or property
)
(?<!\\) # No \-prefix (i.e. not escaped)
\] # Literal ]
)
/x.freeze

def on_regexp(node)
each_redundant_character_class(node) do |loc|
next if whitespace_in_free_space_mode?(node, loc)

add_offense(
loc, message: format(
MSG_REDUNDANT_CHARACTER_CLASS,
Expand All @@ -63,19 +46,54 @@ def on_regexp(node)
private

def each_redundant_character_class(node)
pattern_source(node).scan(PATTERN) do
yield match_range(node.loc.begin.end, Regexp.last_match)
each_single_element_character_class(node) do |char_class|
next unless redundant_single_element_character_class?(node, char_class)

yield node.loc.begin.adjust(begin_pos: 1 + char_class.ts, end_pos: char_class.te)
end
end

def each_single_element_character_class(node)
Regexp::Parser.parse(pattern_source(node)).each_expression do |expr|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use node.parsed_tree here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately parsed_tree doesn't support patterns containing an interpolation. @marcandre I see you added that method - do you recall why you return nil for patterns containing interpolations? As it stands, this diff causes a bunch of spec failures:

diff --git a/lib/rubocop/cop/style/redundant_regexp_character_class.rb b/lib/rubocop/cop/style/redundant_regexp_character_class.rb
index 214a84743..7a3251994 100644
--- a/lib/rubocop/cop/style/redundant_regexp_character_class.rb
+++ b/lib/rubocop/cop/style/redundant_regexp_character_class.rb
@@ -54,7 +54,8 @@ module RuboCop
         end
 
         def each_single_element_character_class(node)
-          Regexp::Parser.parse(pattern_source(node)).each_expression do |expr|
+          # Regexp::Parser.parse(pattern_source(node)).each_expression do |expr|
+          node.parsed_tree.each_expression do |expr|
             next if expr.type != :set || expr.expressions.size != 1
             next if expr.negative? || %i[posixclass set].include?(expr.expressions.first.type)

mostly of the form NoMethodError: undefined method each_expression' for nil:NilClass`.

N.B. I introduced pattern_source (which replaces interpolation and comments with spaces) to remove (as far as the cops are concerned) the "effect" of the interpolation, whilst preserving the indentation/width of the interpolation in the pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could incorporate the "interpolation-blanking" into parsed_tree, or some other mechanism of supporting interpolation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of blanking the interpolations, makes sense and could be quite useful.
Maybe parsed_tree could take a keyword argument :interpolation with value :ignore or :blank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a go at this - please let me know what you think @marcandre!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome.
Ideally, the changes to parsed_tree would be in a different PR, and with at least a test.
It's not clear to me that replacing the interpolations with spaces has no effect, I thought you were going to simply delete them...
I also don't know why the treatment of the comments.

next if expr.type != :set || expr.expressions.size != 1
next if expr.negative? || %i[posixclass set].include?(expr.expressions.first.type)

yield expr
end
rescue Regexp::Scanner::ScannerError
marcandre marked this conversation as resolved.
Show resolved Hide resolved
# Handle malformed patterns that are accepted by Ruby but cause the regexp_parser gem to
# error, see https://github.com/rubocop-hq/rubocop/issues/8083 for details
end

def redundant_single_element_character_class?(node, char_class)
class_elem = char_class.expressions.first.text

non_redundant =
whitespace_in_free_space_mode?(node, class_elem) ||
backslash_b?(class_elem) ||
requires_escape_outside_char_class?(class_elem)

!non_redundant
end

def without_character_class(loc)
loc.source[1..-2]
end

def whitespace_in_free_space_mode?(node, loc)
def whitespace_in_free_space_mode?(node, elem)
return false unless freespace_mode_regexp?(node)

/\[\s\]/.match?(loc.source)
/\s/.match?(elem)
end

def backslash_b?(elem)
# \b's behaviour is different inside and outside of a character class, matching word
# boundaries outside but backspace (0x08) when inside.
elem == '\b'
end

def requires_escape_outside_char_class?(elem)
REQUIRES_ESCAPE_OUTSIDE_CHAR_CLASS_CHARS.include?(elem)
end
end
end
Expand Down
27 changes: 13 additions & 14 deletions lib/rubocop/cop/style/redundant_regexp_escape.rb
Expand Up @@ -59,9 +59,9 @@ def on_regexp(node)

def allowed_escape?(node, char, within_character_class)
# Strictly speaking a few single-letter metachars are currently
# unnecessary to "escape", e.g. g, i, E, F, but enumerating them is
# unnecessary to "escape", e.g. i, E, F, but enumerating them is
# rather difficult, and their behaviour could change over time with
# different versions of Ruby so that e.g. /\g/ != /g/
# different versions of Ruby so that e.g. /\i/ != /i/
return true if /[[:alnum:]]/.match?(char)
return true if ALLOWED_ALWAYS_ESCAPES.include?(char) || delimiter?(node, char)

Expand All @@ -82,21 +82,20 @@ def delimiter?(node, char)
end

def each_escape(node)
pattern_source(node).each_char.with_index.reduce(
[nil, 0]
) do |(previous, char_class_depth), (current, index)|
if previous == '\\'
yield [current, index - 1, !char_class_depth.zero?]

[nil, char_class_depth]
elsif previous == '['
[current, char_class_depth + 1]
elsif current == ']'
[current, char_class_depth - 1]
Regexp::Parser.parse(
pattern_source(node)
).traverse.reduce(0) do |char_class_depth, (event, expr)|
yield(expr.text[1], expr.ts, !char_class_depth.zero?) if expr.type == :escape

if expr.type == :set
char_class_depth + (event == :enter ? 1 : -1)
else
[current, char_class_depth]
char_class_depth
end
end
rescue Regexp::Scanner::ScannerError
# Handle malformed patterns that are accepted by Ruby but cause the regexp_parser gem to
# error, see https://github.com/rubocop-hq/rubocop/issues/8083 for details
end

def escape_range_at_index(node, index)
Expand Down
52 changes: 52 additions & 0 deletions spec/rubocop/cop/style/redundant_regexp_character_class_spec.rb
Expand Up @@ -30,6 +30,45 @@
end
end

context 'with a character class containing a single character inside a gruop' do
it 'registers an offense and corrects' do
expect_offense(<<~RUBY)
foo = /([a])/
^^^ Redundant single-element character class, `[a]` can be replaced with `a`.
RUBY

expect_correction(<<~RUBY)
foo = /(a)/
RUBY
end
end

context 'with a character class containing a single range' do
it 'does not register an offense' do
expect_no_offenses('foo = /[a-z]/')
end
end

context 'with a character class containing a posix bracket expression' do
it 'does not register an offense' do
expect_no_offenses('foo = /[[:alnum:]]/')
end
end

context 'with a character class containing set intersection' do
it 'does not register an offense' do
expect_no_offenses('foo = /[[:alnum:]&&a-d]/')
end
end

context "with a regexp containing invalid \g escape" do
it 'does not register an offense' do
# See https://ruby-doc.org/core-2.7.1/Regexp.html#class-Regexp-label-Subexpression+Calls
# \g should be \g<name>
expect_no_offenses('foo = /[a]\g/')
end
end

context 'with a character class containing an escaped ]' do
it 'registers an offense and corrects' do
expect_offense(<<~'RUBY')
Expand Down Expand Up @@ -224,6 +263,19 @@
end
end

context 'with a redundant character class after an interpolation' do
it 'registers an offense and corrects' do
expect_offense(<<~'RUBY')
foo = /#{x}[a]/
^^^ Redundant single-element character class, `[a]` can be replaced with `a`.
RUBY

expect_correction(<<~'RUBY')
foo = /#{x}a/
RUBY
end
end

context 'with a multi-line interpolation' do
it 'ignores offenses in the interpolated expression' do
expect_no_offenses(<<~'RUBY')
Expand Down
23 changes: 22 additions & 1 deletion spec/rubocop/cop/style/redundant_regexp_escape_spec.rb
Expand Up @@ -29,7 +29,7 @@
end

[
('a'..'z').to_a - %w[c n p u x],
('a'..'z').to_a - %w[c g k n p u x],
('A'..'Z').to_a - %w[C M P],
%w[n101 x41 u0041 u{0041} cc C-c p{alpha} P{alpha}]
].flatten.each do |escape|
Expand All @@ -46,6 +46,14 @@
end
end

context "with an invalid \g escape" do
it 'does not register an offense' do
# See https://ruby-doc.org/core-2.7.1/Regexp.html#class-Regexp-label-Subexpression+Calls
# \g should be \g<name>
expect_no_offenses('foo = /\g/')
end
end

context "with an escaped 'M-a' outside a character class" do
it 'does not register an offense' do
expect_no_offenses('foo = /\\M-a/n')
Expand Down Expand Up @@ -79,6 +87,19 @@
end
end

context "with an escaped '+' inside a character class inside a group" do
it 'registers an offense and corrects' do
expect_offense(<<~'RUBY')
foo = /([\+])/
^^ Redundant escape inside regexp literal
RUBY

expect_correction(<<~RUBY)
foo = /([+])/
RUBY
end
end

context 'with an escaped . inside a character class beginning with :' do
it 'registers an offense and corrects' do
expect_offense(<<~'RUBY')
Expand Down