Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle comments and blank interpolations in regexp parsed_tree #8803

4 changes: 4 additions & 0 deletions CHANGELOG.md
Expand Up @@ -2,6 +2,10 @@

## master (unreleased)

### Changes

* [#8803](https://github.com/rubocop-hq/rubocop/pull/8803): Extend RegexpNode#parsed_tree to handle regexps including interpolaton and comments. ([@owst][])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^interpolaton^interpolation
Also, please back-quote RegexpNode#parsed_tree

Might change altogether, read below


## 0.92.0 (2020-09-25)

### New features
Expand Down
29 changes: 25 additions & 4 deletions lib/rubocop/ext/regexp_node.rb
Expand Up @@ -16,12 +16,16 @@ class << self
@parsed_cache = {}

# @return [Regexp::Expression::Root, nil]
def parsed_tree
return if interpolation?
def parsed_tree(interpolation: :ignore)
unless %i[ignore blank].include?(interpolation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When thinking about the fact that :ignore is confusing (it's "ignore the regexp itself", not the interpolations), the only good name I could come up with was :return_nil and that's a code smell.

That made me wonder if the best course of action isn't to remove the option altogether, and make blanking the default. What do you think? It should be the callers job to check for interpolation lint/mixed_regexp_capture_types would need to do that).

Should be documented, and marked as breaking change in the Changelog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I'm not really a fan of interpolation :ignore. I like the idea of always blanking (that is, until we find a scenario where it isn't appropriate 😄).

Having had a quick look at lint/mixed_regexp_capture_types I think it could also just use blanking, as we could catch strictly more issues (where the regexp contains mixed capture groups in addition to an interpolation)

I'll take a look at making this change, and report back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right about lint/mixed_regexp_capture_types 🤦‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for now, to keep this PR focussed, I left lint/mixed_regexp_capture_types as is - i.e. it still does nothing in the presence of interpolation. A nice follow-up might be to change it, as I mentioned previously

raise ArgumentError, 'interpolation must be one of :ignore or :blank'
end

return if interpolation? && interpolation == :ignore

str = content
str = with_interpolations_blanked
Ext::RegexpNode.parsed_cache[str] ||= begin
Regexp::Parser.parse(str)
Regexp::Parser.parse(str, options: options)
rescue StandardError
nil
end
Expand All @@ -40,6 +44,23 @@ def each_capture(named: ANY)
self
end

private

def with_interpolations_blanked
children.reject(&:regopt_type?).map do |child|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only one regopt at the end, right? If so, .reject(&:regopt_type?) => [0...-1].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I don't think I'd realised that even for a Regexp without options the regopt node was still present. I think the original was better readability-wise, but the new version is probably fine with a comment 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I add a RegexpNode#content_nodes to rubocop-ast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my slight concern is that RegexpNode#content already exists and ignores interpolation (those with begin_type?) nodes, whereas RegexpNode#content_nodes would not ignore interpolation nodes. I'm not sure I have an alternative suggestion, though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know. I would actually mark content as # @deprecated; I can't think of a single good use for that method.

source = child.source

# We don't want to consider the contents of interpolations as part of the pattern source,
# but need to preserve their width, to allow offsets to correctly line up with the
# original source: spaces have no effect, and preserve width.
if child.begin_type?
' ' * source.length
else
source
end
end.join
end

AST::RegexpNode.include self
end
end
Expand Down
4 changes: 2 additions & 2 deletions rubocop.gemspec
Expand Up @@ -36,9 +36,9 @@ Gem::Specification.new do |s|
s.add_runtime_dependency('parallel', '~> 1.10')
s.add_runtime_dependency('parser', '>= 2.7.1.5')
s.add_runtime_dependency('rainbow', '>= 2.2.2', '< 4.0')
s.add_runtime_dependency('regexp_parser', '>= 1.7')
s.add_runtime_dependency('regexp_parser', '>= 1.8')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.add_runtime_dependency('rexml')
s.add_runtime_dependency('rubocop-ast', '>= 0.5.0')
s.add_runtime_dependency('rubocop-ast', '>= 0.6.0')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.add_runtime_dependency('ruby-progressbar', '~> 1.7')
s.add_runtime_dependency('unicode-display_width', '>= 1.4.0', '< 2.0')

Expand Down
88 changes: 88 additions & 0 deletions spec/rubocop/ext/regexp_node_spec.rb
Expand Up @@ -32,4 +32,92 @@
it { is_expected.to match [named] }
end
end

describe '#parsed_tree' do
let(:source) { '/foo#{bar}baz/' }

context 'with an extended mode regexp with comment' do
let(:source) { '/42 # the answer/x' }

it 'returns the expected tree' do
tree = node.parsed_tree(interpolation: :ignore)

expect(tree.is_a?(Regexp::Expression::Root)).to eq(true)
expect(tree.map(&:token)).to eq(%i[literal whitespace comment])
end
end

context 'with interpolation: :ignore' do
context 'with a regexp containing interpolation' do
it { expect(node.parsed_tree(interpolation: :ignore)).to eq(nil) }
end

context 'with a regexp not containing interpolation' do
let(:source) { '/foobaz/' }

it 'returns the expected tree' do
tree = node.parsed_tree(interpolation: :ignore)

expect(tree.is_a?(Regexp::Expression::Root)).to eq(true)
expect(tree.to_s).to eq('foobaz')
end
end
end

context 'with interpolation: :blank' do
context 'with a regexp containing interpolation' do
it 'returns the expected blanked tree' do
tree = node.parsed_tree(interpolation: :blank)

expect(tree.is_a?(Regexp::Expression::Root)).to eq(true)
expect(tree.to_s).to eq('foo baz')
end
end

context 'with a regexp containing a multi-line interpolation' do
let(:source) do
<<~'REGEXP'
/
foo
#{
bar(
42
)
}
baz
/
REGEXP
end

it 'returns the expected blanked tree' do
tree = node.parsed_tree(interpolation: :blank)

expect(tree.is_a?(Regexp::Expression::Root)).to eq(true)
expect(tree.to_s.split("\n")).to eq(
[
'',
' foo',
' ' * 32,
' baz'
]
)
end
end

context 'with a regexp not containing interpolation' do
let(:source) { '/foobaz/' }

it 'returns the expected tree' do
tree = node.parsed_tree(interpolation: :blank)

expect(tree.is_a?(Regexp::Expression::Root)).to eq(true)
expect(tree.to_s).to eq('foobaz')
end
end
end

context 'with interpolation: :other' do
it { expect { node.parsed_tree(interpolation: :other) }.to raise_error(ArgumentError) }
end
end
end