Parsed Regexp utilities #8510

Merged
merged 2 commits into from Aug 26, 2020
CHANGELOG.md: 1 addition, 1 deletion
@@ -15,7 +15,7 @@
 * Add new `Lint/TrailingCommaInAttributeDeclaration` cop. ([@drenmi][])
 * [#8578](https://github.com/rubocop-hq/rubocop/pull/8578): Add `:restore_registry` context and `stub_cop_class` helper class. ([@marcandre][])
 * [#8579](https://github.com/rubocop-hq/rubocop/pull/8579): Add `Cop.documentation_url`. ([@marcandre][])
-
+* [#8510](https://github.com/rubocop-hq/rubocop/pull/8510): Add `RegexpNode#each_capture` and `parsed_tree`. ([@marcandre][])

 ### Bug fixes

lib/rubocop.rb: 1 addition, 0 deletions
@@ -9,6 +9,7 @@
 require 'unicode/display_width/no_string_ext'
 require 'rubocop-ast'
 require_relative 'rubocop/ast_aliases'
+require_relative 'rubocop/ext/regexp_node'

 require_relative 'rubocop/version'

lib/rubocop/cop/lint/mixed_regexp_capture_types.rb: 2 additions, 35 deletions
@@ -25,44 +25,11 @@ class MixedRegexpCaptureTypes < Base
               'in a Regexp literal.'

         def on_regexp(node)
-          return if contain_non_literal?(node)
-
-          begin
-            tree = Regexp::Parser.parse(node.content)
-          # Returns if a regular expression that cannot be processed by regexp_parser gem.
-          # https://github.com/rubocop-hq/rubocop/issues/8083
-          rescue Regexp::Scanner::ScannerError
-            return
-          end
-
-          return unless named_capture?(tree)
-          return unless numbered_capture?(tree)
+          return if node.each_capture(named: false).none?
+          return if node.each_capture(named: true).none?

           add_offense(node)
         end
-
-        private
-
-        def contain_non_literal?(node)
-          if node.respond_to?(:type) && (node.variable? || node.send_type? || node.const_type?)
-            return true
-          end
-          return false unless node.respond_to?(:children)
-
-          node.children.any? { |child| contain_non_literal?(child) }
-        end
-
-        def named_capture?(tree)
-          tree.each_expression.any? do |e|
-            e.instance_of?(Regexp::Expression::Group::Capture)
-          end
-        end
-
-        def numbered_capture?(tree)
-          tree.each_expression.any? do |e|
-            e.instance_of?(Regexp::Expression::Group::Named)
-          end
-        end
       end
     end
   end
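For context on what this cop guards against: in Ruby, once a regexp literal contains a named group, plain `(...)` groups stop capturing, so a numbered group silently disappears. A minimal stdlib-only check (no RuboCop or regexp_parser needed; the sample pattern is made up for illustration):

```ruby
# Only plain groups: both are captured and numbered 1 and 2.
plain = /(\d+)-(\d+)/.match("12-34")
p plain.captures # => ["12", "34"]

# Mixing in a named group turns the plain (\d+) into a non-capturing group.
mixed = /(\d+)-(?<second>\d+)/.match("12-34")
p mixed.captures # => ["34"]
p mixed.names    # => ["second"]
p mixed[1]       # => "34" (index 1 is now the named group)
```

The rewritten `on_regexp` reaches the same verdict statically: it reports only when `each_capture(named: false)` and `each_capture(named: true)` both yield at least one group.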
lib/rubocop/cop/lint/out_of_range_regexp_ref.rb: 9 additions, 19 deletions
@@ -64,25 +64,15 @@ def on_nth_ref(node)

         private

-        def check_regexp(regexp)
-          return if contain_non_literal?(regexp)
-
-          tree = Regexp::Parser.parse(regexp.content)
-          @valid_ref = regexp_captures(tree)
-        end
-
-        def contain_non_literal?(node)
-          node.children.size != 2 || !node.children.first.str_type?
-        end
-
-        def regexp_captures(tree)
-          named_capture = numbered_capture = 0
-          tree.each_expression do |e|
-            if e.type?(:group)
-              e.respond_to?(:name) ? named_capture += 1 : numbered_capture += 1
-            end
-          end
-          named_capture.positive? ? named_capture : numbered_capture
+        def check_regexp(node)
+          return if node.interpolation?
+
+          named_capture = node.each_capture(named: true).count
+          @valid_ref = if named_capture.positive?
+                         named_capture
+                       else
+                         node.each_capture(named: false).count
+                       end
         end
       end
     end
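The new `check_regexp` keeps the old counting rule: if any named captures exist, they alone define the valid `$n` range; otherwise the numbered captures do. Ruby's runtime agrees, which can be checked with `MatchData#size` (it counts the whole match as element 0). `valid_ref_count` is a hypothetical helper for illustration only; the cop itself counts statically through `each_capture` and never matches any input.

```ruby
# Hypothetical helper (not part of RuboCop): how many $1..$n references a
# regexp actually provides, measured on a sample string that it matches.
def valid_ref_count(regexp, sample)
  match = regexp.match(sample)
  raise ArgumentError, "sample must match #{regexp.inspect}" unless match

  match.size - 1 # MatchData#size includes the whole match at index 0
end

p valid_ref_count(/(a)(b)/, "ab")     # => 2, two numbered groups
p valid_ref_count(/(a)(?<x>b)/, "ab") # => 1, only the named group counts
p valid_ref_count(/(?:a)(b)/, "ab")   # => 1, passive groups never count
```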
lib/rubocop/ext/regexp_node.rb: 46 additions, 0 deletions (new file)
@@ -0,0 +1,46 @@
# frozen_string_literal: true

module RuboCop
  module Ext
    # Extensions to AST::RegexpNode for our cached parsed regexp info
    module RegexpNode
      ANY = Object.new
      def ANY.==(_)
        true
      end
      private_constant :ANY

      class << self
        attr_reader :parsed_cache
      end
      @parsed_cache = {}

      # @return [Regexp::Expression::Root, nil]
      def parsed_tree
        return if interpolation?

        str = content
        Ext::RegexpNode.parsed_cache[str] ||= begin
          Regexp::Parser.parse(str)
        rescue StandardError
          nil
        end
      end

      def each_capture(named: ANY)
        return enum_for(__method__, named: named) unless block_given?

        parsed_tree&.traverse do |event, exp, _index|
          yield(exp) if event == :enter &&
                        named == exp.respond_to?(:name) &&
                        exp.respond_to?(:capturing?) &&
                        exp.capturing?
        end

        self
      end

      AST::RegexpNode.include self
    end
  end
end
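`parsed_tree` memoizes in `Ext::RegexpNode.parsed_cache`, a single hash keyed by the regexp source and shared by every node. One consequence of the `||=` idiom: a parse failure stores `nil`, which is falsy, so an unparseable source is parsed again on each call instead of being served from the cache. The sketch below reproduces that shape with the stdlib `Regexp.new` standing in for `Regexp::Parser.parse`; `ParseCache` and `parse_calls` are hypothetical names used only to count attempts.

```ruby
# Hypothetical stand-in for the cache behind RegexpNode#parsed_tree.
class ParseCache
  attr_reader :parse_calls

  def initialize
    @cache = {}
    @parse_calls = 0
  end

  # Same shape as parsed_tree: memoize with ||=, map parse errors to nil.
  def fetch(source)
    @cache[source] ||= begin
      @parse_calls += 1       # counts every parse attempt, failed or not
      Regexp.new(source)      # stand-in "parser"; raises RegexpError
    rescue StandardError      # RegexpError is a StandardError
      nil
    end
  end
end

cache = ParseCache.new
2.times { cache.fetch("(a)(?<b>b)") }
p cache.parse_calls # => 1, the second lookup is a cache hit

2.times { cache.fetch("(unclosed") }
p cache.parse_calls # => 3, the stored nil is falsy, so the failure is retried
```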
spec/rubocop/ext/regexp_node_spec.rb: 35 additions, 0 deletions (new file)
@@ -0,0 +1,35 @@
# frozen_string_literal: true

require 'timeout'

RSpec.describe RuboCop::Ext::RegexpNode do
  let(:source) { '/(hello)(?<foo>world)(?:not captured)/' }
  let(:processed_source) { parse_source(source) }
  let(:ast) { processed_source.ast }
  let(:node) { ast }

  describe '#each_capture' do
    subject(:captures) { node.each_capture(**arg).to_a }

    let(:named) { be_instance_of(Regexp::Expression::Group::Named) }
    let(:positional) { be_instance_of(Regexp::Expression::Group::Capture) }

    context 'when called without argument' do
      let(:arg) { {} }

      it { is_expected.to match [positional, named] }
    end

    context 'when called with a `named: false`' do
      let(:arg) { { named: false } }

      it { is_expected.to match [positional] }
    end

    context 'when called with a `named: true`' do
      let(:arg) { { named: true } }

      it { is_expected.to match [named] }
    end
  end
end
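The three spec contexts can be reproduced without RuboCop or regexp_parser. In this sketch, `Group` and `FakeRegexpNode` are hypothetical stand-ins (the real method walks `parsed_tree` with `traverse` and checks `respond_to?(:name)` and `capturing?`), but the `ANY` sentinel and the `enum_for` fallback follow the same pattern as `RegexpNode#each_capture`:

```ruby
# Hypothetical stand-ins for regexp_parser's capture, named, and passive groups.
Group = Struct.new(:label, :named, :capturing)

class FakeRegexpNode
  # Sentinel whose == always returns true, so `named == flag` below passes
  # for both named and numbered groups when no filter is given.
  ANY = Object.new
  def ANY.==(_other)
    true
  end
  private_constant :ANY

  def initialize(groups)
    @groups = groups
  end

  def each_capture(named: ANY)
    # Without a block, return an Enumerator so callers can chain .count, .none?, ...
    return enum_for(__method__, named: named) unless block_given?

    @groups.each do |group|
      yield group if named == group.named && group.capturing
    end
    self
  end
end

# Mirrors the spec source: /(hello)(?<foo>world)(?:not captured)/
node = FakeRegexpNode.new([
  Group.new("(hello)", false, true),
  Group.new("(?<foo>world)", true, true),
  Group.new("(?:not captured)", false, false)
])

p node.each_capture.map(&:label)               # => ["(hello)", "(?<foo>world)"]
p node.each_capture(named: false).map(&:label) # => ["(hello)"]
p node.each_capture(named: true).map(&:label)  # => ["(?<foo>world)"]
```

Because `ANY == x` is true for any `x`, omitting `named:` disables the named/numbered filter without a separate `nil` branch, while the `enum_for` fallback is what lets the two cops above call `.none?` and `.count` directly.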