Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the best way to capture metadata for cops? #8611

Closed
jonatas opened this issue Aug 28, 2020 · 2 comments
Closed

What is the best way to capture metadata for cops? #8611

jonatas opened this issue Aug 28, 2020 · 2 comments

Comments

@jonatas
Copy link
Contributor

jonatas commented Aug 28, 2020

Hello friends, I'm not sure if my question fits better here or in the RuboCop project itself. Let's start here:

I'm building some smart cops that are crossing metadata from different perspectives of the application.

Initially, I build a mechanism to capture data from code and then I reuse it into cops to make them smart, understanding the context.

So, let's start with the capture system:

module RuboCop
  # RuboCop overrides for capturing information from files using NodePattern.
  module Capture
    module_function

    # Search for a node pattern on a given set of files
    # @return [Array<Object>] with results that matches or captures
    # @param files [Array<String>] nil for use
    #
    # @example "Extract all class names from app/models folder"
    # RuboCop::Capture.all(
    #   pattern: '(class $_ ...)',
    #   files: Dir.glob('app/models/**/*.rb')
    # ).flatten.map(&:source) #   ["ModelA", "ModelB", ...]
    #
    # Or use from:
    #
    # @example "Extract all class names from app/models folder"
    # RuboCop::Capture.all(
    #   pattern: '(class $_ ...)',
    #   from: 'app/models/*.rb'
    # ).flatten.map(&:source) #   ["ModelA", "ModelB",...]
    # Or use from:
    def all(pattern:, files: nil, from: nil)
      node_pattern = RuboCop::NodePattern.new(pattern)
      files = files_from(from) if from && files.nil?
      files.map do |file|
        FromFile.new(pattern: node_pattern, file: file).search
      end
    end

    # @return [Array<String>] with all files from recursive search.
    # @example "Factories from all engines"
    # RuboCop::Capture.files_from('engines/*/spec/factories')
    # @see https://apidock.com/ruby/Dir/glob/class
    def files_from(dir)
      Dir.glob("#{dir}/**/**.rb")
    end

    # This class searches and captures
    # node or data from AST using [RuboCop::NodePattern].
    # @example "Extract all class names from app/models folder"
    # RuboCop::Capture::FromFile.all(
    #   pattern: '(class $_ ...)',
    #   files: Dir.glob('app/models/*.rb')
    # ).flatten.map(&:source) #   ["ModelA", "ModelB", ...]
    #
    # @example "Extracting class name from single file"
    #  RuboCop::Capture::FromFile.new(
    #    pattern: '(class $_ ...)',
    #    file: 'app/models/user.rb'
    #  ).search => s(:const, nil, :User)
    class FromFile
      include RuboCop::AST::Traversal

      attr_reader :pattern, :file
      def initialize(pattern:, file:)
        @pattern = pattern.is_a?(String) ? RuboCop::NodePattern.new(pattern) : pattern
        @file = file
      end

      # @return [Array<Object>] captures from node
      # @param pattern String with the node pattern
      # @param node [RuboCop::AST::Node]
      def search(pattern = self.pattern, node = ast_from_file)
        match = pattern.match(node)
        if match
          match == true ? node : match
        elsif node.nil?
          nil
        elsif node.children.nil?
          []
        elsif node.descendants.any?
          node.descendants.flat_map { |e| search(pattern, e) }.flatten.compact
        end
      end

      # @return [RuboCop::AST::Node] from #processed_source
      def ast_from_file
        processed_source.ast
      end

      # @return [RuboCop::ProcessedSource] from #file using RUBY_VERSION
      def processed_source
        RuboCop::ProcessedSource.new(IO.read(file), RUBY_VERSION.to_f, file)
      end
    end
  end
end

And then I can start capturing things. Example, Let's say I'd like to capture all factory names from an specific folder:

 # @return [Array<String>] with the definition of factory class names' from a
    # given folder.
    def factory_names(from:)
      (@factories_names ||= {})[from] ||=
        RuboCop::Capture.all(
          pattern: '(send nil? :factory ({sym str} $_) ... )',
          from: from
        ).flatten.compact.uniq
    end

Then I can build a cop to check if some factories from rails engines are used outside of the boundaries of the domain. Ideally, a rails engine should have the domain isolated and do not know about the factories of other engines or the main rails application. The inverse is also valid, so I have two cops to check if the main app is touching engine code and another to check if the engine code is using the main app code.

      class AvoidRailsModelsOnEngines < Cop
        MSG = 'Do not use models from rails inside the engines.'.freeze

        def_node_matcher :rails_model?, <<~PATTERN
          (send (const {nil? cbase} #forbidden_model?) ...)
        PATTERN

        class << self
          # @return [Array<String>] with all class names from models
          def rails_models
            capture_const_name(from: model_files)
          end

          # @return [Array<String>] with all filenames from models ignoring concerns
          def model_files
            Dir.glob('app/models/**/**.rb').reject { |e| e =~ %r{/concerns/} }
          end

          # @return [Array<String>] with the definition of class names' from a
          # given folder.
          def capture_const_name(from:)
            (@consts_from ||= {})[from] ||=
              begin
                results = RuboCop::Capture.all(pattern: '(class $_ ...)', files: from)
                results.flatten.compact.flat_map(&:source).uniq
              end
          end
        end

        def on_send(node)
          return unless rails_model?(node)

          add_offense(node.children.first)
        end

        private

        def forbidden_model?(name)
          models.include?(name.to_s)
        end

        def models
          (self.class.rails_models - engine_models)
        end

        def engine_models
          self.class.capture_const_name from: files_from_current_engine
        end

        def files_from_current_engine
          all_files_from_engine_folder.reject { |e| e =~ %r{/concerns/} }
        end

        def all_files_from_engine_folder
          dir = File.dirname(processed_source.path)
          folders = dir.split('/')
          engines_folder_index = folders.index('engines')
          return [] unless engines_folder_index

          folder_engine = folders[0, (engines_folder_index + 2)]
          Dir.glob(File.join(*folder_engine, '**', '**.rb'))
        end
      end

When I run RuboCop sequentially it works like a charm, caching the @consts_from, but if I need to introduce a parallel process it becomes heavy and slow down blocking all threads until captures all cases in all threads separately.

I'm not very familiar with the caching system and I'm not sure how can I build such captures in a way that it would process only once. I don't know exactly what is the best way to warm up this cache in a way that we don't lock all cops to work.

For now, my solution was just to isolate the cops into a separated .rubocop.yml configuration and just run them isolated disabling all other cops, but I'd love to hear any tips on how to better enhance the capturing system into the RuboCop ecosystem in a way that it would not hurt to run along with all other cops.

@marcandre marcandre transferred this issue from rubocop/rubocop-ast Aug 28, 2020
@marcandre
Copy link
Contributor

Transferring to main gem.

My summary is: you want to write a Cop that checks a set of files my_engine/* using information gathered from a (distinct?) set of files my_app/*, right?

There's clearly not an easy way to do this right now. This feels related to #7968 but goes beyond that. It feels outside the scope of RuboCop tbh, but could make for a nice gem?

@jonatas
Copy link
Contributor Author

jonatas commented Aug 28, 2020

My summary is: you want to write a Cop that checks a set of files my_engine/* using information gathered from a (distinct?) set of files my_app/*, right?

Yes.

but could make for a nice gem?

I agree.

@jonatas jonatas closed this as completed Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants