Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8650 for improved hidden files finder performance #8784

Merged
merged 2 commits into from Sep 25, 2020

Conversation

tleish
Copy link
Contributor

@tleish tleish commented Sep 24, 2020

Faster find of hidden files in TargetFinder class which improves rubocop initial startup speed.

Benchmarks now show a 57x improvement in one large project tested:

Comparison:
             select:       50.6 i/s
               glob:        0.9 i/s - 57.38x  (± 0.00) slower

Before submitting the PR make sure the following are checked:

  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

@marcandre
Copy link
Contributor

57x is nice 😆
Could you not use grep with a regexp here?

@tleish
Copy link
Contributor Author

tleish commented Sep 24, 2020

Could you not use grep with a regexp here?

@marcandre - I previously tested this, but turns out that the proposed solution is faster.

      require "benchmark/ips"

      Benchmark.ips(7) do |x|
        x.report("glob") { hidden_files = Set.new(all_files - find_files(base_dir, 0)) }
        x.report("select") { hidden_files = all_files.select { |file| file.include?(HIDDEN_FILE_PATTERN) } }
        x.report("grep") { hidden_files = all_files.grep(%r{#{File::SEPARATOR}\.}) }
        x.compare!
      end
Comparison:
             select:       52.3 i/s
               grep:       23.5 i/s - 2.22x  (± 0.00) slower
               glob:        0.9 i/s - 59.16x  (± 0.00) slower

@marcandre
Copy link
Contributor

Could you not use grep with a regexp here?

@marcandre - I previously tested this, but turns out that the proposed solution is faster.

Ah, right... Forgot about https://bugs.ruby-lang.org/issues/17030

@tleish
Copy link
Contributor Author

tleish commented Sep 24, 2020

Ah, right... Forgot about https://bugs.ruby-lang.org/issues/17030

FYI

Benchmark.ips(7) do |x|
    x.report("glob") { hidden_files = Set.new(all_files - find_files(base_dir, 0)) }
    x.report("include?") { hidden_files = all_files.select { |file| file.include?(HIDDEN_FILE_PATTERN) } }
    x.report("=~") { hidden_files = all_files.select { |file| file =~ hidden_file_regex } }
    x.report("match?") { hidden_files = all_files.select { |file| file.match?(hidden_file_regex) } }
    x.report("grep") { hidden_files = all_files.grep(hidden_file_regex) }
    x.compare!
end
Comparison:
            include?:       50.2 i/s
              match?:       42.9 i/s - 1.17x  (± 0.00) slower
                grep:       24.6 i/s - 2.04x  (± 0.00) slower
                  =~:       22.9 i/s - 2.19x  (± 0.00) slower
                glob:        0.9 i/s - 56.91x  (± 0.00) slower

@@ -5,6 +5,8 @@ module RuboCop
# and picking ruby files.
# @api private
class TargetFinder
HIDDEN_FILE_PATTERN = "#{File::SEPARATOR}."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that naming this "pattern" is a good idea, as it makes me think of regular expressions. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to HIDDEN_PATH_SUBSTRING

@@ -55,7 +57,7 @@ def target_files_in_dir(base_dir = Dir.pwd)
# Support Windows: Backslashes from command-line -> forward slashes
base_dir = base_dir.gsub(File::ALT_SEPARATOR, File::SEPARATOR) if File::ALT_SEPARATOR
all_files = find_files(base_dir, File::FNM_DOTMATCH)
hidden_files = Set.new(all_files - find_files(base_dir, 0))
hidden_files = all_files.select { |file| file.include?(HIDDEN_FILE_PATTERN) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a note we're doing this for performance reasons, as someone might want to optimize it for readability or whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adde comment

@bbatsov bbatsov merged commit f788a75 into rubocop:master Sep 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants