Slow find_files does not use Include configuration #8646
Comments
Sorry about this. There are multiple issues open about |
Ah, thanks for the heads up. I briefly looked before but didn't dig deep enough. I found several now. I'll close this one. |
@marcandre - how would you feel about adding an optional parameter to fine-tune the initial search and loading of files for RuboCop? Something like:
AllCops:
  GlobSearchPatterns:
    - '*'
    - 'engines/*/*'
    - '**/*.{rb,rake}'
Then, update find_files:
def find_files(base_dir, flags)
  pattern = files_config_pattern(base_dir) || files_default_pattern(base_dir, flags)
  Dir.glob(pattern, flags | File::FNM_EXTGLOB).select { |path| FileTest.file?(path) }
end

# Returns the configured patterns joined to base_dir, or nil when none are set.
def files_config_pattern(base_dir)
  all_cops_config = @config_store.for(base_dir).for_all_cops
  glob_search_patterns = all_cops_config['GlobSearchPatterns']
  if glob_search_patterns
    glob_search_patterns.map { |pattern| File.join(base_dir, pattern) }
  end
end

# flags is passed in explicitly because toplevel_dirs needs it.
def files_default_pattern(base_dir, flags)
  wanted_toplevel_dirs = toplevel_dirs(base_dir, flags) -
                         excluded_dirs(base_dir)
  wanted_toplevel_dirs.map! { |dir| dir << '/**/*' }
  if wanted_toplevel_dirs.empty?
    # We need this special case to avoid creating the pattern
    # /**/* which searches the entire file system.
    ["#{base_dir}/**/*"]
  else
    # Search the non-excluded top directories, but also add files
    # on the top level, which would otherwise not be found.
    wanted_toplevel_dirs.unshift("#{base_dir}/*")
  end
end
I tested this in a large project with multiple engines and files in node_modules directories:
#find_files BEFORE: Found Files: 117,812
#find_files AFTER: Found Files: 592
Using Glob in the name indicates it only takes glob patterns (and not regex). Thoughts? |
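To make the proposal concrete, here is a minimal, self-contained sketch of how a pattern list like the one above could drive the file search. It is not RuboCop code: `find_files_with_patterns`, `glob_search_demo`, and the sample tree are hypothetical names made up for illustration, and `GlobSearchPatterns` itself is only the option proposed in this comment. The sketch shows which files the patterns select; it says nothing about timing.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not RuboCop's API): expand configured glob patterns
# relative to base_dir and keep only regular files.
def find_files_with_patterns(base_dir, glob_patterns)
  patterns = glob_patterns.map { |pattern| File.join(base_dir, pattern) }
  Dir.glob(patterns).select { |path| File.file?(path) }
end

# Builds a throwaway tree and returns the matched paths relative to its root.
def glob_search_demo(glob_patterns)
  Dir.mktmpdir do |base_dir|
    %w[
      Rakefile
      app/models/user.rb
      lib/tasks/db.rake
      node_modules/pkg/index.js
      engines/my_engine/package.json
      engines/my_engine/node_modules/pkg/index.js
    ].each do |relative|
      path = File.join(base_dir, relative)
      FileUtils.mkdir_p(File.dirname(path))
      FileUtils.touch(path)
    end

    find_files_with_patterns(base_dir, glob_patterns)
      .map { |path| path.delete_prefix("#{base_dir}/") }
      .uniq
      .sort
  end
end

# Patterns copied from the proposed configuration above.
puts glob_search_demo(['*', 'engines/*/*', '**/*.{rb,rake}'])
```

With these patterns, nothing under either `node_modules` directory is returned, because no pattern matches `.js` files or reaches that depth under `engines`.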
I'm wary of adding more parameters without very careful consideration, as already the configuration is way more complicated than what I had in mind initially. :-) I can see the merit of what you're proposing, but I can only imagine it would add further confusion for some people. Might be best to create some meta-issue collecting the common problems with include/exclude, so we can discuss solutions in a more focused manner.
We can also consider changing the behaviour in this manner, as I don't think it's going to break a lot of things. First we only had a notion of exclusion, but then added the inclusion to allow people to specify more precise filesets on which to operate. I'm assuming that part of the inefficiency in the current implementation might be related to this. @jonas054 might remember more. |
Ruby Extension Strategy
So, why does RuboCop search for all files initially, and not just files with known Ruby extensions (i.e. *.rb, *.rake, Rakefile, etc.)?
Gitignore Strategy
So, digging into this a little further, I believe
What are your thoughts on the following approach?
Some initial experimentation shows this is promising. Often projects exclude |
We have always stayed away from assuming any specific version handling system, which is why #7920 was added to fix #595. Is it possible to solve the current problem using only |
@jonas054 - LOL! I was so deep in the weeds I didn't think to apply Exclude to directories. This could work. In the process of mocking it up I discovered it should use glob/string patterns only and not regex. With regex, you could mistakenly leave out an entire directory. For example, a pattern to say exclude all files in the config folder except
Exclude:
  - !ruby/regexp /\/config\/(?:(?!routes.rb).)*$/
When applied to a
Also, using just the glob exclude patterns (and not the regex patterns) is good enough. The benchmarks I have with this change are:
Before:
$ time bundle exec rubocop -L
...
________________________________________________________
Executed in 6.41 secs fish external
usr time 5.16 secs 88.00 micros 5.16 secs
sys time 1.25 secs 476.00 micros 1.25 secs
After:
$ time bundle exec rubocop -L
...
________________________________________________________
Executed in 2.31 secs fish external
usr time 1.36 secs 131.00 micros 1.36 secs
sys time 0.91 secs 668.00 micros 0.91 secs
Executed in roughly 1/3 of the time, with the same list of files. |
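The regex pitfall described above can be illustrated with a short sketch. The regexp is copied from the comment; the `/proj/...` paths and the two helper methods are invented for illustration, and the sketch assumes the concern is what happens when a file-oriented regexp is tested against a directory path instead of a file path.

```ruby
# Regexp copied from the comment above; everything else here is hypothetical.
EXCLUDE_REGEXP = %r{/config/(?:(?!routes.rb).)*$}

def excluded_by_regexp?(path)
  EXCLUDE_REGEXP.match?(path)
end

# Glob patterns can be tested against "dir/*" style strings with File.fnmatch?.
def excluded_by_glob?(path, pattern)
  File.fnmatch?(pattern, path, File::FNM_PATHNAME | File::FNM_EXTGLOB)
end

# On file paths the regexp keeps routes.rb and drops the rest of config/.
puts excluded_by_regexp?('/proj/config/routes.rb')
puts excluded_by_regexp?('/proj/config/database.yml')

# On the directory path itself it also matches, so pruning directories with it
# would drop config/ entirely, routes.rb included.
puts excluded_by_regexp?('/proj/config/')

# A glob such as node_modules/**/* describes a whole subtree, so matching it
# against a directory-level pattern is unambiguous.
puts excluded_by_glob?('/proj/node_modules/*', '/proj/node_modules/**/*')
puts excluded_by_glob?('/proj/config/*', '/proj/node_modules/**/*')
```

This is why restricting directory-level pruning to string (glob) patterns, and leaving regexp patterns to the later per-file filter, is the safer choice.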
Created an MR. GitHub displays the diff a little oddly; a clearer diff would look like this:
def find_files(base_dir, flags)
- wanted_toplevel_dirs = toplevel_dirs(base_dir, flags) -
- excluded_dirs(base_dir)
-
- wanted_toplevel_dirs.map! { |dir| dir << "/**/*" }
- pattern = if wanted_toplevel_dirs.empty?
- # We need this special case to avoid creating the pattern
- # /**/* which searches the entire file system.
- ["#{base_dir}/**/*"]
- else
- # Search the non-excluded top directories, but also add files
- # on the top level, which would otherwise not be found.
- wanted_toplevel_dirs.unshift("#{base_dir}/*")
- end
+ patterns = wanted_dir_patterns(base_dir, flags)
+ # We need this special case to avoid creating the pattern
+ # /**/* which searches the entire file system.
+ patterns = ["#{base_dir}/**/*"] if patterns.empty?
- Dir.glob(pattern, flags | File::FNM_EXTGLOB).select { |path| FileTest.file?(path) }
+ Dir.glob(patterns, flags | File::FNM_EXTGLOB).select { |path| FileTest.file?(path) }
end
- def toplevel_dirs(base_dir, flags)
- Dir.glob(File.join(base_dir, '*'), flags).select do |dir|
- File.directory?(dir) && !dir.end_with?('/.', '/..')
- end
- end
-
- def excluded_dirs(base_dir)
- all_cops_config = @config_store.for(base_dir).for_all_cops
- dir_tree_excludes = all_cops_config['Exclude'].select do |pattern|
- pattern.is_a?(String) && pattern.end_with?("/**/*")
- end
- dir_tree_excludes.map { |pattern| pattern.sub(%r{/\*\*/\*$}, '') }
- end
+ def wanted_dir_patterns(base_dir, flags)
+ exclude_pattern = combined_exclude_glob_patterns(base_dir)
+ flags = flags | File::FNM_PATHNAME | File::FNM_EXTGLOB | File::FNM_DOTMATCH
+ Dir.glob("#{base_dir}/**/", flags)
+ .map { |dir| dir << '*' } # add file glob pattern to end of each dir
+ .reject { |dir| File.fnmatch?(exclude_pattern, dir, flags) }
+ end
+
+ def combined_exclude_glob_patterns(base_dir)
+ all_cops_config = @config_store.for(base_dir).for_all_cops
+ patterns = all_cops_config['Exclude'].select { |pattern| pattern.is_a? String }
+ .map { |pattern| pattern.sub("#{base_dir}/", '') }
+ "#{base_dir}/{#{patterns.join(',')}}"
+ end
|
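The directory-first idea in the diff can be tried outside RuboCop with a small sketch. The assumptions are stated plainly: the exclude list is passed in as an argument instead of being read from `@config_store`, hidden directories are ignored (no `File::FNM_DOTMATCH`), and `dir_first_demo` plus the sample tree are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Simplified variant of wanted_dir_patterns from the diff: list every
# directory, turn each into a "dir/*" pattern, and drop the ones that match
# the combined exclude glob.
def wanted_dir_patterns(base_dir, exclude_globs)
  exclude_pattern = "#{base_dir}/{#{exclude_globs.join(',')}}"
  fnmatch_flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
  Dir.glob("#{base_dir}/**/")
     .map { |dir| "#{dir}*" }
     .reject { |pattern| File.fnmatch?(exclude_pattern, pattern, fnmatch_flags) }
end

# Builds a throwaway tree and returns the files found, relative to its root.
def dir_first_demo(exclude_globs)
  Dir.mktmpdir do |base_dir|
    %w[
      Rakefile
      app/models/user.rb
      node_modules/pkg/index.js
      engines/my_engine/app/models/thing.rb
      engines/my_engine/node_modules/pkg/index.js
    ].each do |relative|
      path = File.join(base_dir, relative)
      FileUtils.mkdir_p(File.dirname(path))
      FileUtils.touch(path)
    end

    Dir.glob(wanted_dir_patterns(base_dir, exclude_globs))
       .select { |path| File.file?(path) }
       .map { |path| path.delete_prefix("#{base_dir}/") }
       .sort
  end
end

puts dir_first_demo(['node_modules/**/*', 'engines/*/node_modules/**/*'])
```

Note that `Dir.glob("#{base_dir}/**/")` still walks into excluded directories to list them; only the later file glob skips them.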
…ies first and then apply Rubocop Exclude on directories before finding files
After sleeping on it, I realized there was a simpler (less code) and slightly faster version of the previous #8806: _recursively_ collect directories, testing RuboCop's Exclude against each directory as it is visited. Once directories are found at all levels, we search them for files. In addition, I replaced many of the `base_dir` string concatenations with the safer `File.join`. Benchmarks on a large project show an additional 500+ milliseconds saved.
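A rough sketch of the recursive variant described here, under the same caveats as before: it is not the code from the pull request, the exclude pattern is built by hand rather than read from RuboCop's configuration, and `wanted_dirs`, `recursive_prune_demo`, and the sample tree are hypothetical. The point it illustrates is that an excluded directory is never opened, so its subtree costs nothing.

```ruby
require 'tmpdir'
require 'fileutils'

PRUNE_FNMATCH_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Returns dir plus every non-excluded descendant directory. The exclude test
# runs before Dir.children, so excluded subtrees are never listed.
def wanted_dirs(dir, exclude_pattern)
  return [] if File.fnmatch?(exclude_pattern, File.join(dir, '*'), PRUNE_FNMATCH_FLAGS)

  subdirs = Dir.children(dir)
               .map { |name| File.join(dir, name) }
               .select { |path| File.directory?(path) && !File.symlink?(path) }
  [dir] + subdirs.flat_map { |subdir| wanted_dirs(subdir, exclude_pattern) }
end

# Returns [visited directories, found files], both relative to a temp root.
def recursive_prune_demo(exclude_globs)
  Dir.mktmpdir do |base_dir|
    %w[
      Rakefile
      app/models/user.rb
      node_modules/pkg/index.js
      engines/my_engine/app/models/thing.rb
      engines/my_engine/node_modules/pkg/index.js
    ].each do |relative|
      path = File.join(base_dir, relative)
      FileUtils.mkdir_p(File.dirname(path))
      FileUtils.touch(path)
    end

    exclude_pattern = File.join(base_dir, "{#{exclude_globs.join(',')}}")
    dirs = wanted_dirs(base_dir, exclude_pattern)
    files = dirs.flat_map { |dir| Dir.glob(File.join(dir, '*')) }
                .select { |path| File.file?(path) }
    relative = ->(path) { path.delete_prefix(base_dir).delete_prefix('/') }
    [dirs.map(&relative).sort, files.map(&relative).sort]
  end
end

dirs, files = recursive_prune_demo(['node_modules/**/*', 'engines/*/node_modules/**/*'])
puts dirs
puts files
```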
A project with inline Rails engines can become slow during RuboCop's initial scan. For example, a project that looks like:
With the following configuration file:
Expected behavior
I would expect the file scan to cover only the "Include" directories and to apply "Exclude" to directories before analyzing.
Actual behavior
What actually happens is that RuboCop scans the top-level root directories, then filters for Exclude patterns.
The problem is that a node_modules folder will often have thousands of files. The root node_modules can be skipped from the initial scan, but
engines/my_engine/node_modules
cannot be skipped in the initial file scan (based on the configuration). Because of this, in the project I'm testing, the scan takes 7 seconds just to get the list of files. If I scan by only
Steps to reproduce the problem
See description
RuboCop version
0.77.0 (using Parser 2.6.5.0, running on ruby 2.6.5 x86_64-darwin18)