New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Find of hidden files #8650
Comments
Please, open a PR. hidden_files = all_files.select {|file|file =~ /[\/|\\]\./ } can be also improved to hidden_files = all_files.select {|file| file.include?("#{File::SEPARATOR}.") } |
Adding to the suggestion from @fatkodima, I realized we are creating a new string and/or regex with each loop. Testing the following code change: + HIDDEN_FILE_PATTERN = "#{File::SEPARATOR}."
- hidden_files = Set.new(all_files - find_files(base_dir, 0))
+ hidden_files = all_files.select { |file| file.include?(HIDDEN_FILE_PATTERN) } Benchmarks now show a 57x improvement in my project:
I'll create the PR. |
FYI, I found a slightly better refactoring for hidden files which (for my project) saves an additional 30% inside lib/rubocop/path_util.rb + HIDDEN_FILE_PATTERN = "#{File::SEPARATOR}."
+ def self.hidden_path?(path)
+ path.include?(HIDDEN_PATH_PATTERN)
+ end
module_function
def hidden_file_in_not_hidden_dir?(pattern, path)
+ PathUtil.hidden_path?(path) &&
File.fnmatch?(
pattern, path,
File::FNM_PATHNAME | File::FNM_EXTGLOB | File::FNM_DOTMATCH
) &&
!hidden_dir?(path)
end
def hidden_dir?(path)
- File.dirname(path).split(File::SEPARATOR).any? do |dir|
- dir.start_with?('.')
- end
+ PathUtil.hidden_path?(File.dirname(path))
end lib/rubocop/target_finder.rb - hidden_files = Set.new(all_files - find_files(base_dir, 0))
+ hidden_files = all_files.select { |file| PathUtil.hidden_path?(file) } So, the overall gain on finding hidden files and directories is faster. Do you want me to update the PR to use this approach? |
Fine by me. I don't mind more speed improvements. |
Nevermind with the larger change. After running tests, the changes were breaking... so leaving the existing changes as is. |
In profiling the delay between rubocop startup and the beginning of scanning files I discovered that rubocop will traverse the project directories/files twice. Once for
all_files
and then again to findhidden_files
.In the code above,
#find_files
is called twice which calls Dir.glob. Each time with a different flag. This is ineficient as it's touching the local disk twice for ever non-hidden file and it's checking every entry0forFileTest.file?(path)
twice .Describe the solution you'd like
A more efficient solution would be to only call
#find_files
once and then filter for hidden files.The regex looks for any file paths which include a '/.' (Linux) or ''\.'' (Windows)
Benchmarks
I tested this code change in a medium size Rails project.
Project Stats Stats:
Note: The majority of the files are in engines/my-engine/node_modules directory (which is a separate issue).
Results:
The above IPS benchmark shows that on this project the 'select' code can ran 23 times within 1 second, whereas the glob could only run once within a second. In other words, glob is 23.99x slower.
Additional context
If this code change is desired, I can create an MR.
The text was updated successfully, but these errors were encountered: