New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix #9636] Resolve symlinks when excluding directories #9703
Conversation
Did you measure the performance impact of the fix? I guess some hit is unavoidable, but I'm curious how significant it is. |
To be clear - I feel this fixes a legitimate bug, but given the history of how this bug was introduced I want to make sure that resolving the real paths is not going to cause a performance issue. |
@bbatsov Thanks for the quick reply! I just did some very basic performance testing, running the following script on my laptop (Macbook Pro 2017, ruby 2.7.2p137) from the root of the rubocop repository: require 'benchmark'
require_relative './lib/rubocop'
config = RuboCop::ConfigStore.new
options = {force_exclusion: false, debug: false}
target_finder = RuboCop::TargetFinder.new(config, options)
Benchmark.bm do |x|
x.report { 1_000.times { target_finder.target_files_in_dir(__dir__) } }
end Results for Rubocop 1.12.1:
Results for Rubocop 1.12.1 + my patch:
It looks like the additional Let me know if there are additional tests you'd like me to run! |
@rubocop/rubocop-core What do you think, guys? It's not a big performance hit, but it's still a step back. On the other hand I do believe that symlinks should be resolved. |
lib/rubocop/target_finder.rb
Outdated
dir.end_with?('/./', '/../') || | ||
File.fnmatch?(exclude_pattern, dir, flags) || | ||
File.fnmatch?(exclude_pattern, "#{File.realpath(dir)}/", flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 101 is redundant here (all the tests pass without it), and that might help performance. I ran your benchmark and got this (after removing the line):
master:
$ bundle exec ruby benchmark.rb
user system total real
68.409056 14.974495 83.383551 ( 84.883239)
patched:
$ bundle exec ruby benchmark.rb
user system total real
67.697482 15.706053 83.403535 ( 84.581582)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make it impossible to exclude paths by their symlink names though, and would change the current behavior. I've added a test for this scenario in a new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks for that added test.
We should still avoid doing multiple File.fnmatch?
when dir
is not a symlink (which is probably the majority case by far). Something like (could be cleaned up probably):
dirs = Dir.glob(File.join(base_dir.gsub('/**/', '/\**/'), '*/'), flags)
.reject do |dir|
next true if dir.end_with?('/./', '/../')
next true if File.fnmatch?(exclude_pattern, dir, flags)
File.symlink?(dir.chomp('/')) && File.fnmatch?(exclude_pattern, "#{File.realpath(dir)}/", flags)
end
user system total real
66.163276 14.268255 80.431531 ( 81.158242)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symlink check makes sense. I've updated the PR with your suggestion.
Looks good, and what @dvandersluis said 👍 |
Agreed. Let's wait for the changes proposed by @dvandersluis to be applied and I'll merge this. I was planning to do a new release today, but I can wait for this PR to make it into the release. |
766a108
to
2806dee
Compare
Thanks! |
Thank you all! Looking forward to the release, as this issue was preventing us from upgrading :) |
rubocop#8815 introduced a traversal strategy that used recursion. rubocop#9703 then fixed an issue with this traversal which accounted for directories and symlinks. When a symlink points to a parent directory that contains that symlink it'll cause this to go into a loop until the filename is too long for glob to handle. We prevent this by checking for the inclusion of a symlink's real path in the base directory's realpath. If the base directory's path starts with the symlink's destination then we are in a loop and should skip processing the directory
#8815 introduced a traversal strategy that used recursion. #9703 then fixed an issue with this traversal which accounted for directories and symlinks. When a symlink points to a parent directory that contains that symlink it'll cause this to go into a loop until the filename is too long for glob to handle. We prevent this by checking for the inclusion of a symlink's real path in the base directory's realpath. If the base directory's path starts with the symlink's destination then we are in a loop and should skip processing the directory
Resolve symlinks when excluding directories, i.e. if
foo/**/*
is present in the Exclude list and there is abar
symlink pointing tofoo
, files underbar
will be properly excluded.Fixes #9636.
Before submitting the PR make sure the following are checked:
[Fix #issue-number]
(if the related issue exists).master
(if not - rebase it).bundle exec rake default
. It executes all tests and runs RuboCop on its own code.{change_type}_{change_description}.md
if the new code introduces user-observable changes. See changelog entry format for details.