[Fix #9636] Resolve symlinks when excluding directories #9703

Merged
merged 1 commit into from Apr 19, 2021

ob-stripe
Contributor

Resolve symlinks when excluding directories. For example, if foo/**/* is present in the Exclude list and bar is a symlink pointing to foo, files under bar are now excluded as well.

Fixes #9636.
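
To make the behaviour concrete, here is a minimal sketch that rebuilds the foo/bar layout from the description in a temporary directory. The helper name `symlink_exclusion_demo`, the `a.rb` file and the use of `File::FNM_PATHNAME` alone are assumptions for illustration; the flags RuboCop actually passes may differ.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical demo helper (not part of RuboCop).
# Returns [match_via_symlink_path, match_via_resolved_path] for bar -> foo.
def symlink_exclusion_demo
  Dir.mktmpdir do |tmp|
    root = File.realpath(tmp) # the temp dir itself may sit behind a symlink
    FileUtils.mkdir_p(File.join(root, 'foo'))
    FileUtils.touch(File.join(root, 'foo', 'a.rb'))
    File.symlink(File.join(root, 'foo'), File.join(root, 'bar'))

    pattern = File.join(root, 'foo', '**', '*')   # the Exclude entry
    via_symlink = File.join(root, 'bar', 'a.rb')  # same file, reached through bar

    [
      # The literal path never mentions foo, so the pattern cannot match it.
      File.fnmatch?(pattern, via_symlink, File::FNM_PATHNAME),
      # Resolving the symlink first yields .../foo/a.rb, which does match.
      File.fnmatch?(pattern, File.realpath(via_symlink), File::FNM_PATHNAME)
    ]
  end
end
```

Under these assumptions the call returns `[false, true]`: the same file is missed when matched by its symlinked path and caught once the path is resolved.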


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@bbatsov
Collaborator

bbatsov commented Apr 17, 2021

Did you measure the performance impact of the fix? I guess some hit is unavoidable, but I'm curious how significant it is.

@bbatsov
Collaborator

bbatsov commented Apr 17, 2021

To be clear - I feel this fixes a legitimate bug, but given the history of how this bug was introduced I want to make sure that resolving the real paths is not going to cause a performance issue.

@ob-stripe
Contributor Author

@bbatsov Thanks for the quick reply!

I just did some very basic performance testing, running the following script on my laptop (MacBook Pro 2017, ruby 2.7.2p137) from the root of the rubocop repository:

require 'benchmark'
require_relative './lib/rubocop'

config = RuboCop::ConfigStore.new
options = {force_exclusion: false, debug: false}
target_finder = RuboCop::TargetFinder.new(config, options)

Benchmark.bm do |x|
  x.report { 1_000.times { target_finder.target_files_in_dir(__dir__) } }
end

Results for RuboCop 1.12.1:

       user     system      total        real
  52.574793  30.188860  82.763653 ( 88.034613)

Results for RuboCop 1.12.1 + my patch:

       user     system      total        real
  52.337204  34.995932  87.333136 ( 91.022406)

It looks like the additional realpath calls increase system time by about 16% and total CPU time by about 5%.

Let me know if there are additional tests you'd like me to run!
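
For reference, the percentages can be reproduced from the two result rows. This is only a sketch of the arithmetic; the constant and method names are made up for illustration, and the figures are copied from the runs above.

```ruby
# Timings in seconds, copied from the two benchmark runs above.
BASELINE = { user: 52.574793, system: 30.188860, total: 82.763653, real: 88.034613 }.freeze
PATCHED  = { user: 52.337204, system: 34.995932, total: 87.333136, real: 91.022406 }.freeze

# Relative change of `after` versus `before`, in percent, rounded to one decimal.
def percent_change(before, after)
  ((after - before) / before * 100).round(1)
end

# Maps each metric to its percentage change between the two runs.
def overhead_report(baseline, patched)
  baseline.to_h { |metric, value| [metric, percent_change(value, patched.fetch(metric))] }
end

overhead_report(BASELINE, PATCHED).each do |metric, pct|
  puts format('%-6s %+5.1f%%', metric, pct)
end
```

Wall-clock ("real") time is the figure a user would notice, and it moves less than system time does, so comparing all four columns gives a fairer picture than system time alone.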

@bbatsov
Collaborator

bbatsov commented Apr 18, 2021

@rubocop/rubocop-core What do you think, guys? It's not a big performance hit, but it's still a step back. On the other hand I do believe that symlinks should be resolved.

Comment on lines 100 to 102
dir.end_with?('/./', '/../') ||
File.fnmatch?(exclude_pattern, dir, flags) ||
File.fnmatch?(exclude_pattern, "#{File.realpath(dir)}/", flags)
Member

I think line 101 is redundant here (all the tests pass without it), and that might help performance. I ran your benchmark and got this (after removing the line):

master:

$ bundle exec ruby benchmark.rb
       user     system      total        real
  68.409056  14.974495  83.383551 ( 84.883239)

patched:

$ bundle exec ruby benchmark.rb
       user     system      total        real
  67.697482  15.706053  83.403535 ( 84.581582)

Collaborator

Great point!

Contributor Author

That would make it impossible to exclude paths by their symlink names though, and would change the current behavior. I've added a test for this scenario in a new commit.

Member

Alright, thanks for that added test.

We should still avoid calling File.fnmatch? a second time when dir is not a symlink (which is by far the most common case). Something like this (it could probably be cleaned up):

dirs = Dir.glob(File.join(base_dir.gsub('/**/', '/\**/'), '*/'), flags)
          .reject do |dir|
            next true if dir.end_with?('/./', '/../')
            next true if File.fnmatch?(exclude_pattern, dir, flags)

            File.symlink?(dir.chomp('/')) && File.fnmatch?(exclude_pattern, "#{File.realpath(dir)}/", flags)
          end

Benchmark with this change:

       user     system      total        real
  66.163276  14.268255  80.431531 ( 81.158242)
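
The reject block can also be read as a standalone predicate, which makes the short-circuit order easier to test in isolation. The sketch below is an illustration only: `excluded_dir?` is a hypothetical name, and the default of `File::FNM_PATHNAME` is an assumption rather than the flags RuboCop uses.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper mirroring the reject block; `dir` is expected to end
# with '/', as entries returned by a '*/' glob do.
def excluded_dir?(dir, exclude_pattern, flags = File::FNM_PATHNAME)
  return true if dir.end_with?('/./', '/../')
  # Cheap check first: the path exactly as it was globbed.
  return true if File.fnmatch?(exclude_pattern, dir, flags)

  # Pay for realpath only when the entry really is a symlink.
  File.symlink?(dir.chomp('/')) &&
    File.fnmatch?(exclude_pattern, "#{File.realpath(dir)}/", flags)
end

Dir.mktmpdir do |tmp|
  root = File.realpath(tmp)
  %w[foo baz].each { |name| FileUtils.mkdir_p(File.join(root, name)) }
  File.symlink(File.join(root, 'foo'), File.join(root, 'bar'))

  by_target = File.join(root, 'foo', '**', '*') # pattern names the real directory
  by_name   = File.join(root, 'bar', '**', '*') # pattern names the symlink itself

  puts excluded_dir?("#{root}/bar/", by_target) # matched through the resolved path
  puts excluded_dir?("#{root}/bar/", by_name)   # matched by the symlink's own name
  puts excluded_dir?("#{root}/baz/", by_target) # plain directory, no realpath call
end
```

Keeping the literal-path check ahead of the symlink check preserves exclusion by symlink name, which is the scenario the added test covers.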

Contributor Author

The symlink check makes sense. I've updated the PR with your suggestion.

@marcandre
Contributor

Looks good, and what @dvandersluis said 👍

@bbatsov
Collaborator

bbatsov commented Apr 19, 2021

Agreed. Let's wait for the changes proposed by @dvandersluis to be applied and I'll merge this. I was planning to do a new release today, but I can wait for this PR to make it into the release.

@ob-stripe ob-stripe force-pushed the ob-fix-9636 branch 2 times, most recently from 766a108 to 2806dee on April 19, 2021 17:53
@bbatsov bbatsov merged commit fe934c2 into rubocop:master Apr 19, 2021
@bbatsov
Collaborator

bbatsov commented Apr 19, 2021

Thanks!

@ob-stripe ob-stripe deleted the ob-fix-9636 branch April 19, 2021 20:32
@ob-stripe
Contributor Author

Thank you all! Looking forward to the release, as this issue was preventing us from upgrading :)

Tonkpils added a commit to Tonkpils/rubocop that referenced this pull request Apr 29, 2021
rubocop#8815 introduced a traversal
strategy that used recursion.
rubocop#9703 then fixed an issue with
this traversal so that it accounted for directories and symlinks.

When a symlink points to a parent directory that contains that symlink,
the traversal loops until the filename is too long for glob to handle.

We prevent this by checking whether the base directory's realpath
includes the symlink's real path. If the base directory's path starts
with the symlink's destination, then we are in a loop and should skip
processing the directory.
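
The loop check described in this commit message might look roughly like the sketch below. `symlink_loop?` is a hypothetical name and this is not the code from the referenced commit; it also compares whole path components rather than doing a bare string prefix test, which is a design choice made here so that a sibling such as /x/ab is not mistaken for a child of /x/a.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: true when `dir` is a symlink whose destination is
# `base_dir` itself or one of its ancestors, so descending into it would
# eventually revisit `base_dir`.
def symlink_loop?(base_dir, dir)
  path = dir.chomp('/')
  return false unless File.symlink?(path)

  target = File.realpath(path) # raises Errno::ENOENT for a dangling symlink
  base = File.realpath(base_dir)
  # File.join(target, '') appends a single trailing separator, so the prefix
  # test only succeeds on a full path component.
  base == target || base.start_with?(File.join(target, ''))
end

Dir.mktmpdir do |tmp|
  root = File.realpath(tmp)
  base = File.join(root, 'a', 'b')
  FileUtils.mkdir_p(base)
  FileUtils.mkdir_p(File.join(root, 'c'))
  File.symlink(File.join(root, 'a'), File.join(base, 'up'))   # points at an ancestor
  File.symlink(File.join(root, 'c'), File.join(base, 'side')) # points elsewhere

  puts symlink_loop?(base, "#{base}/up/")   # would re-enter a/b
  puts symlink_loop?(base, "#{base}/side/") # safe to traverse
end
```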
bbatsov pushed a commit that referenced this pull request May 3, 2021
Development

Successfully merging this pull request may close these issues.

Symlinks pointing to excluded directories are not excluded
4 participants