Hidden directories containing a large number of files impacts performance #9208

Closed
dark-panda opened this issue Dec 10, 2020 · 0 comments

Comments

@dark-panda
Contributor

Finding hidden files can have a performance impact when there are hidden
directories in the working directory that contain many files, as the
hidden_files array can become rather large, resulting in a long
start-up time while the target files are determined. Using a binary search
can cut down on this start-up time considerably.

As an example, on my machine, I have a .bundle directory containing my local
bundle, which contains tens of thousands of files. This causes the target_files
array to be quite large as a result. Each lookup with Array#include? is a
linear scan, O(n) in the worst case, while a binary search over a sorted array
is O(log n) in the worst case. This change reduced the start-up time of my
RuboCop scans from around 100 seconds to around 15 seconds, as measured with
Benchmark.
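
To illustrate the difference between the two lookups, here is a minimal sketch. It is not the actual patch: the file paths, array size, and the helper names `linear_hidden?` and `bsearch_hidden?` are made up for the example, and it assumes the array is sorted before `Array#bsearch` is used.

```ruby
require 'benchmark'

# Hypothetical stand-in for a large list of hidden files (paths are invented).
hidden_files = Array.new(20_000) { |i| format('.bundle/gems/file_%05d.rb', i) }.sort

# Linear membership test: Array#include? walks the array element by element.
def linear_hidden?(hidden_files, path)
  hidden_files.include?(path)
end

# Binary-search membership test; the array MUST already be sorted.
# In find-any mode the block returns 0 on a match, a positive number when the
# target sorts after the candidate, and a negative number when it sorts before.
def bsearch_hidden?(sorted_hidden_files, path)
  !sorted_hidden_files.bsearch { |candidate| path <=> candidate }.nil?
end

# Probe with paths near the end of the array, the slow case for a linear scan.
probes = Array.new(200) { |i| format('.bundle/gems/file_%05d.rb', 19_999 - i) }

linear = Benchmark.realtime { probes.each { |p| linear_hidden?(hidden_files, p) } }
binary = Benchmark.realtime { probes.each { |p| bsearch_hidden?(hidden_files, p) } }

puts format('Array#include?: %.4fs, Array#bsearch: %.4fs', linear, binary)
```

The one-off cost of sorting is O(n log n), so this only pays off when the array is queried many times, as it is when every candidate file is checked against it.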

Expected behavior

Quick start-up time.

Actual behavior

Potentially slow start-up time when a large number of hidden files are present.

Steps to reproduce the problem

Create a hidden directory containing a large number of files and run RuboCop.
The issue worsens as the number of files increases.
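
For anyone wanting to reproduce this, a rough sketch of the setup might look like the following. The helper `create_hidden_files`, the directory name `.hidden_fixture`, and the file count are all hypothetical choices for illustration; raise the count to make the slowdown more visible.

```ruby
require 'fileutils'
require 'tmpdir'

# Populate a hidden directory under `root` with `count` small Ruby files.
# The directory name and file contents are arbitrary placeholders.
def create_hidden_files(root, count, dir_name: '.hidden_fixture')
  hidden_dir = File.join(root, dir_name)
  FileUtils.mkdir_p(hidden_dir)
  count.times do |i|
    path = File.join(hidden_dir, format('file_%05d.rb', i))
    File.write(path, "# frozen_string_literal: true\n")
  end
  hidden_dir
end

Dir.mktmpdir do |root|
  hidden_dir = create_hidden_files(root, 1_000)
  puts "Created #{Dir.children(hidden_dir).size} files in #{hidden_dir}"
  # At this point, run RuboCop from `root` and time its start-up.
  # The temporary directory is removed when this block exits.
end
```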

RuboCop version

1.6.1 (using Parser 2.7.2.0, rubocop-ast 1.3.0, running on ruby 2.7.2 x86_64-darwin18)
  - rubocop-performance 1.9.1
  - rubocop-rspec 2.0.1
dark-panda added a commit to dark-panda/rubocop that referenced this issue Dec 11, 2020
…ect hidden files
