Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip_gitignore=true will break ignoring directories defined in skip during filesystem traversal #1912

Open
richardebeling opened this issue Mar 19, 2022 · 3 comments

Comments

@richardebeling
Copy link

richardebeling commented Mar 19, 2022

Description

When starting isort with --skip_gitignore, directories that should not be traversed because they are configured in skip will be traversed.

From what I've seen, this is purely a performance problem, so all files that should be skipped will be filtered out at some later point.

Instructions to reproduce

Minimal setup (note: "node_modules" is in the default skip values.)

# assuming isort 5.10.1 is installed (`pip3 install isort`)
mkdir test && cd test
git init
mkdir -p node_modules/this_should_not_be_accessed
touch test.py
strace isort . 2>&1 | grep this_should_not  # as expected, not queried
strace isort --skip-gitignore . 2>&1 | grep this_should_not  # suddenly, stat and open syscalls appear

On a side note: Add more files, and the node_modules folder will be queried more often:

touch test2.py
strace isort --skip-gitignore . 2>&1 | grep this_should_not  # 2 stat and 2 open calls
touch test3.py
strace isort --skip-gitignore . 2>&1 | grep this_should_not  # 3 stat and 3 open calls

adding node_modules to a .gitignore doesn't help, even though the directory should then be double-ignored by isort (this is #1895).

echo "node_modules" >> .gitignore
strace isort --skip-gitignore . 2>&1 | grep this_should_not  # 3 stat and 3 open calls

Environment & Versions

Run on ubuntu21.10 with python3.9.7. Actual syscalls probably depend on the python interpreter, if you can't reproduce, I'll probably be able to figure out what python code causes the access.

Related Issues

Closely related to #1895, but if I undestand that issue correctly, it does not talk about skip at all, but just files from .gitignore. It seems to me that the current PR for #1895, #1900, would fix this.
Possibly related #1762 and #1777, both read as if they were fixed.

Background

Yesterday, I realized that isort takes >90 seconds to run on one project where it processes 107 files with approximately 10k lines of code. Turns out, it doesn't need 90 seconds to process, but rather 89s to traverse what feels like the entire file system, only to then do its actual work in ~1s:

time isort --show-files . > isort_output.txt
real  1m35.831s

time cat isort_files.txt | xargs isort
real 0m0.896s

which is absurd, considering that black also finds the code files and finishes in 3 seconds. I wasn't able to get isort to skip reading directories it really doesn't need to access (venv, node_modules, web server static files, ...)

@bmalehorn
Copy link
Contributor

Yeah, this is highly related to #1895 and probably resolved by #1900.

It sounds like your specific issue is:

  1. isort has a default ignore list, including node_modules
  2. when you pass in --skip-gitignore, it actually runs slower because it reads every file then filters afterwards
  3. isort is still functionally correct when you pass in --skip-gitignore, it just takes too long

@richardebeling
Copy link
Author

#1900 is now merged, but iirc, we still have cases where we traverse through directories that are ignored, see my comment on the PR and the response to it.

@adamjstewart
Copy link

Just encountered this same problem with isort 5.10.1. I have a data directory with millions of files. My .gitignore includes /data/ as one of the directories to skip. I see the following performance metrics:

$ isort --skip-gitignore .  # 23 min
$ isort --extend-skip data .  # 43 sec

If I delete the data directory, isort . takes < 2 seconds. Both 23 min and 43 sec are prohibitively long, I usually end up just not using isort and manually sorting my imports 😢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants