Feature request: dynamically ignore files or directories #265

Open
dflupu opened this issue Mar 11, 2020 · 3 comments

dflupu commented Mar 11, 2020

It would be useful if the user could pass a callback to fast-glob that is called with the path of each file or directory and returns whether fast-glob should include it in the results. If the callback returns false for a directory, that directory is not traversed at all.
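To make the request concrete, here is a minimal sketch of a crawl with such a callback. It is not fast-glob's API: `walkWithFilter` and `shouldInclude` are hypothetical names, and the sketch uses only Node's built-in `fs.readdirSync` rather than fast-glob internals.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper, not part of fast-glob: walks `root` depth-first and
// asks `shouldInclude(entryPath, dirent)` about every file and directory.
function walkWithFilter(root, shouldInclude) {
  const results = [];
  const pending = [root];

  while (pending.length > 0) {
    const dir = pending.pop();
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const entryPath = path.join(dir, dirent.name);
      // Returning false for a directory prunes it: its contents are never read.
      if (!shouldInclude(entryPath, dirent)) continue;
      if (dirent.isDirectory()) {
        pending.push(entryPath);
      } else {
        results.push(entryPath);
      }
    }
  }
  return results;
}
```

A call such as `walkWithFilter('.', (entryPath, dirent) => dirent.name !== 'node_modules')` would skip every `node_modules` directory without reading it.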

One use case would be improving globby performance when the gitignore option is enabled. Currently, this option works by globbing the given path twice: once with a **/.gitignore pattern to find and read the gitignore files, and again with the user's globs. While this could be improved upon, I cannot see a way that both a) avoids the initial glob for gitignore files and b) keeps fast-glob from iterating gitignored directories. Adding a callback would make that possible.

If I am not aware of some relevant fast-glob feature, let me know.
Globby issue: sindresorhus/globby#50

mrmlnc self-assigned this Mar 13, 2020

mrmlnc (Owner) commented Mar 13, 2020

Hello, @dflupu,

Thank you for the interesting question 🎉

First, I think a hook mechanism would help you here. Unfortunately, it requires a lot of effort and will not appear in the near future (~1 year; I'll create an issue). The problem is that such a mechanism will slow down the directory tree crawl. Even then, I don't see any way to add patterns to the primary filters on the fly, only in hooks.

Second, I see that right now you do an initial crawl of the directory tree to find the .gitignore files. For that crawl you can use @nodelib/fs.walk instead of fast-glob. @nodelib/fs.walk is two or three times faster because it applies no filters by default. The first crawl should therefore be several times faster, and you can add patterns to the filter on the fly (there may be problems with the async/stream APIs, because they are truly asynchronous).
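The idea of extending the filter during the crawl can be sketched as follows. This is only an illustration under loud assumptions: it uses Node's built-in `fs.readdirSync` instead of @nodelib/fs.walk (a synchronous crawl sidesteps the ordering problem mentioned for async/stream), the helper names `readIgnoredNames` and `crawlRespectingGitignore` are hypothetical, and the .gitignore handling understands only bare names such as `dist` or `node_modules/`, not real gitignore syntax.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Simplifying assumption: only bare names such as `dist` or `node_modules/`
// are understood; real .gitignore syntax (globs, negation, anchoring) is not.
function readIgnoredNames(dir) {
  const file = path.join(dir, '.gitignore');
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/\/$/, ''))
    .filter((line) => line !== '' && !line.startsWith('#'));
}

// Hypothetical helper: a single crawl that extends the ignore set as it descends.
function crawlRespectingGitignore(dir, inherited = new Set()) {
  // Names ignored here apply to this directory and everything below it.
  const ignored = new Set([...inherited, ...readIgnoredNames(dir)]);
  const files = [];
  for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
    // `.git` is always skipped (an assumption of this sketch); ignored
    // directories are pruned before they are ever read.
    if (dirent.name === '.git' || ignored.has(dirent.name)) continue;
    const entryPath = path.join(dir, dirent.name);
    if (dirent.isDirectory()) {
      files.push(...crawlRespectingGitignore(entryPath, ignored));
    } else {
      files.push(entryPath);
    }
  }
  return files;
}
```

Because each .gitignore is read when its directory is entered, no separate `**/.gitignore` pass is needed, and ignored directories are never opened.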

Additional question:

Why do you need to consider all .gitignore files? Why not just consider the root file? Big monorepo? Do you have any real use case?


Toxaris commented Dec 18, 2020

We are affected by the globby performance issue described by @dflupu. We have a monorepo (using rush+pnpm) with a big .gitignore in the repository root plus some smaller .gitignore files in some of the projects. We had a globby-based script that took 60 seconds to find files. I just rewrote it to crawl the directories manually with fs.readdir, and now it runs in 2 seconds. I guess it could be faster if I could use fast-glob instead of my clumsy home-grown script :)


fabiospampinato commented Nov 21, 2023

I just hit a scenario where my ignore globs slow things down significantly. Most of the time is actually spent executing them, and merging them into one glob is still slow because they seem to be expanded anyway.

If a raw callback were allowed, I could just write something like /[\\\/](\.git|node_modules)$/.test(targetPath), which should be more or less free compared to the current regexes, I think.
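For illustration, the regex from this comment can be wrapped in a predicate of the shape such a callback might take. The function name `isIgnoredDir` is hypothetical, and no such option exists in fast-glob as far as this thread states.

```javascript
// Regex taken from the comment above: matches paths whose last segment is
// `.git` or `node_modules`, preceded by a forward slash or a backslash.
const ignoredDirRe = /[\\\/](\.git|node_modules)$/;

// Hypothetical predicate in the shape a raw ignore callback might take.
function isIgnoredDir(targetPath) {
  return ignoredDirRe.test(targetPath);
}

console.log(isIgnoredDir('/repo/node_modules')); // true
console.log(isIgnoredDir('C:\\repo\\.git')); // true
console.log(isIgnoredDir('/repo/src')); // false
console.log(isIgnoredDir('node_modules')); // false: no leading separator
```

Note that the pattern requires a path separator before the directory name, so a bare relative `node_modules` would not match.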
