Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unintuitive handling of .markdownlintignore files #1

Closed
borekb opened this issue Aug 13, 2020 · 8 comments
Closed

Unintuitive handling of .markdownlintignore files #1

borekb opened this issue Aug 13, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@borekb
Copy link

borekb commented Aug 13, 2020

It took me a while to realize that .markdownlintignore is not treated as an ignore file but rather in a custom way.

For example, this doesn't work ❌:

node_modules
/ignored-folder

This works ✅:

**/node_modules/**
ignored-folder/**

It's described in the README but I still think it would be much better if the standard "ignore" format was used.

@borekb
Copy link
Author

borekb commented Aug 13, 2020

And my "✅" updated example is even wrong – the second pattern should be **/ignored-folder/** (I learned this painfully when lint-staged was reporting markdownlint errors for MD files in the ignored folder; it uses an absolute path so my original glob didn't match it).

@DavidAnson
Copy link
Owner

This is explained here: igorshubovych/markdownlint-cli#108 (comment)

As you note, everything you want to do is possible with the current implementation. It's just slightly different than usual. I haven't come up with a way around this yet, though I'll keep thinking about it. Would using a different file name for the ignore file have helped avoid the expectations you had that were wrong?

@borekb
Copy link
Author

borekb commented Aug 13, 2020

Yes, for example, if it was an ignoreGlobs key in .markdownlint.json, that would make it cleaner.

But even better would be to support the ignore format, ideally with hierarchy as well (#2). I don't know how ripgrep or fd do that but for example, fd can find *.md files instantly between the 180k files I have in the current repo (most of that gitignored node_modules):

$ time fd -g '**/*.md' .
fd -g '**/*.md' .  0.02s user 0.02s system 270% cpu 0.014 total

@DavidAnson
Copy link
Owner

The ignore configuration can't (currently) be in one of the config files like you suggest because those files are only found and parsed as part of the command-line globbing (and it's too late to start ignoring then).

Regarding your timings, the current implementation of markdownlint-cli2 is fast, also! I'm trying to avoid an implementation (like markdownlint-cli uses) that will be slow. :)

The obvious choice to implement this behavior consistent with .gitignore - and the one I made for markdownlint-cli - is with this package: https://github.com/kaelzhang/node-ignore. HOWEVER, that basically returns a filter function you feed every file to find out if it should be ignored. As covered in the related issue, it's too late at that point to avoid enumerating large directories like node_modules which means things can be very slow even when there are only a small number of Markdown files.

I want to do the ignoring as an integral part of the globbing operation which is why markdownlint-cli2 behaves like it does today. I can definitely name the ignore file something like .markdownlint-cli2-extra-globs, but I don't yet see how to change the implementation.

@DavidAnson DavidAnson added the enhancement New feature or request label Aug 14, 2020
@borekb
Copy link
Author

borekb commented Aug 14, 2020

The ignore configuration can't (currently) be in one of the config files like you suggest because those files are only found and parsed as part of the command-line globbing (and it's too late to start ignoring then).

That makes sense, thanks for the explanation.

Regarding your timings, the current implementation of markdownlint-cli2 is fast, also!

cli2 is fast and I love that but achieves it by explicit globbing. My point was that fd can be equally fast while respecting the hierarchy of .gitignore files, which is a much harder problem.

I can definitely name the ignore file something like .markdownlint-cli2-extra-globs

I'd probably vote for that but it's a bit ugly and "proper" ignore support would be better, which leads me to:

[...] but I don't yet see how to change the implementation.

I also don't know if there's a Node.js module that would process ignore files and be fast at the same time 😀. For example, globby suffers from exactly the same problem: sindresorhus/globby#50 (comment).

We once needed to find project.json files in a monorepo quickly (this also needs to avoid many such instances in node_modules) and I think we used vscode-ripgrep for that – it's a package that downloads the ripgrep binary and uses that. The Rust magicians somehow managed to implement globbing + gitignoring in a blazing fast way so we utilized their work.

I think fd is more appropriate tool for the job and I just found that there's an npm package for it: https://www.npmjs.com/package/fd-find, so maybe that's some option.

(I realize that plain Node.js implementation would be better but Rust binaries are cross-platform and usually unbelievably fast.)

@DavidAnson
Copy link
Owner

Quick comments since I don't have a lot of time right now:

CLI2 uses the globby package you reference above, so all of that discussion is directly relevant here. I chose this package after some research, but I can look at the alternatives mentioned in that thread.

I had an idea this morning that I feel optimistic about. I will remove support for .markdownlintignore and add a ignoreGlobs property to .markdownlint-cli2.jsonc. It will still be fast when that file is in the base directory for the same reason that ignores are fast today. It will be slower in child directories, but that is less common and will still enable that scenario with the added benefit of using the same syntax everywhere. It will additionally enable the use of CLI2 without any command line parameters by specifying the command line globs in the config file in the base directory.

I have a philosophical objection to using precompiled binaries from a Node application. It hurts platform-independence and is much harder to audit from a security perspective. That is a fine idea, but not something I am really open to.

@borekb
Copy link
Author

borekb commented Aug 17, 2020

Since I believe ignore files would be the best UX, I tried to do another search for packages that would be able to do it. My conclusion:

So nothing directly usable, I think.

I was also thinking about making fd/ripgrep opt-in, like markdonwlint-cli2 --use-fd, which would have this binary file dependency but be 1000x faster in return. The trouble is, probably, that to make the base cli2 reasonably fast, you'll need to invent your own performant way to ignore files, like the ignoreGlobs in .markdownlint-cli2.jsonc you mentioned, and this will be inherently incompatible with ignore-based tools.

@DavidAnson
Copy link
Owner

Thanks for the investigation!

  • ignore-walk does not seem to accept globs for search
  • glob-gitignore does not seem to read ignore files
  • deglob seems to do what we want, but is missing the streaming results API that globby offers

CORRECTION: deglob only seems to handle ignore files at the root: https://github.com/standard/deglob/blob/61d0f01ddc86a5d9361246e176b2503b0acda835/index.js#L98

I was going to try deglob, though I'm worried about not being able to start linting until the entire search is done. However, root-only is a problem.

So I think I'll explore the .markdownlint-cli2.jsonc option I suggest above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants