MD051: Enhance with optional ignore prefix or regex #547

Open
tunetheweb opened this issue Aug 4, 2022 · 10 comments

@tunetheweb

MD051 is a very nice rule addition! However, some of the markdown files in our project have dynamically inserted figures whose anchors are named fig-1, fig-2, and so on. These anchors don't exist in the markdown, but are added as part of our build process.

What do you think about adding an optional config with a prefix or regex of links to ignore?

Something like:

MD051:
  ignore_prefix: "fig"

Or

MD051:
  ignore_prefixes: "fig,somethingelse"

Or

MD051:
  ignore_regex: "^(fig|somethingelse)"

I'd be willing to have a go at a PR for this if it sounds reasonable. Let me know if you have a preference among the options above, or a preferred name/syntax.
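To compare the three proposals, here is a rough Python sketch of how each one might decide whether a link fragment is skipped. It is not markdownlint code (markdownlint is written in JavaScript); the function names are hypothetical, and the option names are simply the ones proposed above.

```python
import re


def ignored_by_prefix(fragment: str, ignore_prefix: str) -> bool:
    """Option 1: a single prefix, e.g. ignore_prefix: "fig"."""
    return fragment.startswith(ignore_prefix)


def ignored_by_prefixes(fragment: str, ignore_prefixes: str) -> bool:
    """Option 2: comma-separated prefixes, e.g. "fig,somethingelse"."""
    prefixes = [p.strip() for p in ignore_prefixes.split(",") if p.strip()]
    return any(fragment.startswith(p) for p in prefixes)


def ignored_by_regex(fragment: str, ignore_regex: str) -> bool:
    """Option 3: a regular expression, e.g. "^(fig|somethingelse)"."""
    return re.search(ignore_regex, fragment) is not None
```

One thing the sketch makes visible: a bare prefix such as fig would also skip a real heading anchor like figures-and-tables, so a value such as fig- is probably the safer choice.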

@DavidAnson
Owner

What about linting after the Markdown files are fully generated by the build? Otherwise you may have broken "fig-" links and won't know it.

@tunetheweb
Author

That is true and certainly a risk. In my case they are built into HTML and we do lint those. But apparently the HTML linter we use (HTMLHint) is not as good as markdownlint 😄 since we have several broken links that this check has only just surfaced. I could look at expanding that project to have a rule similar to MD051, as an alternative to adding the exception here, if you’d prefer not to complicate this code base with an exception option.

@DavidAnson
Owner

How many links do there tend to be in a document? Would supporting a list of strings be enough because you could provide a project level configuration that listed "fig-1" to "fig-10"? Or are there hundreds of these in a document?
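A fixed list like that could be generated rather than typed out. The sketch below is only an illustration in Python; the helper name and the idea of pasting its output into a project-level configuration are assumptions, not an existing markdownlint feature.

```python
def figure_anchor_list(count: int, prefix: str = "fig-") -> list:
    """Build the anchors prefix1 .. prefixN, e.g. fig-1 .. fig-10."""
    return [f"{prefix}{n}" for n in range(1, count + 1)]


# The "first 10" list suggested above.
allowed = set(figure_anchor_list(10))
```

The limitation is that any figure numbered beyond the chosen count (fig-11 here) would still be reported.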

@tunetheweb
Author

There can be a lot. Here’s an example one if you’re curious: https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/src/content/en/2021/css.md

We also have translations, and one thing that’s particularly apparent with this new rule is that translators often leave the original links but translate the headings (which changes the heading anchor and so breaks the link). MD051 has surfaced a lot of those, which is why I’m keen to use this check to prevent that in future. Those, and typos in links (or headings renamed after the links were created), are the main use case for me.

Figure links are less of an issue for us, and a more complicated one to solve anyway, since the figure link likely exists but could be the wrong one if they all shift along when a new figure is inserted. We also tend to link those less, as we usually talk about the figures just after them and so don’t need a link.
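The translation case can be illustrated with a small Python sketch of a fragment check in the spirit of MD051. It is not markdownlint's implementation: the slug function is a simplified stand-in for GitHub-style heading anchors, code blocks are not skipped, and all names are hypothetical.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")
FRAGMENT_LINK = re.compile(r"\]\(#([^)\s]+)\)")


def slugify(heading: str) -> str:
    """Simplified anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_fragments(markdown: str) -> list:
    """Return (line number, fragment) pairs with no matching heading."""
    lines = markdown.splitlines()
    anchors = set()
    for line in lines:
        match = HEADING.match(line)
        if match:
            anchors.add(slugify(match.group(1)))
    problems = []
    for number, line in enumerate(lines, start=1):
        for fragment in FRAGMENT_LINK.findall(line):
            if fragment not in anchors:
                problems.append((number, fragment))
    return problems


original = "## Selectors\n\nSee [selectors](#selectors)."
translated = "## Sélecteurs\n\nVoir [les sélecteurs](#selectors)."
```

With these two inputs, the original reports nothing, while the translated version reports the untranslated #selectors link on line 3, because the only heading anchor is now sélecteurs.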

@DavidAnson
Owner

I may have been unclear. I was asking how many instances of figure links might be in a document. If it is very few, then it would be possible to provide a fixed size list of the first 10 and that would cover you. If there are very many, maybe a regular expression is more relevant. I don't see any matches for "fig-" in that document, so maybe I'm looking for the wrong thing?

@tunetheweb
Author

So there can be anything from a few to many figure links. That particular document has 67 figures. You can see the final published version here. In this example none of the figures are referenced by links.

Here’s one where there is a link: https://raw.githubusercontent.com/HTTPArchive/almanac.httparchive.org/main/src/content/en/2021/pwa.md (search for fig-4).

The nature of the publication is that most figures are discussed directly after they appear, so there is no need to link to them, but occasionally a figure elsewhere in the text is referenced and linked.

So I was thinking of being able to configure a prefix allow list (fig-) or a regex, rather than having to list every figure that authors might reference.

@DavidAnson
Owner

I'm worried that prefix is not general enough for other scenarios (if there are any?) and that regular expression is harder to work with for many folks.

I also feel kind of like the approach you describe now is quite fragile and may have a bunch of broken figure links already.

So I don't have an approach I like yet.

@tunetheweb
Author

FYI I managed to work around this using inline ignores on the affected lines since, luckily, we don't reference figures that much internally so this is feasible.
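For anyone wanting to apply the same workaround in bulk, here is a hedged Python sketch that puts markdownlint's disable-next-line comment above every line linking to a fig-N anchor. The thread does not say which inline form was actually used, so the choice of directive is an assumption; the sketch also ignores code blocks, lists, and paragraph structure, and the function name is made up.

```python
import re

# Matches inline links to generated figure anchors, e.g. [Figure 4](#fig-4).
FIG_LINK = re.compile(r"\]\(#fig-\d+\)")
DIRECTIVE = "<!-- markdownlint-disable-next-line MD051 -->"


def add_inline_ignores(markdown: str) -> str:
    """Insert the directive above each line that links to a fig-N anchor."""
    out = []
    for line in markdown.splitlines():
        already_marked = bool(out) and out[-1].strip() == DIRECTIVE
        if FIG_LINK.search(line) and not already_marked:
            out.append(DIRECTIVE)
        out.append(line)
    return "\n".join(out)
```

Running it a second time leaves the text unchanged, since a line already preceded by the directive is not marked again.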

I still think it would be useful to have some sort of more generic override for dynamically inserted content like this, where the markdown is, in effect, a source file rather than the final output. I know I'm not the only one that does this (though I'm not aware of any others that explicitly link to generated content). I take on board your point above that it might be better to lint the final output HTML in these cases, though it's also nice to be able to flag items at source (so we get the correct line number), but without the noise of items markdownlint can't be expected to deal with. Maybe inline ignores are the best way of dealing with this, but they feel a little verbose for regular use, and listing all the possible figures in an MD051 config is almost as verbose. Maybe some kind of regex-lite like MD051: ignore: "fig-{int},somethingelse" would be a middle ground between full regex support and listing every permutation?

Anyway, I understand if you'd prefer not to handle this in markdownlint and so wish to close this issue. As I say, I've managed to work around it with existing functionality and still benefit from MD051, which has helped identify a lot of real issues/typos.
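To make the regex-lite idea concrete, here is a rough Python sketch that translates such a spec into a real regular expression. The {int} token and the comma-separated format come from the suggestion above; the function name, and the choice to require a full match of each entry rather than a prefix match, are assumptions.

```python
import re


def compile_ignore(spec: str) -> re.Pattern:
    """Turn a spec like "fig-{int},somethingelse" into one anchored regex."""
    alternatives = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Escape the entry literally, then expand the {int} token to digits.
        escaped = re.escape(item).replace(re.escape("{int}"), r"\d+")
        alternatives.append(escaped)
    return re.compile("^(?:" + "|".join(alternatives) + ")$")


pattern = compile_ignore("fig-{int},somethingelse")
```

Here pattern matches fig-12 and somethingelse, but not fig-, fig-1a, or figure, so users get the common case without having to write or escape a regex themselves.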

Thanks again for creating and supporting this tool!

@DavidAnson
Owner

Great news! I'll leave this open as a possible enhancement and see if/what other scenarios come up.

@mkrg-capco

We hit a similar issue. Our project, which is hosted on Bitbucket, generates a README for Terraform with DocToc (https://github.com/thlorenz/doctoc). It generates a table of contents with links in the format:

[Terraform Documentation](#markdown-header-terraform-documentation)
    - [Requirements](#markdown-header-requirements)

and markdown-linter marks links with the markdown-header prefix as invalid. Having a way to configure the linter to ignore these prefixes would resolve the problem.

For now I have to disable the rule.
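One possible alternative to ignoring such links outright would be to strip the known prefix and still validate the remainder against the document's headings. The Python sketch below only illustrates that idea; it is not something markdownlint is confirmed to do, the slug function is a simplified approximation, and the names are hypothetical.

```python
import re


def slugify(heading: str) -> str:
    """Simplified anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def is_valid_fragment(fragment: str, headings: list,
                      strip_prefix: str = "markdown-header-") -> bool:
    """Check a fragment against heading anchors after removing the prefix."""
    anchors = {slugify(h) for h in headings}
    if fragment.startswith(strip_prefix):
        fragment = fragment[len(strip_prefix):]
    return fragment in anchors


headings = ["Terraform Documentation", "Requirements"]
```

With this approach, #markdown-header-requirements passes, while a typo such as #markdown-header-requirments is still caught.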

@DavidAnson changed the title from "Enhance MD051 with optional ignore prefix or regex" to "MD051: Enhance with optional ignore prefix or regex" on Aug 5, 2023