MD051: Enhance with optional ignore prefix or regex #547

tunetheweb · 2022-08-04T09:17:34Z

MD051 is a very nice rule addition! However, some of the Markdown files in our project have dynamically inserted figures with anchors named fig-1, fig-2, and so on. These anchors don't exist in the Markdown source but are added as part of our build process.

What do you think about adding an optional config with a prefix or regex of links to ignore?

Something like:

MD051:
  ignore_prefix: "fig"

Or

MD051:
  ignore_prefixes: "fig,somethingelse"

Or

MD051:
  ignore_regex: "^(fig|somethingelse)"

I'd be willing to have a go at a PR for this if it sounds reasonable. Do you have a preference among the options above, or a preferred name/syntax?
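As a rough illustration of how the three proposals differ, here is a minimal Python sketch (markdownlint itself is JavaScript; Python is used only for brevity). The option names come from the proposals, but `should_ignore_fragment` and `parse_prefix_list` are hypothetical helpers, not part of markdownlint, and the fragment is assumed to arrive without its leading `#`.

```python
import re
from typing import Iterable, Optional


def parse_prefix_list(value: str) -> list[str]:
    """Split a comma-separated option such as "fig,somethingelse" into prefixes."""
    return [part.strip() for part in value.split(",") if part.strip()]


def should_ignore_fragment(
    fragment: str,
    ignore_prefixes: Iterable[str] = (),
    ignore_regex: Optional[str] = None,
) -> bool:
    """Return True if a link fragment should be skipped by the (hypothetical) check."""
    # Prefix form (ignore_prefix / ignore_prefixes): plain startswith comparison.
    if any(fragment.startswith(prefix) for prefix in ignore_prefixes):
        return True
    # Regex form (ignore_regex): re.search honours the "^" anchor in the pattern.
    if ignore_regex is not None and re.search(ignore_regex, fragment):
        return True
    return False
```

Note that a bare prefix such as `fig` would also skip a real heading anchor like `#figures`, so `fig-` is the safer prefix; the anchored regex form makes that boundary explicit.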


DavidAnson · 2022-08-04T16:35:05Z

What about linting after the Markdown files are fully generated by the build? Otherwise you may have broken "fig-" links and won't know it.

tunetheweb · 2022-08-04T16:43:09Z

That is true and certainly a risk. In my case the files are built into HTML and we do lint that output. But apparently the HTML linter we use (HTMLHint) is not as good as markdownlint 😄 since we have several broken links that this check has only just surfaced. I could look at extending that project with a rule similar to MD051, as an alternative to adding an exception here, if you'd prefer not to complicate this code base with an exception option.

DavidAnson · 2022-08-04T16:58:20Z

How many links do there tend to be in a document? Would supporting a list of strings be enough because you could provide a project level configuration that listed "fig-1" to "fig-10"? Or are there hundreds of these in a document?
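For the fixed-list approach, a small script could generate the entries rather than typing them by hand. This sketch assumes a hypothetical `ignore` key under MD051 that accepts a list of literal fragments; neither that key nor these helpers exist in markdownlint.

```python
def figure_anchors(count: int, prefix: str = "fig-") -> list[str]:
    """Enumerate fig-1 .. fig-<count> as literal strings for an ignore list."""
    return [f"{prefix}{n}" for n in range(1, count + 1)]


def as_yaml_list(key: str, values: list[str]) -> str:
    """Render values as a YAML block list under MD051 (key name is hypothetical)."""
    lines = ["MD051:", f"  {key}:"]
    lines += [f'    - "{value}"' for value in values]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a config fragment covering fig-1 through fig-10.
    print(as_yaml_list("ignore", figure_anchors(10)))
```

The list grows linearly with the number of figures, which is the trade-off against a pattern-based option.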

tunetheweb · 2022-08-04T17:11:01Z

There can be a lot. Here’s an example one if you’re curious: https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/src/content/en/2021/css.md

We also have translations, and one thing this new rule has made particularly apparent is that translators often keep the original links but translate the headings (which changes the heading anchors and so breaks the links). MD051 has surfaced a lot of those, which is why I'm keen to use this check to prevent that in future. Those, plus typos in links (or headings renamed after the links were created), are the main use cases for me.

Figure links are less of an issue for us, and a more complicated one to solve anyway: the figure anchor likely exists, but a link could point at the wrong figure if they all shift along when a new figure is inserted. We also tend to link figures less, since we usually discuss each figure right after it appears and so don't need a link.

DavidAnson · 2022-08-04T19:43:58Z

I may have been unclear. I was asking how many instances of figure links might be in a document. If it is very few, then it would be possible to provide a fixed size list of the first 10 and that would cover you. If there are very many, maybe a regular expression is more relevant. I don't see any matches for "fig-" in that document, so maybe I'm looking for the wrong thing?

tunetheweb · 2022-08-04T21:01:36Z

There can be anywhere from a few to many figure links. That particular document has 67 figures. You can see the final published version here. In this example, none of the figures are referenced by links.

Here’s one where there is a link: https://raw.githubusercontent.com/HTTPArchive/almanac.httparchive.org/main/src/content/en/2021/pwa.md (search for fig-4).

By the nature of the publication, most figures are discussed directly after they appear and so don't need to be referenced, but occasionally a figure elsewhere in the text is referenced and linked.

So I was thinking of letting me configure a prefix allow list (fig-) or a regex, rather than having to list every figure the authors might reference.

DavidAnson · 2022-08-04T22:26:22Z

I'm worried that a prefix is not general enough for other scenarios (if there are any?) and that a regular expression is harder to work with for many folks.

I also feel kind of like the approach you describe now is quite fragile and may have a bunch of broken figure links already.

So I don't have an approach I like yet.

tunetheweb · 2022-08-07T11:25:33Z

FYI I managed to work around this using inline ignores on the affected lines since, luckily, we don't reference figures that much internally so this is feasible.

I still think it would be useful to have some sort of more generic override for dynamically inserted content like this, where the Markdown is, in effect, a source file rather than the final output. I know I'm not the only one who does this (though I'm not aware of any others that explicitly link to generated content). I take on board your point above that it might be better to lint the final HTML output in these cases, but it's also nice to flag problems at the source (so we get the correct line number) without the noise of items markdownlint can't be expected to deal with. Maybe inline ignores are the best way of dealing with this, but they feel a little verbose for regular use, and listing every possible figure in the MD051 config is almost as verbose. Maybe some kind of regex-lite, like MD051: ignore: "fig-{int},somethingelse", would be a middle ground between full regex support and listing every permutation?
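The `{int}` placeholder idea could be implemented by escaping everything else and substituting a digit pattern. A minimal sketch, assuming the hypothetical `fig-{int},somethingelse` syntax from the comment above, where `{int}` means one or more ASCII digits and each entry must match the whole fragment; the helper names are illustrative only:

```python
import re


def compile_lite_pattern(pattern: str) -> re.Pattern[str]:
    """Compile one regex-lite entry such as "fig-{int}" into a regex.

    Everything is literal except the {int} placeholder, which becomes [0-9]+.
    """
    literal_parts = pattern.split("{int}")
    return re.compile("[0-9]+".join(re.escape(part) for part in literal_parts))


def compile_lite_list(value: str) -> list[re.Pattern[str]]:
    """Split a comma-separated option value and compile each entry."""
    return [compile_lite_pattern(p.strip()) for p in value.split(",") if p.strip()]


def is_ignored(fragment: str, patterns: list[re.Pattern[str]]) -> bool:
    # fullmatch: "fig-{int}" accepts "fig-67" but not "fig-" or "fig-6-extra".
    return any(p.fullmatch(fragment) for p in patterns)
```

Requiring a full match keeps the syntax predictable for people unfamiliar with regular expressions while avoiding the over-broad matching of a bare prefix.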

Anyway, I understand if you'd prefer not to handle this in markdownlint and so wish to close this issue. As I say, I've managed to work around it with existing functionality and still benefit from MD051, which has helped identify a lot of real issues and typos.

Thanks again for creating and supporting this tool!

DavidAnson · 2022-08-07T17:12:09Z

Great news! I'll leave this open as a possible enhancement and see if/what other scenarios come up.

mkrg-capco · 2022-12-21T11:51:04Z

We hit a similar issue. Our project, hosted in Bitbucket, generates the README for Terraform with DocToc (https://github.com/thlorenz/doctoc). It generates a table of contents with links in the format

[Terraform Documentation](#markdown-header-terraform-documentation)
    - [Requirements](#markdown-header-requirements)

and markdownlint flags links with the markdown-header prefix as invalid. Having a way to configure the linter to ignore these prefixes would resolve the problem.

For now I have to disable the rule.
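Since these links carry the `markdown-header-` prefix used for Bitbucket anchors, one alternative to ignoring them outright is to strip the prefix and validate the remainder, so typos are still caught. A minimal sketch, assuming a simplified GitHub-style slug (lowercase, punctuation removed, spaces to hyphens) that approximates but is not markdownlint's exact algorithm; `simple_slug` and `fragment_matches` are hypothetical helpers:

```python
import re


def simple_slug(heading: str) -> str:
    """Simplified GitHub-style anchor: lowercase, drop punctuation, spaces -> hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, spaces
    return text.replace(" ", "-")


def fragment_matches(fragment: str, headings: list[str], strip_prefix: str = "") -> bool:
    """Check a '#fragment' against heading slugs after removing an optional prefix."""
    name = fragment[1:] if fragment.startswith("#") else fragment
    if strip_prefix and name.startswith(strip_prefix):
        name = name[len(strip_prefix):]
    return name in {simple_slug(h) for h in headings}
```

With `strip_prefix="markdown-header-"`, `#markdown-header-terraform-documentation` resolves against the heading "Terraform Documentation", while a misspelled fragment is still reported.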

DavidAnson added the question label Aug 4, 2022

DavidAnson added enhancement and removed question labels Aug 7, 2022

DavidAnson changed the title ~~Enhance MD051 with optional ignore prefix or regex~~ MD051: Enhance with optional ignore prefix or regex Aug 5, 2023
