Header and navigation sidebars display incorrectly arbitrary markup #3357

Closed
Enn83 opened this issue Aug 28, 2023 · 76 comments · Fixed by #3564 or #3578

@Enn83

Enn83 commented Aug 28, 2023

Hello!

Since when does the bug occur?

I recently upgraded my mkdocs package from 1.4.2 to 1.5.2, and I noticed this bug when...

  • I don't specify a page title in mkdocs.yml file or in the page metadata (in yaml).
  • I use an emoji (HTML in general) in the first level section title in the markdown file:
# Intégration et Déploiement Continu sur :simple-gitlab: Gitlab

For example (with mkdocs-material theme and readthedocs theme):

[screenshots: mkdocs-material theme and readthedocs theme]

... while it was displayed like that before:

[screenshot: previous rendering]

More generally, some parts, like the search page of the readthedocs theme, don't handle HTML in section titles well.

External links:

readthedocs theme pictures and bug explanations are from @squidfunk.

@squidfunk
Sponsor Contributor

squidfunk commented Sep 14, 2023

@oprypin friendly ping, since this is a pretty serious bug. As of MkDocs 1.5, the page title contains HTML, which breaks not only our project Material for MkDocs, but also other themes like ReadTheDocs, and potentially many plugins that are not prepared to handle HTML in page.title. This is new behavior that was introduced in MkDocs 1.5.0 and effectively prohibits using markup in headlines, as this markup doesn't seem to be stripped from the page title anymore. It is now also part of the search index, for which the search that comes with MkDocs is not prepared:

[screenshot: search index containing HTML]

The changelog of MkDocs 1.5.0 only talks about a new method for deducing the page title, but not about this change in behavior which, in my opinion, is a breaking change that requires the ecosystem to adapt. For this reason, I assume it's an unintended bug that we need to fix.

@ultrabug
Member

Just to make sure I get it right: what would be the expected behavior here? Silently ignore any <span> containing the rendered SVG of the emoji? Raise an error and crash the build?

@squidfunk
Sponsor Contributor

squidfunk commented Sep 14, 2023

The behavior of MkDocs 1.4 – filter HTML tags from page titles.

Edit: If that is the new desired behavior, it should go into the docs and it should actually be a breaking release (= MkDocs 2.0) because all plugins that process page.title are impacted and probably need to be adjusted.

@pawamoy
Sponsor Contributor

pawamoy commented Sep 14, 2023

Possibly related: waylan/mkdocs-nature#5. Only recently did we introduce HTML <code> tags in some parts of the navigation of Python-Markdown's docs (API reference, see Python-Markdown/markdown#1379) that were then shown in the browser's tab title (and therefore window manager's window title). Maybe the change wouldn't have been needed with MkDocs 1.4. Will check and report back.

UPDATE: well nope, <code> tags weren't stripped with MkDocs 1.4.3 either. Though the page titles here are obtained from the nav directly (well, through literate-nav) rather than Markdown page H1 titles.

So IIUC, the behavior differs whether a title is given in nav or in the page? It would be nice to reconcile both cases. Maybe that's what MkDocs 1.5 did, not stripping anything in any case.

@ultrabug
Member

Maybe we need to modify the python-markdown preprocessing for the title parser.

Gentle ping to @facelessuser who I suspect is a master in python-markdown processors. It's quite obscure to me I'm afraid so far...

Could we tweak the title processor to ignore :emoji: formatting?

@facelessuser
Contributor

@ultrabug I'm not sure I understand the question. Can you break this down for me?

@ultrabug
Member

MkDocs 1.5 has a title processor, which is now in charge of rendering the title of the page using a custom python-markdown Treeprocessor.

Emojis in page title which are rendered as SVG (for example) do break the page structure and CSS as demonstrated here.

I was wondering if we could alter our custom Treeprocessor for title abstraction to make it NOT tokenize the :(.*): emoji syntax maybe?

@squidfunk
Sponsor Contributor

It might be an issue of ordering – the title processor will run before emoji is processed, and then after the title processor has removed all markup, emoji keeps adding stuff to it. That's at least my theory.

@facelessuser
Contributor

Emoji is an inline processor. These get run before tree processors.

  1. Preprocessor
  2. Block Processor
  3. Inline Processor
  4. Tree Processor
  5. Post Processor

@facelessuser
Contributor

Python Markdown allows you to specify special labels if you need to control exactly what is used for TOC etc.: https://python-markdown.github.io/extensions/toc/#custom-labels

@facelessuser
Contributor

If you are hoping for TOC to accept some way to exclude specific syntax, you will probably be disappointed. It should not have knowledge of specific (especially 3rd party) extensions. I'm sure you can imagine the maintenance nightmare that would be as other extensions pop up that also want to be excluded.

Help me out, what exactly is the end result you want? What is your expectation of how things should work and why would it solve your problem? If I am to be of any help, I need to understand the end goal. Maybe there is a creative way to workaround this. Maybe one I could implement into emoji (if necessary). I just need to understand what you are hoping to accomplish.

@squidfunk
Sponsor Contributor

The end goal is to sanitize the page.title of any HTML tags. This is not done anymore due to some bug in the new title extraction processor. I'm not sure we need to change something in the emoji extension for that – it worked before, so it's likely that we need to fix the title tree processor.

@squidfunk
Sponsor Contributor

I've investigated and think I know what the problem is:

# Drop anchorlink from the element, if present.
if len(el) > 0 and el[-1].tag == 'a' and not (el[-1].tail or '').strip():
    el = copy.copy(el)
    del el[-1]
# Extract the text only, recursively.
title = ''.join(el.itertext())
# Unescape per Markdown implementation details.
for pp in self.postprocessors:
    title = pp.run(title)
self.title = title

  1. Anchor links are removed so that "Permalink" does not become part of the extracted title
  2. All tags are stripped by just extracting the text with itertext()
  3. Stashed content is reinjected, which includes the <svg> element

A brute force approach would be to strip all tags again, but maybe there's a better solution.
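
To make the three steps above concrete, here is a minimal stdlib-only simulation. It does not use Python-Markdown itself: the placeholder format, the `stash` dict and the `unstash` helper are assumptions standing in for the real HTML stash and postprocessors, and the regex is only the "brute force" idea, not a proper sanitizer.

```python
import re
import xml.etree.ElementTree as etree

# Hypothetical placeholder format; Python-Markdown's real stash markers differ.
PLACEHOLDER = "\x02stash:0\x03"
stash = {PLACEHOLDER: '<svg viewBox="0 0 24 24"><path d="M0 0"/></svg>'}

# Simulated <h1> as a treeprocessor might see it: inline Markdown is already
# an element, while the emoji's raw HTML is still a placeholder in the text.
h1 = etree.Element("h1")
h1.text = "Title "
em = etree.SubElement(h1, "em")
em.text = "title"
em.tail = " " + PLACEHOLDER


def unstash(text: str) -> str:
    """Stand-in for the postprocessors that restore stashed HTML."""
    for key, html in stash.items():
        text = text.replace(key, html)
    return text


# Same order as the quoted snippet: extract the text first, unstash afterwards.
text_only = "".join(h1.itertext())   # the <em> tag is gone here
buggy_title = unstash(text_only)     # ...but the <svg> comes back here

# Brute-force variant: strip tags once more after unstashing (crude regex).
fixed_title = re.sub(r"<[^>]*>", "", buggy_title).strip()

print(repr(buggy_title))
print(repr(fixed_title))
```

The point of the sketch is only the ordering: anything restored after `itertext()` bypasses the tag removal.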

@oprypin
Contributor

oprypin commented Sep 14, 2023

First off let's make a reproduction case and clarify the behavior of markup in nav headers.

mkdocs.yml:

site_name: My Docs

markdown_extensions:
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg

docs/index.md:

# Title *title* <i>title</i> :material-book:

There is no change in the content itself, but there is a change in the nav rendering. The themes behave consistently with each other.


MkDocs 1.4.3:
  Title *title* title :material-book:
  Markdown is ignored/preserved, raw HTML is applied

MkDocs 1.5.2:
  Title title title 📖
  Markdown is applied and stripped but stashed Markdown is applied, raw HTML is applied

[screenshots for both versions: "mkdocs" theme, "readthedocs" theme, "material" theme 9.1.20]

All in all both behaviors are wildly inconsistent. If I had to pick one, probably the one in MkDocs 1.5 makes more sense, but that doesn't help much.

Emojis were producing poor results both before and after, just differently.
Only "material" theme had somewhat proper styling for the emoji.


And there is the repro case from the issue on mkdocs-material, indeed the behavior is somehow different.
But I haven't looked why exactly the theme behaves like that - at first glance seems theme-specific.

[screenshots: "material" theme 9.1.20-insiders-4.37.0, MkDocs 1.4.3 vs. MkDocs 1.5.2]


There's also the separate matter that there is a "material/typeset" plugin (not public) that was relying on the behavior that Markdown is being ignored/preserved as it's being moved to the nav title, and then re-applying that Markdown in its own way. So emojis were fully supported only on mkdocs-material-insiders in its own way, and this change made that not function anymore.


And it is definitely a huge issue that the HTML now reaches directly into search results

[screenshot: search results]

@oprypin
Contributor

oprypin commented Sep 14, 2023

So some things to note:

  • Even in old MkDocs, HTML was able to get through. It wasn't actually sanitized, it's just that Markdown wasn't rendered, adding more of its own HTML.

  • If MkDocs now actually starts to sanitize all HTML tags, then emojis will be completely obliterated.

@oprypin
Contributor

oprypin commented Sep 14, 2023

Regarding why the current behavior is like that, I'm quite sure that @squidfunk has it spot on - #3357 (comment)

But it is unclear what the actual best outcome would be, seeing as, again, actually stripping all HTML would just make emojis actually impossible to support in titles. Whereas in the current state, to support them, only some CSS fixes would be needed.

@oprypin
Contributor

oprypin commented Sep 14, 2023

And yet another aspect - indeed, specifying the page title in the nav YAML config still has exactly the same behavior as with MkDocs 1.4 (ignoring/preserving Markdown) - it's only the titles obtained from the page content that now selectively render Markdown, which is certainly unfortunate.

@oprypin
Contributor

oprypin commented Sep 14, 2023

#3357 (comment)
Indeed another huge deal - similarly to the search results - the HTML also ends up propagating to the <title> tag in both built-in themes, so themes need to manually call striptags, or otherwise the browser's title bar can get a large chunk of HTML.

But this was always partly the case if one were to write raw HTML for the page title, just that now Markdown can also insert a huge SVG emoji and it will end up in the page title as raw markup

@oprypin
Contributor

oprypin commented Sep 14, 2023

One pretty straightforward resolution here would be to indeed almost totally rollback to pre-1.5 behavior, even if it's not ideal. There's actually a way to do that that doesn't require rolling back the new title detection.

@pawamoy
Sponsor Contributor

pawamoy commented Sep 14, 2023

Always amazed by your thoroughness @oprypin, thank you for doing the heavy lifting of investigating the whys and hows.

@squidfunk
Sponsor Contributor

Hmm. Can't we change the priority of Markdown extensions, so that the stashes are restored before the tags are stripped? IMHO, this would be a better solution, but I'm probably missing something why this is not possible.

@squidfunk
Sponsor Contributor

There's also the separate matter that there is a "material/typeset" plugin (not public) that was relying on the behavior that Markdown is being ignored/preserved as it's being moved to the nav title, and then re-applying that Markdown in its own way. So emojis were fully supported only on mkdocs-material-insiders in its own way, and this change made that not function anymore.

The typeset plugin is actually an attempt to allow for a controlled variant of what's reported in this issue, particularly that themes can opt into allowing for rich text in specific parts without forcing HTML tags down every theme and plugin. Quite a few popular plugins in the ecosystem would need to be adapted if we would opt for MkDocs 1.5 behavior, which is why I consider this to be a breaking change that mandates a major release (if not declared a bug).

@oprypin
Contributor

oprypin commented Sep 14, 2023

Hmm. Can't we change the priority of Markdown extensions, so that the stashes are restored before the tags are stripped? IMHO, this would be a better solution, but I'm probably missing something why this is not possible.

@squidfunk I'm just quite lost regarding what the output should be conceptually. In terms of how to implement it, anything is doable.
The most helpful feedback that I'd like to see is what value of title you expect to see for a given input Markdown.

E.g.

  • Input: \*Hello --- *beautiful* `world`
    MkDocs 1.4: \*Hello --- *beautiful* `world`
    MkDocs 1.5: *Hello — beautiful world

  • Input: Title *title* <i>title</i> :material-book:
    MkDocs 1.4: Title *title* <i>title</i> :material-book:
    MkDocs 1.5: Title title <i>title</i> <svg blob...></svg>

@squidfunk
Sponsor Contributor

The most helpful feedback that I'd like to see is what value of title you expect to see for a given input Markdown.

Ideally a title without Markdown formatting and HTML tags. If that's not possible, MkDocs 1.4 behavior.

@ultrabug
Member

Thanks for clarifying things better @oprypin

I allowed myself to illustrate the effects of the proposed PR to link it to this discussion

@waylan
Member

waylan commented Sep 16, 2023

As @pawamoy mentioned above, we encountered a similar issue and addressed it in waylan/mkdocs-nature#5. In our case the HTML tags were coming from content generated by a script, so we were not actually affected by this exact issue (in that the source of our page titles was not parsed Markdown). Regardless, we had HTML tags in our page titles which were being inserted in our templates in places they shouldn't be.

The solution we used was to add the striptags filter to our templates where appropriate. In fact, in some places, we wanted the HTML tags, but in other places they were not desirable. Obviously, we want no HTML tags in the <title> tag for a page. Same goes for title attributes. But we do want the HTML tags in link labels (for example in the nav). By having MkDocs leave the tags in, we have total control in our templates. Consider this pseudo example:

<a href="{{ page.url }}" title="{{ page.title|striptags }}">{{ page.title }}</a>
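
For readers who want to try the idea outside a template, here is a rough Python rendition of that pseudo example using only the standard library. The `strip_tags` and `nav_link` helpers are hypothetical names for this sketch; `strip_tags` only approximates what a template `striptags` filter does and is not the filter itself.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of `markup` with all tags removed."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def nav_link(url: str, title: str) -> str:
    """Keep markup in the link label, but never in the title attribute."""
    plain = escape(strip_tags(title), quote=True)
    return f'<a href="{escape(url, quote=True)}" title="{plain}">{title}</a>'


link = nav_link("reference/md/", "<code>markdown.core</code>")
print(link)
```

The label keeps its `<code>` wrapper while the attribute gets plain text, which is the per-location control described above.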

If the current behavior is reversed, then all titles all of the time would never contain any HTML tags. In our case, we wanted <code> tags to wrap each page title as each represents a Python module. If all of those <code> tags were stripped, our spellchecking CI test would throw a fit, as many of the module names are not in the dictionary, and there would be no way to work around it.

I realize that a breaking change was introduced to MkDocs here and now there are compatibility issues for previous releases of themes. So, leaving the behavior as-is is probably not a reasonable way forward. However, from my perspective, the current behavior is actually preferred for my specific use case. Just throwing that out there as a consideration.

As an aside, it occurred to me that a future update could reverse MkDocs behavior and now we would have unnecessary calls to the striptags filter in our templates. But I am okay with that as it ensures we never get tags where they are not supposed to be regardless of any future updates. It might be good practice for all theme developers to be using a similar approach with their templates.

Finally, I will note that any future reversal could have no effect on our issue as we are not getting our titles from Markdown text. If so, that is good for us as we can continue to use our generated titles with HTML tags.

@oprypin
Contributor

oprypin commented Feb 3, 2024

@squidfunk
I made a mistake in that particular change, sorry about that. Thanks for letting me know.
Yes, I am spending a lot of effort making it downward compatible. Then you come and tell me that I'm so bad for not making it downward compatible when really I made a separate mistake just in that change timvink/mkdocs-git-revision-date-localized-plugin#126
This is not the way to talk, you're just being "holier than thou" over and over, rather than actually saying something helpful.

@pawamoy
Sponsor Contributor

pawamoy commented Feb 3, 2024

I think any downstream project maintainer following this thread worries a bit about having to deal with non-easily-fixable (or not fixable-at-all) breaking changes. I also think we know that you (@oprypin and @waylan, since you both seem to be the most invested in this issue) are being careful about backward compatibility. Honestly I'm lost here so thank you for doing this hard work. I think it's safe to assume that downstream maintainers are simply waiting for clear instructions once MkDocs maintainers have reached consensus on a solution, whether there are breaking changes or not.

I'll continue to test PR branches locally in any case and report any issue I find in relation with my own plugins (and I invite every other plugin/theme maintainer to do the same of course) 🙂 Maybe an updated issue body with a call-to-test disclaimer and some communication would bring more downstream maintainers to do just that, once there's a champion PR? Currently it seems there are 4 PRs related to this issue:

Or maybe it's not needed at all and we should all just keep an eye on the regressions pipeline, maybe adding relevant projects to it to make sure each case is covered.

(Feel free to hide this meta comment)

@squidfunk
Sponsor Contributor

@oprypin

Then you come and tell me that I'm so bad for not making it downward compatible when really I made a separate mistake just in that change timvink/mkdocs-git-revision-date-localized-plugin#126
This is not the way to talk, you're just being "holier than thou" over and over, rather than actually saying something helpful.

I did not want to hurt you. Being condescending or disrespectful is never my goal. Could you please point me to the exact formulation in my comment that you feel was off? I'm not sure where you read that I thought you or your actions were as you phrase it 'bad'.

Maybe it's good to understand how all of this played out from my vantage point. We started the conversation several months ago, and then nothing happened for a long time. No doubt, this is a complex matter. Waylan came in, and although he had some good points, we did not seem to find an agreement. I retracted myself from the further discussion, knowing that the maintainers of MkDocs will find a good solution for this. I continued to receive email notifications, and the latest changes caught my attention, because they sounded potentially breaking to my work on Material for MkDocs.

Now, as you know, I maintain a project with tens of thousands of users, and as many of the support requests land on our issue tracker, since for many users Material for MkDocs is MkDocs (nothing we promote), I wanted to point out that for the ecosystem to remain successful and growing, IMHO, a clear and easy migration path is essential. Several of the Material for MkDocs major releases were necessary because there were breaking changes in MkDocs in patch- or minor-level versions. The impending changes sound very fundamental to me, as they touch plugins and themes. Since Material for MkDocs is one of the biggest downstream projects, I wanted to raise awareness.

A few hours later, a user created an issue on our issue tracker, pointing out that the git-revision-date-localized plugin integration is broken. A new revision of the plugin got released, which included a commit by you, talking about preparing the plugin for the new approach of escaping in templates. Downgrading the plugin resolved the error. I didn't see any reaction to my comment, so I felt that I needed to raise awareness about the issue, which was exactly the thing I talked about in my previous comment, hoping for some reaction, given that this change affected thousands and thousands of projects, as the git-revision-date-localized plugin has 340k downloads a month. In my humble view, fixing this was urgent.

Please tell me which words got you upset, so I can choose them more wisely next time. Additionally, I tried to be helpful, but it seems that my input on this issue, although it affects my work deeply, is of no help to you or waylan, as I can read from his and your comments. I'm happy about any suggestion on how you feel we can better gravitate towards a solution.

I hope we can resolve this matter quickly, so we can actually work on the problem ☺️

@oprypin
Contributor

oprypin commented Feb 3, 2024

Here's a summary of what's happening with titles.


A title of a nav item can come from one of three sources, in this order of precedence.

  1. From the nav config in mkdocs.yml:
    The string is directly passed to themes, which normally paste it as raw HTML (tags, entities and all)
    This was a bad historical decision, but at least there's no ambiguity. May or may not be worth trying to change this.

  2. From the meta title in the document:
    Exactly the same as (1)

  3. From the first heading in the document (which is surely understood to be Markdown)

    • In MkDocs 1.4 and lower:
      The Markdown string is directly passed to themes, which normally paste it as raw HTML (tags, entities and all)
      This is really bad and I don't think anyone expected it.
    • In MkDocs 1.5:
      The Markdown is rendered and all HTML tags are removed, but with a bug that causes a failure to remove some of the tags; HTML entities are kept.
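
The order of precedence for nav item titles listed above can be sketched as a small function. This is an illustration of the summary only, not MkDocs' actual code: `resolve_nav_title` and its parameter names are hypothetical, and any further fallback that applies when none of the three sources is present is left out.

```python
from typing import Optional


def resolve_nav_title(
    nav_title: Optional[str],
    meta_title: Optional[str],
    heading_title: Optional[str],
) -> Optional[str]:
    """Pick the first available title: nav config, then meta, then first heading."""
    for candidate in (nav_title, meta_title, heading_title):
        if candidate:
            return candidate
    # None of the three sources provided a title; fallbacks are out of scope here.
    return None


print(resolve_nav_title(None, "Meta title", "Heading title"))
```

Note that only the third source is known to be Markdown, which is why the sanitizing question arises specifically there.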

A title of a ToC item comes from any heading in the document:

  • The Markdown is rendered, all HTML tags removed, HTML entities are kept.
    (This is instead done by an implementation from python-markdown, thus with subtle differences)

As you can see, MkDocs 1.4 had "consistent" nav item titles in that it consistently put them as raw HTML even though in one of the contexts the source material is clearly Markdown.

Then in MkDocs 1.5 instead we got a different form of consistency: the way to determine a nav item title from a Markdown heading became the same as determining the ToC item title from a Markdown heading (though with subtle differences and a bug).


The original bug report correctly identifies this failure to remove some of the tags from ToC titles (highlighted in bold above). But the followup reply then totally wrongly claimed that MkDocs 1.4 was perfect and was stripping tags. But as you can see it really wasn't stripping anything whatsoever, just not bothering to render Markdown or do anything at all otherwise.


In terms of the latest replies in this thread, they went in this direction:
Modify the nav item determination in the following way:

  1. From the first heading in the document (Markdown):
    • The Markdown is rendered and all HTML is kept, although attempts are still made to not break existing usages and somehow magically strip the HTML tags where they weren't expected

And I actually no longer think that this is a feasible way to proceed unless we also have plans to rework how ToC item headings are determined. Also it's otherwise quite reckless. Instead I'm more inclined to just fully unify these two behaviors (remember, the ToC item determination is only subtly different from current MkDocs 1.5 behavior for nav items), meaning that mainly just the bug at the beginning of this thread is to be fixed through better removal of tags. Other changes (such as autoescape for templates) become unnecessary then, and kind of a separate topic, to be considered on its own merits.

The fact that the first two ways to specify the nav item title do still produce unguarded raw HTML is a big pain, inconsistency and even danger, but maybe we should not be blinded by this. In some ways it is the bigger problem, but in ways of practicality it doesn't show up much and we don't have much choice but to keep existing behavior anyway.

@ofek
Contributor

ofek commented Feb 3, 2024

If your proposed fix is implemented, would there still be a possibility of changing the resolution order with which titles are derived? Specifically, I'm thinking about this issue: #3532

@oprypin
Contributor

oprypin commented Feb 3, 2024

@ofek Yes that will be #3532 (comment) #3533

@oprypin
Contributor

oprypin commented Feb 3, 2024

It also seems to me that there is a high demand for a good implementation of this functionality:

"Take an etree element in the middle of Markdown rendering, finish rendering it and convert it to a reasonable string with HTML entities but without any HTML tags."

I only now realize that the 'toc' extension implements exactly this by necessity and instead I had made two separate implementations of pretty much the same thing:

  1. mkdocs-literate-nav
    Wrong because it forgets to un-stash stashed HTML, or run other postprocessors
  2. mkdocs 1.5
    Wrong because it forgets to remove tags after unstashing HTML

We've been talking how to safely strip tags after unstashing HTML, and that we may need a new implementation, but Python-Markdown has that already (though not sure if it's 100% reliable and optimal)

Now, it might just seem like I'm saying that we should just switch every implementation to the one from the 'toc' extension. And that's probably true. But it does have its own deficiencies!

  • It runs as a treeprocessor at priority 5 and yanks the content before treeprocessors with later priorities had a chance to run. Specifically the 'smarty' treeprocessor (at priority 2) is what I noticed to be a problem. Currently foo --- bar with 'smarty' extension remains just foo --- bar in the ToC title. My implementation for the nav titles avoided this bug just by being at a later priority - it got the correct foo &mdash; bar.

    I think this is pretty much just a bug to be reported on Python-Markdown.

  • I was also saying that ideally <img alt="foo description" src="foo.png"> would be replaced by "foo description" rather than by nothing. And ideally I'd like to see this in both the nav and toc titles.
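
As a rough illustration of the wished-for output (tags removed, HTML entities kept, images replaced by their alt text), here is a stdlib-only sketch. It works on an already rendered HTML string rather than on an etree element in the middle of rendering, so it is not the 'toc' implementation; `plain_title` is a hypothetical helper name.

```python
from html import escape
from html.parser import HTMLParser


class _TitleText(HTMLParser):
    """Drops tags, keeps entities verbatim, substitutes <img> with its alt text."""

    def __init__(self):
        # convert_charrefs=False so that entities are reported separately
        # and can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                # Attribute values arrive unescaped, so escape them again.
                self.parts.append(escape(alt, quote=False))

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def plain_title(rendered_html: str) -> str:
    parser = _TitleText()
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.parts)


result = plain_title('foo &mdash; <em>bar</em> <img alt="foo description" src="foo.png">')
print(result)
```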

@waylan
Member

waylan commented Feb 4, 2024

The fact that the first two ways to specify the nav item title do still produce unguarded raw HTML is a big pain, inconsistency and even danger, but maybe we should not be blinded by this. In some ways it is the bigger problem, but in ways of practicality it doesn't show up much and we don't have much choice but to keep existing behavior anyway.

You are correct that this hasn't been much of an issue over time. But that is because when a user defines their custom title, they immediately see how it is broken and make a change and remove the offending markup. As the title rarely appears anywhere outside of the navigation and <title>, the user doesn't feel the need to add the fancy markup to their titles. But, when the title comes from the body content of a page, the user understandably wants to keep their fancy markup in the body content. And so this is where the issue is prevalent.

That said, there is no reason why we couldn't just as easily sanitize titles regardless of where they come from. In fact, it annoys me that the change in 1.5 removed the setter and getter for Page.title. By using a setter and getter, we could easily ensure any page title, regardless of source, is always properly sanitized. For example, when a plugin does somepage.title = '<strong>custom</strong> title with weird&mdash;markup <svg>...</svg>', the setter (or perhaps the getter) could sanitize it.

As I see it, there are really only three issues being discussed here:

  1. How to obtain the title from the body content of the page so that it is in a sensible form.
  2. How to sanitize the title.
  3. Do we want to retain the unsanitized title along with the sanitized title and if so, what API do we implement to make both available?

The first issue does not need to concern itself with the other two. The only concern for the first issue is to ensure a fully rendered string is obtained. One could argue that that is not even on-point for this issue as this issue is really only about issues 2 and 3.

It was issue 3 which sent us off on the autoescape detour. It would be an elegant solution, and perhaps a good long-term goal for MkDocs in general, but that is a big change and this issue needs an immediate fix. For me, the hang-up is what API to use for issue 3 as I personally need the unsanitized title to be retained as I have explained in previous comments.
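
One possible shape for such an API, sketched with a property. Everything here is an assumption for discussion: this `Page` class is a stand-in rather than MkDocs' real one, `title_markup` is a made-up attribute name, and the regex is a crude placeholder for whatever sanitizer would actually be chosen.

```python
import re

_TAG = re.compile(r"<[^>]*>")  # crude tag matcher, not a real sanitizer


class Page:
    """Hypothetical stand-in, not MkDocs' actual Page class."""

    def __init__(self, title: str = ""):
        self._title_markup = title

    @property
    def title(self) -> str:
        """Sanitized title: tags removed, entities left untouched."""
        return _TAG.sub("", self._title_markup)

    @title.setter
    def title(self, value: str) -> None:
        # Accept any markup regardless of source (nav, meta, heading, plugin).
        self._title_markup = value

    @property
    def title_markup(self) -> str:
        """Unsanitized title, for templates that opt into rich text."""
        return self._title_markup


page = Page()
page.title = "<strong>custom</strong> title with weird&mdash;markup <svg>...</svg>"
print(page.title)
print(page.title_markup)
```

With this shape, existing code that reads `page.title` never sees tags, while a theme that wants rich labels reads the second attribute explicitly.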

@oprypin
Contributor

oprypin commented Feb 4, 2024

Meanwhile mkdocs-material wasn't so interested in solving the general challenge of exposing the unsanitized titles and instead has been selling a plugin that replaces the titles with rendered unsanitized versions, for a while already.

https://squidfunk.github.io/mkdocs-material/plugins/typeset/

It might be attractive to publish an equivalent plugin and suggest it to anyone who wants the unsanitized titles. In effect, such a plugin has 0 advantages over just adding a built-in config with this effect. But there's a benefit in terms of existing knowledge and discoverability of such a solution being available as a plugin.

@oprypin
Contributor

oprypin commented Feb 5, 2024

In fact, it annoys me that the change in 1.5 removed the setter and getter for Page.title

There was never a setter, it was a plain attribute 🤔

@oprypin
Contributor

oprypin commented Feb 5, 2024

there is no reason why we couldn't just as easily sanitize titles regardless of where they come from

Well for example if the title came from the nav config, then someone had to have intentionally put the markup in there, so unconditionally sanitizing it also doesn't make much sense

@squidfunk
Sponsor Contributor

squidfunk commented Feb 5, 2024

Meanwhile mkdocs-material wasn't so interested in solving the general challenge of exposing the unsanitized titles and instead has been selling a plugin that replaces the titles with rendered unsanitized versions, for a while already.

https://squidfunk.github.io/mkdocs-material/plugins/typeset/

It might be attractive to publish an equivalent plugin and suggest it to anyone who wants the unsanitized titles. In effect, such a plugin has 0 advantages over just adding a built-in config with this effect. But there's a benefit in terms of existing knowledge and discoverability of such a solution being available as a plugin.

We'd be happy to deprecate that plugin, if MkDocs would expose the same functionality. The plugin was the only way we could make it work with reasonable effort. In fact, I started writing plugins because almost all of my feature requests over at MkDocs got turned down by the maintainers, but if MkDocs or a plugin maintained by the MkDocs team offers the exact same functionality, we're going to deprecate the typeset plugin immediately. Always happy to have to maintain less 😉

@waylan
Member

waylan commented Feb 5, 2024

there is no reason why we couldn't just as easily sanitize titles regardless of where they come from

Well for example if the title came from the nav config, then someone had to have intentionally put the markup in there, so unconditionally sanitizing it also doesn't make much sense

As has been covered multiple times in this discussion, there are use cases for a title with markup. I'm assuming that the user wants the markup for those use cases, but at the same time, in some situations (like the <title> element), the title should never have any markup. Therefore, I am suggesting that a page accept a title with markup, and then provide an API to retrieve both the title with markup and the title with all markup stripped. I thought we had established that, and I was merely pointing out that this should happen regardless of the source of the title (nav config, meta-data, page content, plugin, or other).

oprypin reopened this Feb 11, 2024

@vedranmiletic
Contributor

FWIW, #3564 did not return to pre-1.5 behavior since it only affected H1; H2-H6 still retain HTML tags in the navigation sidebar, while in 1.4 they did not.

@oprypin
Contributor

oprypin commented Feb 11, 2024

@vedranmiletic You would have to back up that claim, because this does not appear possible to me.

@vedranmiletic
Contributor

@vedranmiletic You would have to back up that claim, because this does not appear possible to me.

I apologize, I was wrong and you are indeed right. There are no changes in behavior from pre-1.5 to 1.5 and newer.

To clarify, the issue that I thought of is that leftovers of HTML character references appear in the navigation sidebar; for example, see the first H2 on this page. It must be another tool that I remember that didn't have leftovers of character references, as I use these Marp presentations in various Markdown tools.

It would be great to have this behavior fixed in some way for H2-H6, but now it doesn't seem to be the same issue as this one. Should I open a separate bug?

@oprypin
Contributor

oprypin commented Feb 11, 2024

@vedranmiletic Aha yes, I can observe your bug in the default theme with zero extensions.

It is about writing an email address in a heading such as ## <foo@example.org>
This applies to any theme and produces this in the ToC: �amp�#102;�amp�#111; ...

This is a bug in Python-Markdown but a PR is open that happens to fix it:
Python-Markdown/markdown#1441 (comment)

@vedranmiletic
Contributor

@oprypin thanks!

@oprypin
Contributor

oprypin commented Mar 8, 2024

So to recap: the solution I'm going for is just to keep 1.5 behavior but fix all bugs with it. Python-Markdown will have the needed implementation and it will be possible to depend on it directly, but only from future versions. For now I will instead duplicate the implementation that Python-Markdown uses.

There is one blocker though - @waylan could you accept this comment in some way #3578 (comment)
