New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Header and navigation sidebars display incorrectly arbitrary markup #3357
Comments
@oprypin friendly ping, since this is a pretty serious bug. As of MkDocs 1.5, the page title contains HTML, which does not only break our project Material for MkDocs, but other themes like ReadTheDocs, and potentially many plugins that are not prepared to handle HTML in The changelog of MkDocs 1.5.0 only talks about a new method for deducing the page title, but not about this change in behavior which, in my opinion, is a breaking change that requires the ecosystem to adapt. For this reason, I assume it's an unintended bug that we need to fix. |
Just to make sure I get it right: what would be the expected behavior here? Silently ignore any |
The behavior of MkDocs 1.4 – filter HTML tags from page titles. Edit: If that is the new desired behavior, it should go into the docs and it should actually be a breaking release (= MkDocs 2.0) because all plugins that process |
Possibly related: waylan/mkdocs-nature#5. Only recently did we introduce HTML UPDATE: well nope, So IIUC, the behavior differs whether a title is given in |
Maybe we need to modify the python-markdown preprocessing for the title parser. Gentle ping to @facelessuser who I suspect is a master in python-markdown processors. It's quite obscure to me I'm afraid so far... Could we tweak the title processor to ignore :emoji: formatting? |
@ultrabug I'm not sure I understand the question. Can you break this down for me? |
MkDocs 1.5 have a title processor which is now in charge of rendering the title of the page using a custom python-markdown Emojis in page title which are rendered as SVG (for example) do break the page structure and CSS as demonstrated here. I was wondering if we could alter our custom |
It might be an issue of ordering – the title processor will run before emoji is being processed, and then after the title processor removed all markup, emojis keeps adding stuff to it. That's at least my theory. |
Emoji is an inline processor. These get run before tree processors.
|
Python Markdown allows you to specify special labels if you need to control exactly what is used for TOC etc.: https://python-markdown.github.io/extensions/toc/#custom-labels |
If you are hoping for TOC to accept some way to exclude specific syntax, you will probably be disappointed. It should not have knowledge of specific (especially 3rd party) extensions. I'm sure you can imagine the maintenance nightmare that would be as other extensions popup that also want to be excluded. Help me out, what exactly is the end result you want? What is your expectation of how things should work and why would it solve your problem? If I am to be of any help, I need to understand the end goal. Maybe there is a creative way to workaround this. Maybe one I could implement into emoji (if necessary). I just need to understand what you are hoping to accomplish. |
The endgoal is to sanitize the |
I've investigated and think I know what the problem is: mkdocs/mkdocs/structure/pages.py Lines 439 to 448 in f94ab3f
A brute force approach would be to strip all tags again, but maybe there's a better solution. |
First off let's make a reproduction case and clarify the behavior of markup in nav headers. mkdocs.yml: site_name: My Docs
markdown_extensions:
- pymdownx.emoji:
emoji_index: !!python/name:materialx.emoji.twemoji
emoji_generator: !!python/name:materialx.emoji.to_svg docs/index.md: # Title *title* <i>title</i> :material-book: There is no change in the content itself but there is change in the nav rendering. Each theme behaves consistently with each other.
All in all both behaviors are wildly inconsistent. If I had to pick one, probably the one in MkDocs 1.5 makes more sense, but that doesn't help much. Emojis were producing poor results both before and after, just differently. And there is the repro case from the issue on mkdocs-material, indeed the behavior is somehow different.
There's also the separate matter that there is a "material/typeset" plugin (not public) that was relying on the behavior that Markdown is being ignored/preserved as it's being moved to the nav title, and then re-applying that Markdown in its own way. So emojis were fully supported only on mkdocs-material-insiders in its own way, and this change made that not function anymore. And it is definitely a huge issue that the HTML now reaches directly into search results |
So some things to note:
|
Regarding why the current behavior is like that, I'm quite sure that @squidfunk has it spot on - #3357 (comment) But it is unclear what the actual best outcome would be, seeing as, again, actually stripping all HTML would just make emojis actually impossible to support in titles. Whereas in the current state, to support them, only some CSS fixes would be needed. |
And yet another aspect - indeed, specifying the page title in the |
#3357 (comment) But this was always partly the case if one were to write raw HTML for the page title, just that now Markdown can also insert a huge SVG emoji and it will end up in the page title as raw markup |
One pretty straightforward resolution here would be to indeed almost totally rollback to pre-1.5 behavior, even if it's not ideal. There's actually a way to do that that doesn't require rolling back the new title detection.
|
Always amazed by your thoroughness @oprypin, thank you for doing the heavy lifting of investigating the whys and hows. |
Hmm. Can't we change the priority of Markdown extensions, so that the stashes are restored before the tags are stripped? IMHO, this would be a better solution, but I'm probably missing something why this is not possible. |
The typeset plugin is actually an attempt to allow for a controlled variant of what's reported in this issue, particularly that themes can opt into allowing for rich text in specific parts without forcing HTML tags down every theme and plugin. Quite a few popular plugins in the ecosystem would need to be adapted if we would opt for MkDocs 1.5 behavior, which is why I consider this to be a breaking change that mandates a major release (if not declared a bug). |
@squidfunk I'm just quite lost regarding what the output should be conceptually. In terms of how to implement it, anything is doable. E.g.
|
Ideally a title without Markdown formatting and HTML tags. If that's not possible, MkDocs 1.4 behavior. |
Thanks for clarifying things better @oprypin I allowed myself to illustrate the effects of the proposed PR to link it to this discussion |
As @pawamoy mentioned above, we encountered a similar issue and addressed it in waylan/mkdocs-nature#5. In our case the HTML tags were coming from connect generated from a script so we were not actually affected by this exact issue (in that the source our page titles were not parsed Markdown). Regardless, we had HTML tags in our page titles which were being inserted in our templates in places they shouldn't be. The solution we used was to add the <a href="{{ page.url }}" title="{{ page.title|striptags }}">{{ page.title }}</a> If the current behavior is reversed, then all titles all of the time would never contain any HTML tags. In our case, we wanted I realize that a breaking change was introduced to MkDocs here and now there are compatibility issues for previous releases of themes. So, leaving the behavior as-is is probably not a reasonable way forward. However, from my perspective, the current behavior is actually preferred for my specific use case. Just throwing that out there as a consideration. As an aside, it occurred to me that a future update could reverse MkDocs behavior and now we would have unnecessary calls to the Finally, I will note that any future reversal could have no effect on our issue as we are not getting our titles from Markdown text. If so, that is good for us as we can continue to use our generated titles with HTML tags. |
@squidfunk |
I think any downstream project maintainer following this thread worries a bit about having to deal with non-easily-fixable (or not fixable-at-all) breaking changes. I also think we know that you (@oprypin and @waylan, since you both seem to be the most invested in this issue) are being careful about backward compatibility. Honestly I'm lost here so thank you for doing this hard work. I think it's safe to assume that downstream maintainers are simply waiting for clear instructions once MkDocs maintainers have reached consensus on a solution, whether there are breaking changes or not. I'll continue to test PR branches locally in any case and report any issue I find in relation with my own plugins (and I invite every other plugin/theme maintainer to do the same of course) 🙂 Maybe an updated issue body with a call-to-test disclaimer and some communication would bring more downstream maintainers to do just that, once there's a champion PR? Currently it seems there are 4 PRs related to this issue:
Or maybe it's not needed at all and we should all just keep an eye on the regressions pipeline, maybe adding relevant projects to it to make sure each case is covered. (Feel free to hide this meta comment) |
I did not want to hurt you. Being condescending or disrespectful is never my goal. Could you please point me to the exact formulation in my comment that you feel was off? I'm not sure where you read that I thought you or your actions were as you phrase it 'bad'. Maybe it's good to understand how all of this played out from my vantage point. We started the conversation several months ago, and then nothing happened for a long time. No doubt, this is a complex matter. Waylan came in, and although he had some good points, we did not seem to find an agreement. I retracted myself from the further discussion, knowing that the maintainers of MkDocs will find a good solution for this. I continued to receive email notifications, and the latest changes caught my attention, because they sounded potentially breaking to my work on Material for MkDocs. Now, as you know, I maintain a project with tens of thousands of users, and as many of the support requests land on our issue tracker, since for many users Material for MkDocs is MkDocs (nothing we promote), I wanted to point out that for the ecosystem to remain successful and growing, IMHO, a clear and easy migration path is essential. Several of the Material for MkDocs major releases where necessary because there were breaking changes in MkDocs in patch- or minor-level versions. The impending changes sound very fundamental to me, as they touch plugins and themes. Since Material for MkDocs is one of the biggest downstream projects, I wanted to raise awareness. A few hours later, a user created an issue on our issue tracker, pointing out that the git-revision-date-localized plugin integration is broken. A new revision of the plugin got released, which included a commit by you, talking about preparing the plugin for the new approach of escaping in templates. Downgrading the plugin resolved the error. I didn't see any reaction to my comment, so I felt that I needed to raise awareness about the issue, which was exactly the thing I talked about in my previous comment, hoping for some reaction, given that this change affected thousands and thousands of projects, as the git-revision-date-localized plugin has 340k downloads a month. In my humble view, fixing this was urgent. Please tell me which words got you upset, so I can chose them more wisely next time. Additionally, I tried to be helpful, but it seems that my input on this issue, albeit it affects my work deeply, is of no help to you or waylan, as I can read from his and your comments. I'm happy about any suggestion on how you feel we can better gravitate towards a solution. I hope we can resolve this matter quickly, so we can actually work on the problem |
Here's a summary of what's happening with titles. A title of a nav item can come from one of three sources, in this order of precedence.
A title of a ToC item comes from any heading in the document:
As you can see, MkDocs 1.4 had "consistent" nav item titles in that it consistently put them as raw HTML even though in one of the contexts the source material is clearly Markdown. Then in MkDocs 1.5 instead we got a different form of consistency: the way to determine a nav item title from a Markdown heading became the same as determining the ToC item title from a Markdown heading (though with subtle differences and a bug). The original bug report correctly identifies this failure to remove some of the tags from ToC titles (highlighted in bold above). But the followup reply then totally wrongly claimed that MkDocs 1.4 was perfect and was stripping tags. But as you can see it really wasn't stripping anything whatsoever, just not bothering to render Markdown or do anything at all otherwise. In terms of the latest replies in this thread, they went in this direction:
And I actually no longer think that this is a feasible way to proceed unless we also have plans to rework how ToC item headings are determined. Also it's otherwise quite reckless. Instead I'm more inclined to just fully unify these two behaviors (remember, the ToC item determination is only subtly different from current MkDocs 1.5 behavior for nav items), meaning that mainly just the bug at the beginning of this thread is to be fixed through better removal of tags. Other changes (such as autoescape for templates) become unnecessary then, and kind of a separate topic, to be considered on its own merits. The fact that the first two ways to specify the nav item title do still produce unguarded raw HTML is a big pain, inconsistency and even danger, but maybe we should not be blinded by this. In some ways it is the bigger problem, but in ways of practicality it doesn't show up much and we don't have much choice but to keep existing behavior anyway. |
If your proposed fix is implemented, would there still be a possibility of changing the resolution order with which titles are derived? Specifically, I'm thinking about this issue: #3532 |
@ofek Yes that will be #3532 (comment) #3533 |
It also seems to me that there is a high demand for a good implementation of this functionality: "Take an etree element in the middle of Markdown rendering, finish rendering it and convert it to a reasonable string with HTML entities but without any HTML tags." I only now realize that the 'toc' extension implements exactly this by necessity and instead I had made two separate implementations of pretty much the same thing:
We've been talking how to safely strip tags after unstashing HTML, and that we may need a new implementation, but Python-Markdown has that already (though not sure if it's 100% reliable and optimal) Now, it might just seem like I'm saying that we should just switch every implementation to the one from the 'toc' extension. And that's probably true. But it does have its own deficiencies!
|
You are correct that this hasn't been much of an issue over time. But that is because when a user defines their custom title, they immediately see how it is broken and make a change and remove the offending markup. As the title rarely appears anywhere outside of the navigation and That said, there is no reason why we couldn't just as easily sanitize titles regardless of where they come from. In fact, it annoys me that the change in 1.5 removed the setter and getter for As I see it, there are really only three issues being discussed here:
The first issue does not need to concern itself with the other two. The only concern for the first issue is to ensure a fully rendered string is obtained. One could argue that that is not even on-point for this issue as this issue is really only about issues 2 and 3. It was issue 3 which sent us off on the autoescape detour. It would be an elegant solution, and perhaps a good long-term goal for MkDocs in general, but that is a big change and this issue needs an immediate fix. For me, the hang-up is what API to use for issue 3 as I personally need the unsanitized title to be retained as I have explained in previous comments. |
Meanwhile mkdocs-material wasn't so interested in solving the general challenge of exposing the unsanitized titles and instead has been selling a plugin that replaces the titles with rendered unsanitized versions, for a while already. https://squidfunk.github.io/mkdocs-material/plugins/typeset/ It might be attractive to publish an equivalent plugin and suggest it to anyone who wants the unsanitized titles. In effect, such a plugin has 0 advantages over just adding a built-in config with this effect. But there's a benefit in terms of existing knowledge and discoverability of such a solution being available as a plugin. |
There was never a setter, it was a plain attribute 🤔 |
Well for example if the title came from the nav config, then someone had to have intentionally put the markup in there, so unconditionally sanitizing it also doesn't make much sense |
We'd be happy to deprecate that plugin, if MkDocs would expose the same functionality. The plugin was the only way we could make it work with reasonable effort. In fact, I started writing plugins because almost all of my feature request over at MkDocs got turned down by the maintainers, but if MkDocs or a plugin maintained by the MkDocs team offers the exact same functionality, we're going to deprecate the typeset plugin immediately. Always happy to have to maintain less 😉 |
As has been covered multiple times in this discussion, there are use cases for a title with Markup. I'm assuming that the user wants the Markup for those use cases, but at the same time, in some situations (like the |
FWIW, #3564 did not return to pre-1.5 behavior since it only affected H1; H2-H6 still retain HTML tags in the navigation sidebar, while in 1.4 they did not. |
@vedranmiletic You would have to back up that claim, because this does not appear possible to me. |
I apologize, I was wrong and you are indeed right. There are no changes in behavior from pre-1.5 to 1.5 and newer. To clarify, the issue that I thought of is that leftovers of HTML character references appears in navigation sidebar; for example, see first H2 on this page. It must be another tool that I remember that didn't have leftovers of character references as I use these Marp presentations in various Markdown tools. It would be great to have this behavior fixed in some way for H2-H6, but now it doesn't seem to be the same issue as this one. Should I open a separate bug? |
This comment was marked as outdated.
This comment was marked as outdated.
@vedranmiletic Aha yes, I can observe your bug in the default theme with zero extensions. It is about writing an email in a heading such as This is a bug in Python-Markdown but a PR is open that happens to fix it: |
@oprypin thanks! |
So to recap- the solution I'm going for is just to keep 1.5 behavior but fix all bugs with it. Python-Markdown will have the needed implementation and it will be possible to depend on it directly but only from future versions. For now I will instead duplicate the implementation that Python-Markdown does. There is one blocker though - @waylan could you accept this comment in some way #3578 (comment) |
Hello!
Since when does the bug occur?
I recently upgraded my
mkdocs
package from1.4.2
to1.5.2
, and I noticed this bug when...mkdocs.yml
file or in the page metadata (in yaml).# Intégration et Déploiement Continu sur :simple-gitlab: Gitlab
For example (with
mkdocs-material
theme andreadthedocs
theme):... while it was displayed like that before:
More generally, some parts like the search page of
readthedocs
theme don't handle well the HTML in title sections.External links:
readthedocs
theme pictures and bug explanations are from @squidfunk.The text was updated successfully, but these errors were encountered: