Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ampersands seen in search results #3213

Open
jidanni opened this issue May 6, 2023 · 6 comments · May be fixed by #3560
Open

Ampersands seen in search results #3213

jidanni opened this issue May 6, 2023 · 6 comments · May be fixed by #3560

Comments

@jidanni
Copy link

jidanni commented May 6, 2023

I do not own this site. All I know is I did a search and I saw these raw ampersands in the results.
Screenshot_20230506_115822_Chrome Beta.jpg

The site was https://vespucci.io/ and the search was for "copy".

At the bottom of the page there's indication that it was using your software package. Hence I'm reporting the problem directly to you, in the hope that future editions of the software will avoid the problem.

@theoneGUI
Copy link

It looks like this is the offending line. I'm new here, so I don't know the ins and outs of the program.

return '<article><h3><a href="' + joinUrl(base_url, location) + '">'+ escapeHtml(title) + '</a></h3><p>' + escapeHtml(summary) +'</p></article>';

I was able to replicate the problem by adding an ampersand to a title of a markdown document. I don't know if removing the
escapeHtml() around title is the answer, but it's an answer.

I see that the search result is getting formatted as if it's getting inserted directly into HTML with escapes, but is the formatResult() necessary for title and summary if it's putting the escape code in the document?

@oprypin oprypin linked a pull request Apr 10, 2024 that will close this issue
@oprypin
Copy link
Contributor

oprypin commented Apr 10, 2024

#3560 is an attempted fix but it's also rather confusing. Instead maybe this simpler fix could be applied.

diff --git a/mkdocs/contrib/search/search_index.py b/mkdocs/contrib/search/search_index.py
index 770f24fe2..c90eeb0dc 100644
--- a/mkdocs/contrib/search/search_index.py
+++ b/mkdocs/contrib/search/search_index.py
@@ -8,6 +8,8 @@ import subprocess
 from html.parser import HTMLParser
 from typing import TYPE_CHECKING
 
+from markupsafe import Markup
+
 if TYPE_CHECKING:
     from mkdocs.structure.pages import Page
     from mkdocs.structure.toc import AnchorLink, TableOfContents
@@ -90,7 +92,9 @@ class SearchIndex:
 
         text = ' '.join(section.text) if self.config['indexing'] == 'full' else ''
         if toc_item is not None:
-            self._add_entry(title=toc_item.title, text=text, loc=abs_url + toc_item.url)
+            self._add_entry(
+                title=Markup(toc_item.title).striptags(), text=text, loc=abs_url + toc_item.url
+            )
 
     def generate_search_index(self) -> str:
         """Python to json conversion."""

This fixes escaping in the search index and causes no other changes for MkDocs' own site at least

-      "title": "&lt;script&gt; tags can specify type=\"module\" and other attributes"
+      "title": "<script> tags can specify type=\"module\" and other attributes"

@waylan
Copy link
Member

waylan commented Apr 10, 2024

And how does this proposal address the issues raised in the first comment on #3560?

@oprypin
Copy link
Contributor

oprypin commented Apr 10, 2024

I don't know what exact issue the rest of #3560 is supposed to address, to be honest. It makes several changes in behavior and claims that they are for the better but I don't think I agree that it's better.

@tomchristie
Copy link
Member

uh oh

trouble

@tomchristie
Copy link
Member

tomchristie commented Apr 15, 2024

What are we here for if not for the content (& lols?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants