Search: index document by custom metadata #3174

Open
4 tasks done
garrettheel opened this issue Nov 3, 2021 · 32 comments
Labels
change request Issue requests a new feature or improvement

Comments

@garrettheel
Contributor

Contribution guidelines

I want to suggest an idea and checked that ...

  • ... to my best knowledge, my idea wouldn't break something for other users
  • ... the documentation does not mention anything about my idea
  • ... there are no open or closed issues that are related to my idea

Description

It would be really helpful if keywords could be included within the metadata of a page and used to help provide more accurate search results. Oftentimes a generic search term (e.g. "service" for me) will return a ton of irrelevant results, because the search seems to overweight the use of one of the words in a heading.

I see that the boost feature does exist, however I'm looking for something more granular. It's not so much that I have one really important page that should always come up, but in certain contexts I want to ensure the right page does.

Tags are the other feature that is possibly related, however I don't really want to group content together or make this information prominent on the page. Also, I don't want to give the impression that the rest of the content has been accurately tagged if folks try to click into them.

Example:

---
search:
  keywords: foo,bar
---
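
A minimal sketch of what this proposal could mean for the index, assuming the front matter has already been parsed into a dict (MkDocs exposes parsed front matter as page metadata). The `search.keywords` key and the `keywords` field on the index entry are hypothetical, taken from the example above; neither is an existing option.

```python
def keywords_from_meta(meta: dict) -> list[str]:
    """Extract the hypothetical `search.keywords` value from parsed front matter."""
    raw = (meta.get("search") or {}).get("keywords", "")
    # Accept both the "foo,bar" string form and a ["foo", "bar"] list form.
    items = raw.split(",") if isinstance(raw, str) else list(raw)
    return [str(item).strip() for item in items if str(item).strip()]


def index_entry(location: str, title: str, text: str, meta: dict) -> dict:
    """Build a search document that carries an extra, non-rendered field."""
    return {
        "location": location,
        "title": title,
        "text": text,
        # Assumed field: the current index has no such entry.
        "keywords": keywords_from_meta(meta),
    }


entry = index_entry("services/", "Services", "...", {"search": {"keywords": "foo,bar"}})
print(entry["keywords"])
```

The keywords stay out of the rendered page and only travel with the index entry, which is the difference from tags described above.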

Use Cases

This would allow finer-grained control of search results. Coupled with search analytics (related: #3169), it would allow me to see what folks are searching and where they might have had trouble finding the right results.

Screenshots / Mockups

No response

@squidfunk added the "needs investigation" label on Nov 3, 2021
@squidfunk
Owner

Hmm, I'm unsure whether this is something that a lot of users need. Let's leave it open for a while and see whether there's some feedback from other users.

Oftentimes a generic search term (e.g "service" for me) will return a ton of irrelevant results, seeming to overweight the use of one of the words in a heading

Could you provide some reproducible examples? I'd be curious to learn if we can improve the default settings.

@squidfunk
Owner

I want to generalize this issue to check whether users would find it helpful to index author-defined document metadata with the help of the search plugin. This would address the things mentioned in the OP and more, maybe something like:

  • Tags/categories (already supported)
  • Keywords (essentially like tags, but not rendered)
  • Author
  • ...?

@squidfunk changed the title from "Specify keywords on pages to boost search ranking" to "Feature suggestion: make document metadata searchable" on Nov 7, 2021
@squidfunk added the "change request" label and removed the "needs investigation" label on Nov 7, 2021
@garrettheel
Contributor Author

Could you provide some reproducible examples? I'd be curious to learn if we can improve the default settings.

It's a little tough to share the actual site that I'm working with. Let me try to create a small demo site that demonstrates the issue.

I want to generalize this issue to check whether users would find it helpful to index author-defined document metadata with the help of the search plugin.

That makes sense to me!

@squidfunk
Owner

There's no response from other users, so I guess this is currently not worth pursuing. I may re-evaluate this in the future, though. Closing for now.

@feasgal

feasgal commented Mar 2, 2023

This would be really useful for us at the section/heading level. For one example, our FAQ has many entries on each FAQ category's page. I'm writing an entry about disabling accounts. I don't want to say 'delete' anywhere in the text of the entry because I don't want the user to have any possibility to misunderstand and think the account actually ceases to exist. But I do want to help the user find this entry when they search the site for words like 'delete' or 'GDPR'.

I'm envisioning something like
### How do I disable my account? { data-search-keyword="delete";"GDPR" }

@squidfunk
Owner

Thanks for the input. We're actually considering making arbitrary metadata searchable with the next iteration of search. If somebody searches for "foo", but it's not contained in the text (so there's nothing to highlight), how would you imagine the interface to tell the user that it's a legit match?

@wez

wez commented Mar 21, 2023

Here's my concrete issue where I think this will help me.

I have a page with the filename changelog.md. It is my "Change Log". It is linked in my navigation as ChangeLog. If the user types in "changelog", the search index will not suggest this page, but it will if they type in ChangeLog and I have my search separator set to distinguish by case.
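
A rough illustration of why this can happen, assuming a separator that also splits at lowercase-to-uppercase boundaries and a matcher that compares query tokens against indexed tokens by prefix. The pattern and the matching rule below are simplified assumptions for illustration, not the plugin's actual defaults or ranking logic.

```python
import re

# Illustrative separator only: whitespace, hyphens, and case boundaries.
SEPARATOR = re.compile(r"[\s\-]+|(?!\b)(?=[A-Z][a-z])")


def tokenize(text: str) -> list[str]:
    """Split on the separator first, then lowercase each token."""
    return [token.lower() for token in SEPARATOR.split(text) if token]


def matches(query: str, indexed: str) -> bool:
    """True if any query token is a prefix of any indexed token."""
    indexed_tokens = tokenize(indexed)
    return any(t.startswith(q) for q in tokenize(query) for t in indexed_tokens)


print(tokenize("ChangeLog"))               # indexed as two tokens
print(tokenize("changelog"))               # queried as one token
print(matches("changelog", "ChangeLog"))   # no indexed token starts with it
print(matches("ChangeLog", "ChangeLog"))   # both sides split the same way
```

Under these assumptions the page is indexed as `change` and `log`, so the single token `changelog` finds nothing unless that exact word also appears somewhere in the indexed text.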

My workaround is to put this invisible content at the top of my page:

<p style="display:none">
changelog
</p>

which renders like this; it doesn't know that the paragraph is hidden, and I think that that result looks OK:

image

I'd rather put keywords in the front matter / metadata at the top of the page, or just index the summary metadata, but this is workable.

In terms of rendering, if I put this in my front matter:

summary: this is the changelog page, which holds the log of changes

I think showing that summary tag and the matching excerpt as the highlight would make it clear that it was a summary and wouldn't be super confusing when they land on the page itself.

Sort of off-topic from this issue, but related: It would be great if the markdown document filename could also participate in the search index; I think that might save me from having to write keywords at all in this instance.

wez added a commit to wez/wezterm that referenced this issue Mar 21, 2023
It's a little bit inelegant, but the mkdocs-material folks seem to have
a nicer solution in the works, so this will do for now.

closes: #3305
refs: squidfunk/mkdocs-material#3174
@smartYSC

smartYSC commented Mar 29, 2023

We have a use case for searching for numbers. We have documentation where pages have an integer ID. When searching for the ID, let's say 10, you find all kinds of pages that contain a 10. We already boost the search importance of the page a bit, but we would like to boost that ID, so when you search for the ID, the page about that ID shows up on top.

I would be fine with adding the ID as a keyword to the page and having search prioritize keywords somehow.

Edit: Adding the ID to the page headline achieves this. However, I would prefer to not have the ID in the headline.

@squidfunk
Owner

@smartYSC could you create a minimal reproduction that showcases what you imagine?

@smartYSC

Sure, here is an example:

example.zip

If you search for foo or bar, the right page is on top. If you search for 123, page bar is on top, which is expected. If you search for 456, page bar is on top as well, but it should be page foo.

For page bar the ID in the header is enough to lead the search to the correct page.

I added a keywords section to the top (ignored by mkdocs right now). I would like these keywords to be taken strongly into account when a page is considered as a search result for a given input.
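
A toy sketch of "taken strongly into account", assuming a per-field boost applied to simple term counts. The documents, the `keywords` field, and the boost values are all made up for illustration; this is not how the plugin scores results, and the contents of example.zip are not reproduced here.

```python
# Hypothetical per-field boosts; "keywords" is not an existing search field.
BOOSTS = {"title": 10.0, "keywords": 20.0, "text": 1.0}


def score(doc: dict, term: str, boosts: dict = BOOSTS) -> float:
    """Sum boosted occurrence counts of `term` across the document's fields."""
    total = 0.0
    for field, boost in boosts.items():
        value = doc.get(field, "")
        raw_tokens = value if isinstance(value, list) else value.split()
        tokens = [token.lower() for token in raw_tokens]
        total += boost * tokens.count(term.lower())
    return total


def rank(docs: list, term: str, boosts: dict = BOOSTS) -> list:
    """Return titles ordered by descending score, dropping non-matches."""
    scored = [(score(doc, term, boosts), doc["title"]) for doc in docs]
    return [title for s, title in sorted(scored, key=lambda p: -p[0]) if s > 0]


docs = [
    {"title": "Foo", "keywords": ["456"], "text": "Details about foo."},
    {"title": "Bar 123", "keywords": [], "text": "Related to 456 and 456 again."},
]
print(rank(docs, "456"))                            # keyword boost lifts Foo
print(rank(docs, "456", {"title": 10.0, "text": 1.0}))  # without it, only Bar matches
```

With the keyword field ignored, only the page that mentions the number in its text is found; with it boosted, the page that declares the ID as a keyword ranks first.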

@squidfunk
Owner

Thanks! We'll investigate when working on the new search.

@squidfunk
Owner

I've started working on the next big search update, which will include the ability to list and better select available metadata like tags, authors, etc. This will allow customizing the search in a way that wasn't possible before. Sneak peek here.

@squidfunk reopened this on Apr 9, 2023
@feasgal

feasgal commented Jun 19, 2023

That sneak peek looks pretty cool. Are you considering being able to tag a heading/section for search? Something like this, maybe?

### How do I disable my account? { keyword="delete";"GDPR" }

@squidfunk
Owner

squidfunk commented Jun 19, 2023

Could be done. Could you elaborate some use cases? Would the keywords be visible? Would they be a specific category? I'm very interested in learning about different use cases, so we can fulfill them all, at best ☺️

Edit: just scrolled through this issue and saw my earlier post from Nov 2021:

  • Keywords (essentially like tags, but not rendered)

So I imagine that was what you're looking for 😅

@feasgal

feasgal commented Jun 19, 2023

Similar... but I believe you're talking there about keywords at the page level. I'm talking about per heading within a page. Similar to how you can exclude sections from search as well as whole pages, I'd like to be able to tag sections by a similar method of adding some pragma or other to the end of a Markdown heading.

I could partially solve this by including a keywords paragraph with visibility: hidden, but I would rather have keyword metadata so that heading-level search results could be boosted (and the boost customized) for keyword just like they are for tag and title.

Examples from my situation:

  1. In the docs set for one product, I have a FAQ with multiple entries per page, each beginning with a ##### Markdown heading that contains the question that the entry answers. I want the entry entitled "How do I deactivate my account?" to appear high in the search results for both delete and GDPR, but for various reasons I don't want to put those terms in the text of the entry. If I could populate tags (keywords?) for that individual entry, then the entry could be boosted in search results for one of those terms.
  2. In the docs set for another product, I have a new feature that replaces a deprecated one, and don't want to have to mention the old one every single time I say something about the new one, but I know my existing customer base is going to search for the old feature name because they are used to it. I want the search to point them to the thing they need, when they search for the old name.

In both these cases, yes, I would want it to show as a keyword somehow in the search results, something like how the tags do today, so that the user would know the result was actually relevant:
image

@smartYSC

I like @feasgal's idea; the usage is also similar to the exclusion of sections that already exists.

@squidfunk
Owner

Similar to how you can exclude sections from search as well as whole pages, I'd like to be able to tag sections by a similar method of adding some pragma or other to the end of a Markdown heading.

Great input, I also thought of that. Either via the attribute extension, or via custom blocks.

I would rather have keyword metadata so that heading-level search results could be boosted (and the boost customized) for keyword just like they are for tag and title.

All search results are tied to headings, as documents are disassembled into sections. If you boost a custom field, e.g. keyword, and add this field on a document level (in front matter) it's tied to h1, and in subsections it would be tied to the respective subsections h2-5. I'm not sure it would make sense to add heading level as another dimension for boost configuration, but what you could easily do is just to use another metadata property with a different boost configuration.
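
The section model described above can be sketched roughly as follows. The `{ keywords="..." }` heading pragma is a simplified, hypothetical syntax chosen for this example (it is neither the syntax proposed earlier in the thread nor an existing feature), and the splitting ignores details such as `#` lines inside code blocks and text before the first heading.

```python
import re

# Heading line with an optional, hypothetical keywords pragma at the end.
HEADING = re.compile(r'^(#{1,6})\s+(.*?)\s*(?:\{\s*keywords="([^"]*)"\s*\})?\s*$')


def split_sections(markdown: str, page_keywords: list[str]) -> list[dict]:
    """Split a page into heading-delimited sections, each with its own keywords."""
    sections = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            hashes, title, raw = match.groups()
            keywords = [k.strip() for k in (raw or "").split(",") if k.strip()]
            if len(hashes) == 1:
                # Page-level (front matter) keywords are tied to the h1 section.
                keywords = page_keywords + keywords
            sections.append(
                {"level": len(hashes), "title": title, "keywords": keywords, "text": ""}
            )
        elif sections:
            sections[-1]["text"] += line + "\n"
    return sections


page = (
    "# FAQ\n"
    "Intro\n"
    '## How do I disable my account? { keywords="delete,GDPR" }\n'
    "Steps\n"
)
for section in split_sections(page, ["accounts"]):
    print(section["level"], section["title"], section["keywords"])
```

Each section becomes its own search document, so a boost configured for the keyword field applies to whichever section the keywords were attached to.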

However, I'm pretty sure that the new search will make boosting less necessary and should provide much more relevant results without much configuration. That's at least my goal, but use cases may differ, so providing degrees of freedom is absolutely essential.

In the docs set for one product, I have a FAQ with multiple entries per page, each beginning with a ##### Markdown heading that contains the question that the entry answers. I want the entry entitled "How do I deactivate my account?" to appear high in the search results for both delete and GDPR, but for various reasons I don't want to put those terms in the text of the entry. If I could populate tags (keywords?) for that individual entry, then the entry could be boosted in search results for one of those terms.

I would take this one step further: we will add the ability to define precisely what metadata property will actually be shown to the user when searching. For example you would always want to show author, title, text, tags, but you would not want to show keywords. This will be completely configurable. We need to think of a good way to denote that a certain keyword matched an article, because we're not doing any highlighting if the term is not contained in the text but show it as a result nonetheless, but I'm very sure we'll find a good way to do this.

In the docs set for another product, I have a new feature that replaces a deprecated one, and don't want to have to mention the old one every single time I say something about the new one, but I know my existing customer base is going to search for the old feature name because they are used to it. I want the search to point them to the thing they need, when they search for the old name.

Same thing. The name of the old product could be defined as a keyword. E.g. "installation" will bring up the new product, but "installation foo" will bring up the old product before the new one. We might even extend the functionality to exclude certain entries when keywords are used, because then you could scope your search to specific parts of the documentation, but we'll leave that until after we've shipped the first few versions.

I'm really excited about the new approach, because my testing already shows that it will be so much more powerful and customizable than what we currently have. I want to make it as awesome as possible with the help of you and other users after shipping the first iterations. It'll take some time, as it's a pretty big fish to fry, but I think it'll be worth it 😊 Same as with the new social cards, the second and third iterations are always way, way better than what we had before, offering tons of new options and flexibility that we previously didn't have, and didn't know we needed.

@feasgal

feasgal commented Jun 21, 2023

All search results are tied to headings, as documents are disassembled into sections. If you boost a custom field, e.g. keyword, and add this field on a document level (in front matter) it's tied to h1, and in subsections it would be tied to the respective subsections h2-5. I'm not sure it would make sense to add heading level as another dimension for boost configuration, but what you could easily do is just to use another metadata property with a different boost configuration.

Sorry, are you saying I could do this now? Or in the future when you release the new search you're working on? Have I missed that I could attach a custom field to a heading and then boost that field?

@squidfunk
Owner

In the very near future. Currently only title, text and tags are supported. We're working on supporting any author-defined property in front matter and for sections, and on making it configurable how they're used in search.

@squidfunk changed the title from "Feature suggestion: make document metadata searchable" to "Search: index document by custom metadata" on Aug 10, 2023
@JaxonWright

@squidfunk where does this search overhaul stand as of today? Just wondering since the last message here was 2 months ago.

@squidfunk
Owner

We're currently busy finishing the refactoring of the blog plugin. I'm sorry for the delay, but all the topics we're currently working on are pretty complex. Once the blog is stable (and 9.2 is out), we'll continue working on the search.

@kaddkaka

Hi I created the issue https://wezfurlong.org/wezterm/config/lua/config/selection_word_boundary.html

Do you agree that this would be the suitable feature to use to get a "snake_case" title searchable?

Is work ongoing on search now or are there still other things to fix? (9.2 seems to be out?)

@squidfunk
Owner

squidfunk commented Sep 25, 2023

@kaddkaka if you add _ to your separator, the title will be searchable with the current version of the plugin.
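
To illustrate the effect of that change, here is a small sketch that tokenizes a snake_case title with and without `_` in the separator. Both patterns are simplified assumptions for this example rather than the plugin's shipped defaults, and Python's `re` stands in for the JavaScript regular expression the search actually uses.

```python
import re

# Illustrative patterns only; the real separator is set in the search plugin config.
WITHOUT_UNDERSCORE = re.compile(r"[\s\-]+")
WITH_UNDERSCORE = re.compile(r"[\s\-_]+")


def tokenize(text: str, separator: re.Pattern) -> list[str]:
    """Split on the separator and lowercase the resulting tokens."""
    return [token.lower() for token in separator.split(text) if token]


title = "selection_word_boundary"
print(tokenize(title, WITHOUT_UNDERSCORE))  # one long token
print(tokenize(title, WITH_UNDERSCORE))     # three searchable words
```

Once `_` is part of the separator, each word of the title becomes its own token, so a search for "boundary" can match it.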

Is work ongoing on search now or are there still other things to fix? (9.2 seems to be out?)

9.2 is out, jup (actually we're at 9.4 already), but we had to squeeze in some other stuff, particularly restructuring our documentation to account for the growing number of options (still ongoing), setting up our examples repository and preparing everything to grow our team. With the funds, we're able to add further people to our core team that help out on discussion, issues, and other things, but you can probably imagine that it's quite an effort to scale from 1 person to more people, given that processes need to be established and some technicalities need to be put into place. Additionally, day-to-day ops like bugfixing, refactoring and issue triage eat up quite a significant portion of my time. With more people, I'll be able to focus on the search again ☺️ I'm very sorry for the delay and inconveniences the current search is causing, but I hope to push out a prototype before the end of the year.

@karengermond

@squidfunk For the search, how long does it usually take before you go from prototype to making it available for all?

@squidfunk
Owner

@karengermond as I already mentioned in my previous comment, later this year. I'm actively working on it right now, but I also need to set aside time for answering questions like this, fixing bugs and keeping the project in shape. Right now, I don't have 100% of my time available to work on it. If the funding situation improved, we could hire more people to help.

That being said, I have a working prototype, but there's still some way to go to make it production-ready, because we need to support the 60 languages that we currently have, plus all the functionality that we implemented in the current solution.

Rebuilding a central feature like search from scratch takes time. If you need to have another solution now that solves your problems better, you may check paid solutions like Algolia which several users have integrated successfully. Once we release our new version, trying it out and switching back is trivial – just add it back to mkdocs.yml.

What specifically are you missing in the current implementation?

@karengermond

karengermond commented Oct 12, 2023 via email

@squidfunk
Owner

@karengermond if you can share your docs, or at least how huge it is, i.e., some metrics, that would be very helpful.

@karengermond

karengermond commented Oct 12, 2023 via email

@squidfunk
Owner

We're working hard on all things you requested.

On a side note, if your company is not a sponsor of the project, you might consider sponsoring the project on the organization tier, as this would directly help us to speed up development by allowing us to compensate other users to help out on issues, discussions and questions.

@andy-apptweak

hi all :-) interesting long read.
The issue i'm having is related to indexing html files generated, rather than the source markdown files, and i wonder if that is something that will be possible with the new big search update?

Use case: we use the swagger-ui-tag plugin to embed swagger docs. But as it's just a tag in a markdown document, the search index is empty.
(My fix for now will be to update the search index manually after the build step.)
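
A minimal sketch of such a post-build step, assuming the built index is a JSON file with a top-level `docs` list whose entries carry `location`, `title` and `text` (the layout MkDocs' built-in search writes to `search/search_index.json`). The function name and the example entry are hypothetical, and the demo runs against a temporary file rather than a real site.

```python
import json
import tempfile
from pathlib import Path


def add_to_search_index(index_path: Path, extra_docs: list[dict]) -> int:
    """Append extra documents to a built search index and return the new count."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    docs = index.setdefault("docs", [])
    known = {doc["location"] for doc in docs}
    for doc in extra_docs:
        if doc["location"] not in known:  # avoid duplicates on repeated runs
            docs.append(doc)
            known.add(doc["location"])
    index_path.write_text(json.dumps(index, ensure_ascii=False), encoding="utf-8")
    return len(docs)


# Demo against a throwaway index; a real run would point at the built site.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "search_index.json"
    path.write_text(json.dumps({"config": {}, "docs": []}), encoding="utf-8")
    extra = [{"location": "api/#get-users", "title": "GET /users", "text": "List users"}]
    print(add_to_search_index(path, extra))
```

Running it after every build keeps the extra entries in place, since the index is regenerated each time the site is built.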

Should i open another issue ticket for this? 🤔

@squidfunk
Owner

@andy-apptweak please open a new ticket, as it has nothing to do with the matters discussed in this issue. It should be possible to implement, but we will definitely need a minimal reproduction that we can work with. Please also explain the workarounds you're currently undertaking, so we have a complete and clear picture. Thank you!

@squidfunk
Owner

Please see the announcement in #6307.
