Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search returning entire page in search box [mkdocs-material insiders] #3053

Closed
5 tasks done
albertkun opened this issue Sep 25, 2021 · 19 comments
Closed
5 tasks done
Labels
bug Issue reports a bug resolved Issue is resolved, yet unreleased if open

Comments

@albertkun
Copy link

albertkun commented Sep 25, 2021

Contribution guidelines

I've found a bug and checked that ...

  • ... the problem doesn't occur with the mkdocs or readthedocs themes
  • ... the problem persists when all overrides are removed, i.e. custom_dir, extra_javascript and extra_css
  • ... the documentation does not mention anything about my problem
  • ... there are no open or closed issues that are related to my problem

Description

Using mkdocs-material insiders search function seems to be returning the entire pages content alongside the paragraph symbol in the search results. This only occurs using deployment on GitHub pages, not locally.

Expected behaviour

I expect the search function to work the same on deployment as it does locally and not display the entire content of the page in the search tool, as follows:

image

Actual behaviour

The insiders version of mkdocs-material makes the search result return all of the page in its entirety.
image

Steps to reproduce

  1. Add tags to pages using mkdocs-material insiders
  2. Commit to GitHub repository
  3. Use GitHub actions to deploy:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
        env:
      - run: pip install git+https://${ALBERT_KEY}@github.com/squidfunk/mkdocs-material-insiders.git
        env:
          ALBERT_KEY: ${{secrets.ALBERT_KEY}}      
  1. Test out search
  2. Search bug appears.

Package versions

  • Python: Python 3.9.5
  • MkDocs: mkdocs, version 1.2.2
  • Material: 7.2.8+insiders.3.0.1

Configuration

site_name: LA Metro Digital Services Team Docs
repo_url: https://github.com/LACMTA/digital-services-team-docs/

theme:
  palette:
    - scheme: default
      primary: black
      toggle:
        icon: material/toggle-switch-off-outline
        name: Switch to dark mode
    - scheme: slate
      primary: black
      accent: white
      toggle:
        icon: material/toggle-switch
        name: Switch to light mode  
  font:
    text: Open Sans
  favicon: assets/media/favicon.png
  logo: assets/media/favicon-reverse.png
  name: material
  features:
      - navigation.instant
      - navigation.tabs
      - content.code.annotate
      - search.highlight
      - toc.integrate
      - navigation.indexes
      - navigation.top
      - search.suggest      

  icon:
    repo: fontawesome/brands/github

markdown_extensions:
    - pymdownx.superfences
    - pymdownx.magiclink
    - smarty
    - meta
    - toc:
        permalink: True
        toc_depth: 2
    - sane_lists
    - admonition
    - footnotes
    - attr_list

edit_uri: edit/dev/docs/
plugins:
  - git-revision-date
  - autorefs
  - table-reader
  - macros
  - tags
  - search:
      lang: en

System information

  • Operating system: Windows 10
  • Browser: Firefox 92.0
@squidfunk
Copy link
Owner

Thanks for the reporting. That looks off indeed. It's weird that it only happens when run from GitHub Actions. Could you please include the Markdown of the page and upload the contents of the search/search_index.json file? That could be a great help in finding and isolating the issue.

@squidfunk squidfunk added the bug Issue reports a bug label Sep 25, 2021
@squidfunk
Copy link
Owner

The project in question is here, correct?
https://github.com/LACMTA/digital-services-team-docs/

@squidfunk
Copy link
Owner

squidfunk commented Sep 25, 2021

Okay, here's what I think is happening:

I took a look at your repository and realized that the search_index.json has the old format, meaning that it was not generated by the new search plugin. The old format includes exactly what you're seeing – the contents of the entire page with permalink markers etc. in the main entry of the page:

Expand
{
  "config":{
    "indexing":"full",
    "lang":[
      "en"
    ],
    "min_search_length":3,
    "prebuild_index":false,
    "separator":"[\\s\\-]+"
  },
  "docs":[
    {
      "location":"",
      "text":"Welcome to LA Metro Digital Design System Docs \u00b6 This repository is an internal resource for the Digital Services Team at Metro. Documentation either does not exist, is outdated, exists on old platforms, or exists across multiple platforms used by different people. There is a need to centralize and keep documentation up-to-date for the entire team to use and reference. Do not put any account credentials or private keys here. Contributing \u00b6 To provide more information on the various Digital Design Projects at LA Metro, people can contribute to this documentation by following these steps: Pre-requisites \u00b6 Software \u00b6 Make sure you have the following installed: - GitSVN - VS Code (or other text based editor) Accounts \u00b6 -GitHub.com account Clone the Repository \u00b6 Open your editor and run in the terminal: git clone https://github.com/albertkun/digital-design-system-docs.git Categories \u00b6 Web Properties Tools/Services (through P-Card or ITS) Processes Shakeup updates Questions \u00b6 Feel free to reach out if you have any questions!",
      "title":"Welcome to LA Metro Digital Design System Docs"
    },
    {
      "location":"#welcome-to-la-metro-digital-design-system-docs",
      "text":"This repository is an internal resource for the Digital Services Team at Metro. Documentation either does not exist, is outdated, exists on old platforms, or exists across multiple platforms used by different people. There is a need to centralize and keep documentation up-to-date for the entire team to use and reference. Do not put any account credentials or private keys here.",
      "title":"Welcome to LA Metro Digital Design System Docs"
    },
    {
      "location":"#contributing",
      "text":"To provide more information on the various Digital Design Projects at LA Metro, people can contribute to this documentation by following these steps:",
      "title":"Contributing"
    },
    {
      "location":"#pre-requisites",
      "text":"",
      "title":"Pre-requisites"
    },
    {
      "location":"#clone-the-repository",
      "text":"Open your editor and run in the terminal: git clone https://github.com/albertkun/digital-design-system-docs.git",
      "title":"Clone the Repository"
    },
    {
      "location":"#categories",
      "text":"Web Properties Tools/Services (through P-Card or ITS) Processes Shakeup updates",
      "title":"Categories"
    },
    {
      "location":"#questions",
      "text":"Feel free to reach out if you have any questions!",
      "title":"Questions"
    }
  ]
} 

This is also (a subset?) of what you've posted in the screenshot – the main entry lists all contents including headlines and permalink markers. My guess is that when you've visited the page, somehow a stale version of the search_index.json was loaded. It means that your browser probably hasn't requested the new search_index.json due to some cache headers telling it not to, but loaded the new JavaScript and CSS sources, because they include content hashes, which the search index doesn't.

I would kindly ask you to verify two things:

  1. Please check that the search_index.json which is loaded in the browser has the correct format. Please see the latest blog post, which explains in detail how the old and new search_index.json differ, and how it should look, and inspect the contents of the search_index.json in the developer tools.

  2. Try to deploy your project to another domain, and see whether the issue persists. If it goes away, it's very likely related to cache headers, because the cache requests will now miss due to the new domain.


On a side note: a few days ago, somebody also reported this on Gitter, but the problem seems to have just gone away because I didn't hear back from him, which also sounds like a caching problem that has solved itself.

@squidfunk squidfunk added needs input Issue needs further input by the reporter and removed bug Issue reports a bug labels Sep 25, 2021
@albertkun
Copy link
Author

The project in question is here, correct?
https://github.com/LACMTA/digital-services-team-docs/

yes, that is correct!

@albertkun
Copy link
Author

Okay, here's what I think is happening:

I took a look at your repository and realized that the search_index.json has the old format, meaning that it was not generated by the new search plugin. The old format includes exactly what you're seeing – the contents of the entire page with permalink markers etc. in the main entry of the page:
Expand

This is also (a subset?) of what you've posted in the screenshot – the main entry lists all contents including headlines and permalink markers. My guess is that when you've visited the page, somehow a stale version of the search_index.json was loaded. It means that your browser probably hasn't requested the new search_index.json due to some cache headers telling it not to, but loaded the new JavaScript and CSS sources, because they include content hashes, which the search index doesn't.

I would kindly ask you to verify two things:

1. Please check that the `search_index.json` which is loaded in the browser has the correct format. Please see the [latest blog post](https://squidfunk.github.io/mkdocs-material/blog/2021/search-better-faster-smaller/), which explains in detail how the old and new `search_index.json` differ, and how it should look, and inspect the contents of the `search_index.json` in the developer tools.

2. Try to deploy your project to another domain, and see whether the issue persists. If it goes away, it's very likely related to cache headers, because the cache requests will now miss due to the new domain.

On a side note: a few days ago, somebody also reported this on Gitter, but the problem seems to have just gone away because I didn't hear back from him, which also sounds like a caching problem that has solved itself.

the search.json has the incorrect format I believe, here is how it looks in the browser:
image

In addition to the link:
https://lacmta.github.io/digital-services-team-docs/search/search_index.json

Regarding deploying to another domain, how should I approach this using GitHub pages? Should I just clone into a new repository?

Thank you for taking the time to look into this!!

@squidfunk
Copy link
Owner

squidfunk commented Sep 26, 2021

Thanks for investigating. Yes, that looks broken indeed. If you compare it to the search_index.json that you get when you do a local mkdocs build, it will be very different. For this reason, the reported behavior can be clearly attributed to a broken deployment or invalid cache headers. As you're using GitHub, the former is more likely to be true, because they use very short durations (just checked: cache-control: max-age=600 for JSON).

As the site folder is part of master branch, it looks like you're using something different from the default behavior of MkDocs. The simplest fix is probably just to use the recommended approach, which will deploy to a dedicated branch called gh-pages. If that is not what you're aiming for, you should probably ask a team member or somebody else in your organization with the right privileges to fix the configuration. A good start is the GitHub Actions workflow from the docs.

I'm closing this issue since this is not an error that is related to Material for MkDocs, but to a deployment misconfiguration.


Regarding deploying to another domain, how should I approach this using GitHub pages? Should I just clone into a new repository?

You could fork the repository and deploy it again, yes.

@albertkun
Copy link
Author

albertkun commented Sep 28, 2021

@squidfunk hmm, working on this a little more... I'm still unable to resolve this, even by forking or using the default github actions... I was able to resolve this by running mkdocs gh-deploy --force locally...

However, this solution is not ideal since we would like to run github actions to build it.. Here's the current github actions .yml file:

name: Deploy Documentation 
on:
  push:
    branches: 
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    if: github.event.repository.fork == false
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install git+https://${GH_TOKEN}@github.com/LACMTA/mkdocs-material-insiders.git
      - run: mkdocs gh-deploy --force

env:
  GH_TOKEN: ${{secrets.ALBERT_KEY}}          
  METRO_DOCS_PASS: ${{secrets.METRO_DOCS_PASS}}

https://github.com/LACMTA/digital-services-team-docs/blob/dev/.github/workflows/main.yml

Any other insights would be appreciated!! Thank you!

@squidfunk squidfunk reopened this Sep 28, 2021
@squidfunk squidfunk added bug Issue reports a bug and removed needs input Issue needs further input by the reporter labels Sep 28, 2021
@squidfunk
Copy link
Owner

Okay, so I can finally reproduce this issue and think I've found the cause of the problem.

Insight and observations

  1. Insiders provides its own search plugin implementation. It exposes the plugin via entry_points with the same name, because it is a full replacement for MkDocs built-in search plugin. In fact, the built-in search plugin causes the behavior that can be seen (full page content in first result), which means that since Insiders 3.0 it is essentially not supported anymore, as it is superseded by the custom implementation.

  2. Exporting a plugin with the same name works in most, but not call cases. I haven't found out why it sometimes works and why it doesn't. The problem manifests in the get_plugins function in MkDocs core, which loads the plugin entry points via importlib_metadata. On my system, importlib_metadata.entry_points() returns:

    [
      EntryPoint(name='search', value='mkdocs.contrib.search:SearchPlugin', group='mkdocs.plugins'),
      EntryPoint(name='search', value='material.plugins.search.plugin:SearchPlugin', group='mkdocs.plugins'),
      EntryPoint(name='social', value='material.plugins.social.plugin:SocialPlugin', group='mkdocs.plugins'),
      EntryPoint(name='tags', value='material.plugins.tags.plugin:TagsPlugin', group='mkdocs.plugins')
    ]
    

    Since the built-in plugin is loaded first, it is overwritten by the custom search plugin. However, I've managed to reproduce the described error in GitHub Actions, where importlib_metadata.entry_points() returns:

    [
      EntryPoint(name='search', value='material.plugins.search.plugin:SearchPlugin', group='mkdocs.plugins'),
      EntryPoint(name='social', value='material.plugins.social.plugin:SocialPlugin', group='mkdocs.plugins'),
      EntryPoint(name='tags', value='material.plugins.tags.plugin:TagsPlugin', group='mkdocs.plugins'),
      EntryPoint(name='search', value='mkdocs.contrib.search:SearchPlugin', group='mkdocs.plugins')
    ]
    

    This means that the built-in search plugin takes precedence. I don't yet understand why, it seems unpredictable.

  3. Why renaming the search plugin is not a good option. So, we could just rename the search plugin or prefix it, right? Theoretically, yes. However, this would probably lead to more confusion, because the search plugin is enabled by default, so users now always have to explicitly list the new search plugin. It would also mean that some templates might break, because we're explicitly checking for the search plugin in some places. Several users override the header partial, so we would have to carry out a major release, with the potential for a lot of issues rolling in, since search is so widely used.

Workaround

As a temporary workaround, this installation method seems to work reliably:

git clone --depth 1 https://${GH_TOKEN}@github.com/squidfunk/mkdocs-material-insiders.git
pip install -e mkdocs-material-insiders

Mitigations

The best idea I could think of would be to patch this function in MkDocs to effectively allow for overriding the search plugin. Personally, I believe this is a sound case. After all, the MkDocs team always advertised that the built-in search plugin is only meant for demonstration purposes, and they've always encouraged users to provide alternate implementations. Here we have an alternate implementation that is superior in many ways.

The necessary patch would be:

def get_plugins():
    """ Return a dict of all installed Plugins as {name: EntryPoint}. """

    plugins = importlib_metadata.entry_points(group='mkdocs.plugins')

    pluginmap = {}
    for plugin in plugins:
        if plugin.name in pluginmap and "mkdocs.contrib" in plugin.value:
            continue

        pluginmap[plugin.name] = plugin

    return pluginmap

@oprypin, @facelessuser, @ultrabug – looping you in for a constructive discussion on this topic. If there's any other possibility without patching core I'm happy to discuss. Nonetheless, I'd expect the proposed patch to not break anything, and to be sound in terms of community-contributed plugins.

@albertkun
Copy link
Author

@squidfunk Thank you so much for looking into this so detailed and providing a work around/solution!

I can confirm that this workaround works! ✔️

git clone --depth 1 https://${GH_TOKEN}@github.com/squidfunk/mkdocs-material-insiders.git
pip install -e mkdocs-material-insiders

I'll continue to monitor this thread to see if/when the patch becomes available!

@facelessuser
Copy link
Contributor

@squidfunk I don't see any immediate concerns with your approach. It seems that allows overrides of mkdocs plugins specifically, which seems okay. If mkdocs plugins come first, they'll be overridden by user plugins, and if user plugins come first, they will take priority over mkdocs plugins. I don't see anything wrong with it.

I've seen it mentioned that if people don't like the search they can write their own plugin, is there really no way to have a search plugin identified the same way in templates unless they override the MkDocs search? If so, then I think this is a fine workaround. If not, then I'd love to hear how it can be done. I always assumed that if you wrote your own search, it would work as a drop-in replacement, if that isn't how it works, then it seems a reasonable request to MkDocs to make it so.

@squidfunk
Copy link
Owner

@facelessuser thanks for your take on this!

I've seen it mentioned that if people don't like the search they can write their own plugin, is there really no way to have a search plugin identified the same way in templates unless they override the MkDocs search?

The search plugin is just a plugin, so the only way to identify it in templates is by its name. I know of no other reliable way. For example, there is the MkDocs localsearch plugin, which post-processes the search index built by the search plugin and must come after it in mkdocs.yml. Luckily, as it relies on the search plugin, the template must not be adjusted. However, a plugin that replaces the search plugin would need to be accounted for in the template, unless my proposal would be merged.

I always assumed that if you wrote your own search, it would work as a drop-in replacement, if that isn't how it works, then it seems a reasonable request to MkDocs to make it so.

I will open a PR for this, which can be used as a further basis for discussion. I really hope for this to make its way into master, because it would make migration a breeze 😊

@squidfunk
Copy link
Owner

squidfunk commented Oct 1, 2021

I've opened a pull request in mkdocs/mkdocs#2591 with the necessary changes. I've also updated the OP to list the workaround to new users experiencing this issue.

@squidfunk squidfunk added the resolved Issue is resolved, yet unreleased if open label Oct 1, 2021
@oprypin
Copy link
Contributor

oprypin commented Oct 6, 2021

This suggested change to MkDocs makes sense to me and will be approved.

I still have to comment that I don't like that what you're developing totally bypasses upstream. Your expertise could've been so useful, but instead we're gathered here to solve an issue that exists just because this is not upstreamed. I mean, how can you brag that it's so much better when you never offered any of that to upstream and it's paywalled. Anyway, I'll also admit it's not as simple as I make it sound, there are arguments to your decision, even good technical arguments, but yeah...

@squidfunk
Copy link
Owner

@oprypin thanks for your review of mkdocs/mkdocs#2591!

I still have to comment that I don't like that what you're developing totally bypasses upstream. Your expertise could've been so useful, but instead we're gathered here to solve an issue that exists just because this is not upstreamed. I mean, how can you brag that it's so much better when you never offered any of that to upstream and it's paywalled. Anyway, I'll also admit it's not as simple as I make it sound, there are arguments to your decision, even good technical arguments, but yeah...

I understand that it looks like it makes sense to upstream it. Let me explain why I believe that at this point in time, it doesn't:

  • Breaking change – all themes (not only the built-in ones, but all third-party themes) would need to adapt to this change if this change would be upstreamed, i.e., when the built-in search plugin would be replaced by the new implementation. This is why Insiders also had a major release. I believe it makes more sense to first release it as a sister plugin of Material for MkDocs, fix any issues we discover on the way, and then at a later point check if this change can be applied to all themes.

    Note that I needed to patch the following components to make it work:

    • MkDocs' search plugin – as noted, I did a complete rewrite of the indexing strategy
    • Lunr.js' tokenizer – also, a complete rewrite, since we're now indexing HTML, not plain text
    • Material for MkDocs search implementation – a rewrite of the indexing strategy and how search results are resolved

    The fact that we're now indexing HTML and not text implied the Lunr.js tokenizer rewrite, which a theme must correctly anticipate and handle. From what I've learned on the way, switching over to indexing HTML instead of text is not a boolean decision – it needs to be carefully accounted for. Yes, we could also try to upstream the tokenizer to Lunr.js, but same as for MkDocs – too much potential breakage, and probably too opinionated.

    Actually, I've wanted to write another blog article how indexing and highlighting now works from a theme perspective, as I think it's quite interesting from a technical standpoint. I'll do that when I find some time.

  • Faster iterations – Historically, MkDocs has had a rather slow and conservative release cycle, which I think is good in general, as it more or less guarantees that the core is stable. Unfortunately, this doesn't foster innovation to happen. Luckily, the plugin system is so well designed that innovation can happen inside plugins, which allows for faster iteration. I decided to make this plugin part of Insiders because it took significant effort to pull this off, and I believe that it is better to first test a change of this magnitude in a constrained environment before it is made generally available.

    Furthermore, if this would be upstreamed or released as a standalone plugin, I, personally, would also have to support compatibility with other third-party themes which I currently can't commit to. Material for MkDocs had a custom search implementation since its first public release because IMHO, the UX of the mkdocs and readthedocs themes is not that great. The new implementation would need to be factored out and abstracted into something that can be re-integrated into Material for MkDocs and other third-party themes. I don't have the time to do it, and given the demographics of this project, I don't see how there are many front-end developers who will be willing to invest their spare time to pull this off. In my experience, front-end developers rather use Docusaurus, Gatsby, or Vuepress.


Now, regarding the "bragging": It was not my intention to brag. If that's what you get from the blog article, I'd be curious to learn what exactly you read as bragging, because I tried to make it as scientific and data-driven as possible. I also explained the rationale behind the changes, diving into the details on how search is currently working, and how I believe it could be improved. My intention was to show the huge potential that lies in rethinking this topic. If you wish to give constructive criticism, ping me at martin.donath@squidfunk.com (or on Gitter), as I don't want to hijack this thread for something offtopic.

Also, I get it: you don't like what I'm doing with Insiders. We've had a previous discussion on this regarding your section index plugin. But please, understand: I'm not forcing anybody to use Insiders. It's entirely optional. Even without Insiders, this theme is probably the most feature-rich in the MkDocs ecosystem and there's nothing that forces users into Insiders. Everything is either an improved drop-in replacement or behind a feature flag.

Additionally, I'm doing the best I can to triage issues quickly, answer discussions, etc. I'm doing many bug fixes (check the commit history and last releases) on this repository and add new features (hello new content tabs!) which I don't put into Insiders, behind a – as you say – "paywall". Please take a look at the list of features that were only possible to develop with the financial support of my sponsors, because otherwise, I wouldn't have the time. Also, you're using at least some previously exclusive features on your documentation projects, so you're actually benefiting from it.

Again, if you feel there's something to be discussed, please ping me (email above). I'm happy to discuss any concerns.

@oprypin
Copy link
Contributor

oprypin commented Oct 7, 2021

Thanks for the detailed response. Indeed these are solid arguments.

@squidfunk
Copy link
Owner

squidfunk commented Oct 7, 2021

@albertkun a920222 switches the MkDocs dependency over to the commit that fixes the issue with the non-determinism. If you reinstall Insiders cleanly, the issue should be fixed, so the workaround is no longer needed.

@squidfunk
Copy link
Owner

I just released 7.3.3-insiders-3.1.3, bumping the minimum MkDocs version to 1.2.3 which contains the fix. Updating to this version resolves this issue, which means it can be closed. Thanks to all of you for helping to work towards a solution!

@feasgal
Copy link

feasgal commented Apr 15, 2022

I started getting this behavior the minute I installed Insiders, less than a month ago. Today I found this thread, cleared my browser cache, and uninstalled mkdocs and material and reinstalled them, a clean install of mkdocs version 1.3.0 and mkdocs-material-8.2.8+insiders.4.12.0.

None of this fixed the problem. I still have full-page search results in both mkdocs serve and mkdocs build.

Here is what I got before Insiders:
image

And here is what I get now:
image

I thought it could have something to do with the offline plugin, but I get the same full-page results, in both the serve and the build, when I disable it.

Help? I can't share my content for business security reasons, but can answer any questions you might have and supply the mkdocs.yml privately if needed.

@squidfunk
Copy link
Owner

Thanks for reporting. In order to look into it, I'm going to need a reproduction, i.e. the Markdown that makes up the page that is being returned in its entirety (like you're saying). I don't need the content – you can replace it with bogus. If you manage to create a reproduction, please open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue reports a bug resolved Issue is resolved, yet unreleased if open
Projects
None yet
Development

No branches or pull requests

6 participants
@oprypin @squidfunk @facelessuser @albertkun @feasgal and others