Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search: support searching for (sub)titles #10717

Merged
merged 4 commits into from Sep 9, 2022

Conversation

marxin
Copy link
Contributor

@marxin marxin commented Jul 28, 2022

Collect all titles from all pages and utilize a full match
(case insensitive) in Search page.

Fixes: #10689

@marxin marxin force-pushed the search-support-title-search branch 2 times, most recently from 4684bcb to dfba5c3 Compare July 28, 2022 10:03
@marxin marxin marked this pull request as ready for review July 28, 2022 10:15
@AA-Turner
Copy link
Member

Does this work for partial matches?

A

@marxin marxin force-pushed the search-support-title-search branch 2 times, most recently from b66cda8 to 9d23ee2 Compare August 1, 2022 06:01
@marxin
Copy link
Contributor Author

marxin commented Aug 1, 2022

Does this work for partial matches?

Yes, I've just added partial match support.

@marxin marxin changed the title search: support exact match searching for titles search: support searching for (sub)titles Aug 1, 2022
@marxin marxin force-pushed the search-support-title-search branch from 9d23ee2 to 180f91a Compare August 8, 2022 09:14
@AA-Turner
Copy link
Member

Please add tests and a CHANGES entry.

A

@marxin marxin force-pushed the search-support-title-search branch from 180f91a to a5f07bb Compare August 9, 2022 12:08
@marxin
Copy link
Contributor Author

marxin commented Aug 9, 2022

I've just added CHANGES, but for a working test, I would need an HTML inspection of rendered search.html page.
Do we have such tests?

@marxin
Copy link
Contributor Author

marxin commented Aug 10, 2022

As an improvement, it should be possible to emit a link directly to the title in a page.
But I can't find a function that would give me for a nodes.title Node an anchor:

{'rawsource': 'Demo documentation', 'children': [<#text: 'Demo documentation'>], 'attributes': {'ids': [], 'classes': [], 'names': [], 'dupnames': [], 'backrefs': []}, 'tagname': 'title', 'parent': <section "demo documentation": <title...><compound...><paragraph...><substitution_defin ...>, '_document': <document: <substitution_definition "gol"...><section "demo documen ...>, 'source': '/home/marxin/Programming/texi2rst-generated/sphinx/demo/index.rst', 'line': 2}

Can you please help me?

@jbms
Copy link
Contributor

jbms commented Sep 1, 2022

I think you may need to look at the parent section element to find the ids.

@jbms
Copy link
Contributor

jbms commented Sep 1, 2022

Note: In https://github.com/jbms/sphinx-immaterial I have implemented something similar entirely client side, but there are a few differences:

  • The sub-sections are parsed from the HTML document itself, while extracting snippets.
  • If the search text is found within the page, the result link is to the nearest containing section.

This is the source code of my implementation, for reference:
https://github.com/jbms/sphinx-immaterial/blob/main/src/assets/javascripts/sphinx_search.ts

In general handling this when building the index is probably better, though given that the HTML must be parsed anyway to handle the snippets I'm not sure.

@marxin
Copy link
Contributor Author

marxin commented Sep 7, 2022

In general handling this when building the index is probably better, though given that the HTML must be parsed anyway to handle the snippets I'm not sure.

Yes, I do prefer the server side implementation and I'm still curious about the title links as mentioned in my previous comment. One should be able to get a link to them.

@AA-Turner
Copy link
Member

One should be able to get a link to them.

Something like node.parent["names"] should work? The anchor link is on the docutils.nodes.section node as I recall.

A

@marxin
Copy link
Contributor Author

marxin commented Sep 7, 2022

Something like node.parent["names"] should work? The anchor link is on the docutils.nodes.section node as I recall.

Yep, that almost works:

diff --git a/sphinx/search/__init__.py b/sphinx/search/__init__.py
index bbb28c0b9..7916d26f0 100644
--- a/sphinx/search/__init__.py
+++ b/sphinx/search/__init__.py
@@ -216,6 +216,8 @@ class WordCollector(nodes.NodeVisitor):
         elif isinstance(node, nodes.title):
             title = node.astext()
             self.found_titles.append(title)
+            print('node:', node)
+            print('node.parent[names]:', node.parent['names'])
             self.found_title_words.extend(self.lang.split(title))
         elif isinstance(node, Element) and self.is_meta_keywords(node):
             keywords = node['content']

emits something like:

node: <title>Comparison of GCC docs in Texinfo and Sphinx</title>
node.parent[names]: ['comparison of gcc docs in texinfo and sphinx']
node: <title>HTML output</title>
node.parent[names]: ['html output']
node: <title>Formatting</title>
node.parent[names]: ['formatting']

So the last missing piece is probably an escaping that will emit e.g. comparison-of-gcc-docs-in-texinfo-and-sphinx?

@AA-Turner
Copy link
Member

Ahh, can you try ["ids"]?

A

@marxin
Copy link
Contributor Author

marxin commented Sep 7, 2022

Ahh, can you try ["ids"]?

Works for me, added that.

@marxin marxin force-pushed the search-support-title-search branch 2 times, most recently from b160baa to 3f7d25c Compare September 7, 2022 18:19
@marxin
Copy link
Contributor Author

marxin commented Sep 8, 2022

Can you please @AA-Turner review the pull request now?

@AA-Turner AA-Turner added type:enhancement enhance or introduce a new feature html search labels Sep 8, 2022
@AA-Turner AA-Turner added this to the 5.2.0 milestone Sep 8, 2022
@AA-Turner
Copy link
Member

Something seems to be wrong:

https://sphinx--10717.org.readthedocs.build/en/10717/search.html?q=More+topics+to+be+covered

https://www.sphinx-doc.org/en/master/search.html?q=More+topics+to+be+covered

The PR only shows 5 results, and doesn't highlight the title, whereas the current master shows the title, albeit as the third result.

A

@marxin
Copy link
Contributor Author

marxin commented Sep 8, 2022

The PR only shows 5 results, and doesn't highlight the title, whereas the current master shows the title, albeit as the third result.

Yeah, it's a fancy feature of Read the Docs, it must be a plug-in that is used for Sphinx docs.

Please compare it with another pull request:
https://sphinx--10807.org.readthedocs.build/en/10807/search.html?q=More+topics+to+be+covered

Collect all titles from all pages and utilize a contains match
(case insensitive) in Search page.

Fixes: sphinx-doc#10689
@AA-Turner
Copy link
Member

Rebased.

A

CHANGES Outdated Show resolved Hide resolved
@AA-Turner
Copy link
Member

If running an incremental build, searchindex.js is only updated after loading it, so we need to bump the environment version.

A

@AA-Turner AA-Turner merged commit 7da60f2 into sphinx-doc:5.x Sep 9, 2022
@AA-Turner
Copy link
Member

Thanks @marxin!

A

@marxin
Copy link
Contributor Author

marxin commented Sep 9, 2022

Thanks for merging that!

@marxin marxin deleted the search-support-title-search branch September 9, 2022 10:29
@larsoner larsoner mentioned this pull request Sep 22, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
html search type:enhancement enhance or introduce a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

search: increase priority for full (sub)title match when searching
3 participants