HTML Search: omit anchor reference from document titles in the search index. #12047

jayaddison · 2024-03-04T11:28:54Z

Feature or Bugfix

Bugfix

Purpose

Adjusts the contents of the search index provided to the client so that the browser correctly de-duplicates search results that link to a document's top-level title.

Detail

Changes how docutils nodes.title nodes are transformed into the title index: in the special case of the document title, the node id that would otherwise be used as the hyperlink anchor is omitted.
Note: there is no explicit markup for the document title in reStructuredText.
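A minimal sketch of the idea, not the code in this pull request. It assumes a simplified title index that maps each title to (docname, anchor) pairs and treats the first title encountered in a document as the document title. The names build_title_index and result_url are hypothetical and do not exist in Sphinx.

```python
from typing import Optional


def build_title_index(
    docs: dict[str, list[tuple[str, str]]],
) -> dict[str, list[tuple[str, Optional[str]]]]:
    """Map each title to (docname, anchor) pairs.

    `docs` maps a docname to its (title, section_id) pairs in document order.
    Assumption: the first title in each document is the document title.
    """
    index: dict[str, list[tuple[str, Optional[str]]]] = {}
    for docname, titles in docs.items():
        for position, (title, section_id) in enumerate(titles):
            # Omit the anchor for the document title only; later
            # headings keep their section id as the anchor.
            anchor = None if position == 0 else section_id
            index.setdefault(title, []).append((docname, anchor))
    return index


def result_url(docname: str, anchor: Optional[str]) -> str:
    """Build the link a search result would point at."""
    return f"{docname}.html" + (f"#{anchor}" if anchor is not None else "")


docs = {"intro": [("Introduction", "introduction"), ("Setup", "setup")]}
index = build_title_index(docs)

# A title match and a content match for the same page now produce the
# same URL, so collecting them in a set leaves a single result.
title_hit = result_url(*index["Introduction"][0])
content_hit = result_url("intro", None)
unique_results = {title_hit, content_hit}
```

With the anchor kept, the title match would link to intro.html#introduction while the content match links to intro.html, and the two would be treated as distinct results.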

Relates

Resolves HTML Search: Contains duplicates based on title and content search #11961.
Alternative approach to client-side fix in HTML Search: Fix duplicate results #11942.

wlach · 2024-03-09T15:37:45Z

This seems preferable to my solution in #11942, just tested and it has the same results while making the search index smaller. 👍 from me.

picnixz · 2024-03-15T14:56:40Z

Fine for me (I'm not an expert in that search-related aspect so I'll leave it to you guys). Should I understand that this PR would replace #11942? @jayaddison ping me when you want it to be merged.

jayaddison · 2024-03-15T15:56:51Z

> Fine for me (I'm not an expert in that search-related aspect so I'll leave it to you guys). Should I understand that this PR would replace #11942?

That's correct. I think there are three things that I don't like about this pull request, in order of priority:

1. It doesn't offer JavaScript test coverage to demonstrate the fix -- only Python test coverage that confirms the expected index format.
2. I haven't been able to think of a way that this would happen yet, but in theory I expect some downstream tools that use the current index format could be broken by this - in particular the possibility of null values where previously everything was a string.
3. Related to (2), the index format isn't as small as it could be. In raw text, the JavaScript empty string can be represented by either '' or "" (two characters), whereas a null value is represented by the token null (four characters). It may not matter a lot, but over a long enough duration of time the cumulative cost could be significant (or not! maybe it's a waste of time considering it).
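The size difference in the third point can be checked with a quick sketch. It serializes one hypothetical title entry with Python's json module, which is an assumption for illustration only and not necessarily how searchindex.js is written out. The helper serialized_size and the sample entry are made up for this example.

```python
import json
from typing import Optional


def serialized_size(anchor: Optional[str]) -> int:
    """Length in characters of one title entry serialized compactly."""
    entry = {"Introduction": [[0, anchor]]}
    return len(json.dumps(entry, separators=(",", ":")))


null_size = serialized_size(None)   # anchor serialized as null
empty_size = serialized_size("")    # anchor serialized as ""
saving_per_title = null_size - empty_size
print(saving_per_title)  # 2
```

The saving is two characters per document title, so the total scales with the number of documents in the project.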

jayaddison · 2024-03-15T16:05:53Z

> It doesn't offer JavaScript test coverage to demonstrate the fix -- only Python test coverage that confirms the expected index format.

I think I'll begin work on a small JavaScript refactor PR to make test coverage easier to add.

picnixz · 2024-03-15T16:09:35Z

> particular the possibility of null values where previously everything was a string.

Is it possible to keep a string or is the null type needed?

jayaddison · 2024-03-15T16:17:11Z

> > particular the possibility of null values where previously everything was a string.
>
> Is it possible to keep a string or is the null type needed?

If I remember correctly, this conditional needs to be adjusted if we're using empty strings, but otherwise it should be fine.
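To illustrate why a conditional would need adjusting, here is a hedged Python sketch of the two kinds of check. The real conditional lives in the JavaScript client and is not reproduced here; both function names below are hypothetical.

```python
from typing import Optional


def anchor_suffix_strict(anchor: Optional[str]) -> str:
    """Treat only None as 'no anchor' (suits a null-based index)."""
    return "" if anchor is None else "#" + anchor


def anchor_suffix_lenient(anchor: Optional[str]) -> str:
    """Treat both None and the empty string as 'no anchor'."""
    return "#" + anchor if anchor else ""


# With a strict check, an empty-string anchor yields a dangling "#".
strict_on_empty = anchor_suffix_strict("")
# A truthiness check handles both representations.
lenient_on_empty = anchor_suffix_lenient("")
```

A check written against null would need to become something like the lenient form if the index switched to empty strings.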

What I would really like is a way to generate the indexes used in the JavaScript tests from the same Python code that builds searchindex.js when projects are built. It feels fragile that the test indexes are written by hand.

jayaddison · 2024-03-15T16:30:12Z

> What I would really like is a way to generate the indexes used in the JavaScript tests from the same Python code that builds searchindex.js when projects are built. It feels fragile that the test indexes are written by hand.

Moved into #12099.

jayaddison · 2024-03-16T11:08:43Z

I'd like to check whether we can get #12102 in place before progressing this pull request further. If that can be added, then I think adding test coverage here will be much easier and more reliable (I'll be able to create a sample Sphinx project that returns duplicate search results, and add test coverage against that).

jayaddison · 2024-03-16T11:09:47Z

(and maybe do the null to '' refactoring at the same time and more safely, given the test coverage)

picnixz · 2024-03-17T10:07:03Z

Removing the "awaiting review" label until this PR is ready

jayaddison · 2024-03-17T11:11:23Z

> Removing the "awaiting review" label until this PR is ready

Oops, thanks. I forgot about that.

jayaddison · 2024-05-09T22:26:17Z

While I think it would be sensible to merge #12102 first (if-and-when that pull request is considered acceptable!), I'm going to remove the blocker label and draft status from this pull request.

jayaddison added 2 commits March 4, 2024 01:28

searchindex: omit the section reference when indexing each document's…

e951f65

… title

refactor: move the fix to the wordcollector node visitor

5e05a27

Note: reStructuredText document titles are implicit; ref: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#document

jayaddison mentioned this pull request Mar 4, 2024

HTML Search: Fix duplicate results #11942

Closed

jayaddison added 3 commits March 4, 2024 12:42

typing: emit empty-string instead of null for document title anchor v…

d44dc11

…alues

tests: add failing test to catch a regression; other top-level sectio…

a978110

…n headings may exist

fixup: only omit the anchor for the first-encountered title

ed47ab9

jayaddison added the html search label Mar 4, 2024

jayaddison added 2 commits March 4, 2024 13:59

Revert "typing: emit empty-string instead of null for document title …

26d6e3c

…anchor values" This reverts commit d44dc11.

linting: add ignore-E501 to overlong line

5c75c81

jayaddison changed the title ~~Draft: [HTML search] omit anchor reference from document titles in the search index.~~ [HTML search] omit anchor reference from document titles in the search index. Mar 4, 2024

jayaddison marked this pull request as ready for review March 4, 2024 14:03

jayaddison requested a review from jakobandersen March 6, 2024 01:21

jayaddison changed the title ~~[HTML search] omit anchor reference from document titles in the search index.~~ HTML Search: omit anchor reference from document titles in the search index. Mar 14, 2024

jayaddison added the type:bug, awaiting:response, and awaiting:review labels and removed the awaiting:response label Mar 14, 2024

jayaddison added 2 commits March 15, 2024 13:58

Merge branch 'master' into issue-11961/searchindex-omit-doctitle-anchors

3c557cd

Add CHANGES.rst entry

b8177c5

jayaddison added the DO NOT MERGE label Mar 16, 2024

jayaddison marked this pull request as draft March 16, 2024 11:08

jayaddison mentioned this pull request Mar 16, 2024

[search] Refactor: make the search code more testable. #12107

Closed

picnixz removed the awaiting:review label Mar 17, 2024

Merge branch 'master' into issue-11961/searchindex-omit-doctitle-anchors

d3a40d8

jayaddison mentioned this pull request Apr 9, 2024

7.3.0 release plan #12242

Closed

Merge branch 'master' into issue-11961/searchindex-omit-doctitle-anchors

f794764

Conflicts: CHANGES.rst tests/test_search.py

jayaddison mentioned this pull request Apr 25, 2024

[tests] JavaScript: extract searchindex.js-format test fixtures. #12102

Open

jayaddison removed the DO NOT MERGE label May 9, 2024

jayaddison marked this pull request as ready for review May 9, 2024 22:26
