3033 Introduced V4 Opinion Search API #4007

albertisfu · 2024-05-02T02:10:34Z

This PR adds the Opinions (o) search type to the V4 Search API.

It includes the same new V4 features as the RECAP search type described in #3975, with some variations detailed below.

The results structure is as follows:

{
            "absolute_url": "/opinion/1243/howard-v-honda/",
            "attorney": "a bunch of crooks!",
            "caseName": "Howard v. Honda",
            "caseNameFull": "Harvey Howard v. Antonin Honda",
            "citation": [
                "22 AL 339",
                "33 state 1",
                "1 Yeates 1",
                "56 F.2d 9"
            ],
            "citeCount": 6,
            "cluster_id": 1243,
            "court": "Testing Supreme Court",
            "court_citation_string": "Test",
            "court_id": "test",
            "dateArgued": "2015-08-15",
            "dateFiled": "1895-06-09",
            "dateReargued": null,
            "dateReargumentDenied": "2015-08-15",
            "date_created": "2024-05-15T00:27:01.293279Z",
            "docketNumber": "docket number 2",
            "docket_id": 1981,
            "judge": "David",
            "lexisCite": "",
            "neutralCite": "22 AL 339",
            "non_participating_judge_ids": [],
            "opinions": [
                {
                    "author_id": 1559,
                    "cites": [
                        1823
                    ],
                    "date_created": "2024-05-15T00:27:01.293279Z",
                    "download_url": null,
                    "id": 1822,
                    "joined_by_ids": [
                        1559
                    ],
                    "local_path": "test/search/opinion_pdf_image_based.pdf",
                    "per_curiam": false,
                    "sha1": "8c99509631108e5909f322258da042f8713afe1d",
                    "snippet": "",
                    "timestamp": "2024-05-15T00:27:01.293279Z",
                    "type": "combined-opinion"
                }
            ],
            "panel_ids": [],
            "panel_names": [],
            "posture": "",
            "procedural_history": "some rando history",
            "scdb_id": "",
            "sibling_ids": [
                1822
            ],
            "source": "C",
            "status": "Published",
            "suitNature": "copyright",
            "syllabus": "some rando syllabus",
            "timestamp": "2024-05-15T00:27:01.293279Z"
        }

At the first level, OpinionCluster fields are displayed. Within the opinions key, Opinions matching the query are shown. Up to 5 matched nested opinions are displayed per result; this setting is defined by CHILD_HITS_PER_RESULT.
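A minimal sketch of how a client might walk this structure, assuming a result item shaped like the sample above (trimmed to a few keys). The helper name `matched_opinions` is hypothetical and is not part of this PR.

```python
# A trimmed copy of the sample result above; only the keys used here are kept.
result = {
    "cluster_id": 1243,
    "caseName": "Howard v. Honda",
    "opinions": [
        {"id": 1822, "type": "combined-opinion", "snippet": ""},
    ],
}


def matched_opinions(cluster: dict) -> list:
    """Return (cluster_id, opinion_id, opinion_type) for each nested opinion."""
    return [
        (cluster["cluster_id"], opinion["id"], opinion["type"])
        for opinion in cluster.get("opinions", [])
    ]


print(matched_opinions(result))
```

Cluster-level fields are read from the top-level object, while each entry under `opinions` is one matched child document.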

The frontend has no button to show additional opinions when a query matches more than 5.
Therefore, my question is whether a more_docs field, similar to the one in the V4 RECAP Search API, is necessary when there are more than 5 Opinions matched?
Perhaps it doesn't make sense since we don't have an op type that users could use to query all Opinions matched by a query.

Count

The count covers only the matching OpinionCluster objects, and it relies on the cardinality query to get an approximate count when hits exceed 10,000.
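The fallback described above could be sketched roughly like this. It assumes the total hit count and the cardinality estimate have already been read from the ES response; the function name `resolve_count` and the constant name `MAX_EXACT_HITS` are hypothetical.

```python
from typing import Optional

# Threshold mentioned above; the constant name is an assumption.
MAX_EXACT_HITS = 10_000


def resolve_count(total_hits: int, cardinality_estimate: Optional[int]) -> int:
    """Pick the count to report for matched OpinionCluster objects.

    Below the threshold the hit total is used as-is. Once the threshold is
    reached, the approximate cardinality estimate is preferred if available.
    """
    if total_hits < MAX_EXACT_HITS or cardinality_estimate is None:
        return total_hits
    return cardinality_estimate
```

For example, `resolve_count(42, None)` keeps the exact value, while `resolve_count(10_000, 125_300)` falls back to the estimate.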

Sorting

The supported sorting keys for Opinions are the same as those in the frontend:

"score desc"
"dateFiled desc"
"dateFiled asc"
"citeCount desc"
"citeCount asc"

To support cursor pagination, the secondary sorting key is cluster_id desc.

One sorting difference from the RECAP search type is that, in Opinions, dateFiled and citeCount do not require a custom function score as a workaround for score computation and search_after on None values. This is because date_filed is a mandatory field in the OpinionCluster model and citation_count defaults to 0 in the model.

Thus, sorting directly relies on the values returned by ES, avoiding the use of the custom function score.
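As an illustration only, the sort clause for Opinions could be built along these lines. This is a sketch in plain dictionaries using the Elasticsearch sort syntax; the helper name `build_opinion_sort` and the `SORT_KEYS` mapping are assumptions, not the code in this PR.

```python
# Supported sort keys listed above, mapped to (ES field, direction).
SORT_KEYS = {
    "score desc": ("_score", "desc"),
    "dateFiled desc": ("dateFiled", "desc"),
    "dateFiled asc": ("dateFiled", "asc"),
    "citeCount desc": ("citeCount", "desc"),
    "citeCount asc": ("citeCount", "asc"),
}


def build_opinion_sort(order_by: str) -> list:
    """Return an ES sort list: the requested key plus a cluster_id tiebreaker."""
    try:
        field, direction = SORT_KEYS[order_by]
    except KeyError:
        raise ValueError(f"Unsupported sort key: {order_by!r}") from None
    return [
        {field: {"order": direction}},
        # Secondary key so search_after cursors have a stable ordering.
        {"cluster_id": {"order": "desc"}},
    ]
```

Because dateFiled and citeCount are never None for Opinions, the primary key can be a plain field sort here rather than a function score.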

Highlighting

As in the RECAP search type, highlighting is disabled by default and can be enabled by passing highlight=on.
The supported HL fields are the same as in the frontend:

caseName
citation
suitNature
court_citation_string
docketNumber
text (snippet)
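A request with highlighting enabled might be built as follows. Only `highlight=on` and the `o` type come from this description; the base URL and the `q` and `order_by` parameter names are assumptions used for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to the actual V4 search URL.
BASE_URL = "https://www.courtlistener.com/api/rest/v4/search/"


def build_search_url(query: str, highlight: bool = False,
                     order_by: str = "score desc") -> str:
    """Build an Opinions search URL, adding highlight=on only when requested."""
    params = {"type": "o", "q": query, "order_by": order_by}
    if highlight:
        # Highlighting is disabled by default, so the flag is opt-in.
        params["highlight"] = "on"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("howard", highlight=True))
```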

When highlighting is disabled, the snippet is retrieved from the DB, as in RECAP. However, for Opinions it is a bit more complex, because the text field can be filled during indexing with different values according to their availability and prioritization, as follows:

html_columbia
html_lawbox
xml_harvard
html_anon_2020
html
plain_text

So the same prioritization is used within the merge_unavailable_fields_on_parent_document method to extract the snippet from the DB, up to NO_MATCH_HL_SIZE characters. It uses a single query per page, relying on Case/When conditional expressions.
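The prioritization can be illustrated in plain Python. This is only an analogy for what the Case/When query does in the DB, not the implementation itself; the function name is hypothetical and the `NO_MATCH_HL_SIZE` value below is a placeholder.

```python
# Field priority listed above, highest first.
TEXT_FIELD_PRIORITY = (
    "html_columbia",
    "html_lawbox",
    "xml_harvard",
    "html_anon_2020",
    "html",
    "plain_text",
)

NO_MATCH_HL_SIZE = 500  # placeholder; the real value lives in settings


def snippet_from_db_fields(opinion: dict, size: int = NO_MATCH_HL_SIZE) -> str:
    """Return the first non-empty text field, truncated to `size` characters."""
    for field in TEXT_FIELD_PRIORITY:
        text = opinion.get(field) or ""
        if text:
            return text[:size]
    return ""
```

For instance, an opinion with an empty `html_columbia` but a populated `html_lawbox` yields its snippet from `html_lawbox`, even if `plain_text` is also set.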

semgrep-app · 2024-05-02T02:11:52Z

Semgrep found 4 avoid-query-set-extra findings:

cl/search/api_utils.py
- L343-345 - Triage
cl/lib/elasticsearch_utils.py
- L2703 - Triage
- L2728 - Triage
- L2763 - Triage

'QuerySet.extra' does not provide safeguards against SQL injection and requires very careful use. SQL injection can lead to critical data being stolen by attackers. Instead of using '.extra', use the Django ORM and parameterized queries such as People.objects.get(name='Bob').



mlissner · 2024-05-03T21:41:50Z

Therefore, my question is whether a more_docs field, similar to the one in the V4 RECAP Search API, is necessary when there are more than 5 Opinions matched?

This will always be limited to a few different opinions, so I'd say that both the API and the front end should always show all of them. If you set it to 20 items, that'd surely be enough.

The rest sounds perfect!

albertisfu · 2024-05-03T23:16:07Z

This will always be limited to a few different opinions, so I'd say that both the API and the front end should always show all of them. If you set it to 20 items, that'd surely be enough.

While working on this, I have one question: Do you mean a cluster should always show all their opinions (up to 20) regardless of whether they were matched by the search query?

Or should it show only the opinions that matched a query (up to 20)?

Currently, only opinions that match are displayed in the frontend. If users perform a match-all query or query by cluster fields, the cluster will show "all" the opinions up to 5.
However, if a user queries by an opinion field, such as the text field, only the opinions that match the query are shown within the cluster.

mlissner · 2024-05-03T23:24:18Z

Do you mean a cluster should always show all their opinions (up to 20) regardless of whether they were matched by the search query?

Ideally, only the opinions that match should show in the results. If a cluster matches (but the opinion doesn't), then showing all or none of the sub-opinions seems fine. Probably best in that case to not show any opinion at all.

albertisfu · 2024-05-04T00:01:04Z

Ideally, only the opinions that match should show in the results.

Yeah, this is how it currently works.

If a cluster matches (but the opinion doesn't), then showing all or none of the sub-opinions seems fine. Probably best in that case to not show any opinion at all.

Well, due to the cluster fields (except for non_participating_judge_ids and source, but they're not searchable) being indexed into the sub-opinions, every time a cluster matches, at least one sub-opinion will also be matched. The only scenario where a cluster can be matched without matching a sub-opinion is if the cluster doesn't have any sub-opinion.

So, I believe the remaining option is to display all the sub-opinions when the query involves only cluster fields (this will happen automatically) or a match-all query.

mlissner · 2024-05-04T00:03:13Z

Displaying all the subopinions in that case is fine too!


albertisfu · 2024-05-06T16:40:36Z

Displaying all the subopinions in that case is fine too!

Great, I've set the limit for sub-opinions to 20. Hoping this limit is enough to display all the possible sub-opinions when they all match in a query. This applies to both the frontend and the API.

mlissner · 2024-05-14T17:06:57Z

Looks like we have some conflicts here, @alberto. Want to get them cleaned up, and then I think we're good to have Eduardo review, right?


albertisfu · 2024-05-14T20:01:16Z

Sure, I've resolved the conflicts and added the meta key to the Opinions serializers as well. So this is now ready for review!


ERosendo · 2024-05-14T23:04:49Z

cl/lib/elasticsearch_utils.py

+        and cd["type"]
+        in [
+            SEARCH_TYPES.RECAP,
+            SEARCH_TYPES.DOCKETS,
+            SEARCH_TYPES.RECAP_DOCUMENT,
+        ]


Let's refactor this code to store the membership check in a boolean variable. We can call it is_recap_search and reuse it in both if statements.

    def build_sort_results(...):
        ...
        is_recap_search = cd["type"] in [
            SEARCH_TYPES.RECAP,
            SEARCH_TYPES.DOCKETS,
            SEARCH_TYPES.RECAP_DOCUMENT,
        ]
        if api_version == "v4" and is_recap_search:
            ...
        if (
            toggle_sorting
            and api_version == "v4"
            and is_recap_search
        ):
            ...

Great! I've applied the suggestion and named the variable require_v4_function_score, since it also includes PEOPLE in #4021.

ERosendo · 2024-05-14T23:25:21Z

cl/lib/elasticsearch_utils.py

+                .annotate(
+                    text_to_show=Case(
+                        When(
+                            ~QObject(html_columbia__exact=""),


We don't need to add the __exact lookup. According to the documentation, it is assumed to be exact if you don’t provide a lookup type.

Thanks, I've removed __exact from the lookups.

ERosendo · 2024-05-15T17:59:10Z

cl/lib/elasticsearch_utils.py

    if search_type not in [SEARCH_TYPES.RECAP, SEARCH_TYPES.DOCKETS]:
-        return frontend_hits_limit, query_hits_limit
+        return display_hits_limit, query_hits_limit

    if search_type == SEARCH_TYPES.DOCKETS:
-        frontend_hits_limit = 1
+        display_hits_limit = 1


We should refactor these if statements into the pattern matching block.

ERosendo · 2024-05-15T18:33:14Z

cl/search/api_views.py

    def list(self, request, *args, **kwargs):
        search_form = SearchForm(request.GET, is_es_form=True)
        if search_form.is_valid():
            cd = search_form.cleaned_data
            search_type = cd["type"]
+            search_query = self.document_search_classes[search_type].search()


I like this approach, but there's potential overlap with the pattern matching block starting at line 276. Both use elements from the document_search_classes dictionary.

Considering this, should we handle a potential KeyError exception here? Line 293 adds a case _ clause to the pattern matching, but a simple dictionary like document_search_classes won't handle unexpected keys.

Yeah, you're right. There was a potential KeyError for types that are not supported yet, like pa and oa. I've refactored the code and centralized the supported types in a dictionary called supported_search_types, raising the unsupported error earlier. Let me know what you think.
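The idea of centralizing the supported types and failing early could look roughly like this. It is a sketch under assumptions: the dictionary contents, the exception class, and the function name are all hypothetical, and only the `pa` and `oa` examples of unsupported types come from the comment above.

```python
class UnsupportedSearchType(Exception):
    """Raised when the requested search type has no V4 implementation yet."""


# Hypothetical mapping; the real dictionary maps types to document classes.
SUPPORTED_SEARCH_TYPES = {
    "r": "RECAP documents",
    "o": "Opinions",
}


def resolve_search_type(search_type: str) -> str:
    """Look up a search type, raising a clear error instead of a bare KeyError."""
    try:
        return SUPPORTED_SEARCH_TYPES[search_type]
    except KeyError:
        raise UnsupportedSearchType(
            f"Search type {search_type!r} is not supported yet."
        ) from None
```

Doing the lookup once, before any query is built, means later code such as the pattern matching block can assume the type is valid.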

ERosendo

LGTM. 👍 We can merge this code after addressing the comments

albertisfu · 2024-05-16T15:43:47Z

Thanks, @ERosendo! I've applied your suggestions.
Also, I've converted list fields in Opinions and RECAP to NoneToListField and added tests to handle the bug related to ES DSL partial updates.

mlissner · 2024-05-16T15:56:52Z

Sounds like consensus. Merging!

fix(api): Introduced V4 Opinion Search API

248ef4c

albertisfu changed the base branch from main to 3033-develop-v4-recap-search-api May 2, 2024 02:10

albertisfu added 2 commits May 2, 2024 20:08

fix(api): Fixed Opinions Serializer content and highlighting.

c6e445f

- Also merge the snippet content from DB when highlighting is disabled in the API request. - Included more V4 Opinions Search API

fix(api): Improved API tests helpers and added more tests and fixes

21ecfd8

albertisfu force-pushed the 3033-develop-v4-opinions-search-api branch from 6dc94bb to 21ecfd8 Compare May 3, 2024 18:10

albertisfu marked this pull request as ready for review May 3, 2024 18:10

albertisfu requested a review from mlissner May 3, 2024 18:10

fix(elasticsearch): Increase the number of nested opinions displayed …

eefdf09

…to 20 in search results.

Base automatically changed from 3033-develop-v4-recap-search-api to main May 6, 2024 23:16

albertisfu added 4 commits May 13, 2024 16:58

Merge branch 'main' into 3033-develop-v4-opinions-search-api

89695a8

fix(api): Fixed merge conflicts after updating the branch

37d05f8

fix(api): Fix test_o_results_api_pagination failing test

1a5bd06

Merge branch 'main' into 3033-develop-v4-opinions-search-api

5faf3ed

albertisfu added 2 commits May 14, 2024 12:23

Merge branch 'main' into 3033-develop-v4-opinions-search-api

6b4dbf1

fix(api): Solved merge conflicts and introduced meta key to V4 Opinio…

bb527b1

…n Search API

fix(api): Removed date_created and timestamp from main-level Opinion …

1ad7e5c

…objects.

ERosendo reviewed May 14, 2024


ERosendo reviewed May 15, 2024


Merge branch 'main' into 3033-develop-v4-opinions-search-api

fc126f4

ERosendo reviewed May 15, 2024


albertisfu added 2 commits May 15, 2024 19:34

fix(api): Applied suggestions and code improvements.

8797b96

fix(api): Fix failing test and added NoneToListFields

f8e30a2

mlissner merged commit 56521b0 into main May 16, 2024
13 checks passed

mlissner deleted the 3033-develop-v4-opinions-search-api branch May 16, 2024 15:56
