3033 Introduced V4 People Search API #4021

albertisfu · 2024-05-04T01:38:54Z

This PR introduces the PEOPLE search type to the V4 Search API (Judges).

The object's structure looks as follows:

{
   "aba_rating":[
      
   ],
   "absolute_url":"/person/15466/frank-aguilar/",
   "alias":[
      
   ],
   "alias_ids":[
      
   ],
   "date_granularity_dob":"",
   "date_granularity_dod":"",
   "dob":null,
   "dob_city":"",
   "dob_state":"",
   "dob_state_id":"",
   "dod":null,
   "fjc_id":"None",
   "gender":"Male",
   "id":15466,
   "meta":{
      "timestamp":"2024-05-15T16:28:47.694956Z",
      "date_created":"2022-06-17T17:44:50.557685Z"
   },
   "name":"Frank Aguilar",
   "political_affiliation":[
      "Democratic"
   ],
   "political_affiliation_id":[
      "d"
   ],
   "positions":[
      {
         "appointer":null,
         "court":"Tex. 228th Jud. Dist. Ct.",
         "court_citation_string":"",
         "court_exact":"texdistct229",
         "court_full_name":"Texas 228th Judicial District Court",
         "date_confirmation":null,
         "date_elected":"2018-11-06",
         "date_granularity_start":"%Y-%m-%d",
         "date_granularity_termination":"%Y-%m-%d",
         "date_hearing":null,
         "date_judicial_committee_action":null,
         "date_nominated":null,
         "date_recess_appointment":null,
         "date_referred_to_judicial_committee":null,
         "date_retirement":null,
         "date_start":"2019-01-01",
         "date_termination":"2023-01-01",
         "job_title":"",
         "judicial_committee_action":"",
         "meta":{
            "timestamp":"2024-05-15T16:28:49.391488Z",
            "date_created":"2022-06-17T17:44:50.626627Z"
         },
         "nomination_process":"",
         "organization_name":null,
         "position_type":"Presiding Judge",
         "predecessor":null,
         "selection_method":"",
         "selection_method_id":"",
         "supervisor":null,
         "termination_reason":""
      }
   ],
   "races":[
      "Hispanic/Latino"
   ],
   "religion":"",
   "school":[
      "The University of Texas at Austin"
   ]
}

Each result is a PersonDocument as the main document, with its PositionDocument children nested under positions.

As in RECAP and Opinions, due to PersonDocument fields being indexed into PositionDocument, if a query only involves a PersonDocument field, all the Person Positions will be matched. This also happens with match-all queries, so that each Person will show all their positions. To ensure all of them are shown, the inner_hits size is set to 1000.

By default, index.max_inner_result_window is 100, so we need to raise this setting to 1000 on the people_vectors index before merging this PR:

PUT /people_vectors/_settings

{
  "index": {
    "max_inner_result_window": 1000
  }
}
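
For anyone who prefers to script this step, here is a minimal sketch of the same request built with Python's standard library. The host URL and the helper name build_inner_window_request are assumptions for illustration only; the request is constructed here but not sent.

```python
import json
from urllib.request import Request

# Assumed local cluster address; replace with the real Elasticsearch host.
ES_HOST = "http://localhost:9200"


def build_inner_window_request(index: str, window: int) -> Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    body = {"index": {"max_inner_result_window": window}}
    return Request(
        url=f"{ES_HOST}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_inner_window_request("people_vectors", 1000)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it; that call is left out so the sketch has no side effects.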

If the query matches a position field specifically, only the positions that match the query will be displayed within the Person as nested objects.

Originally, the People search on the frontend and the V3 API was not using the same query approach as other parent-child documents like RECAP and Opinions. This was because People search was not required to show nested documents in the frontend or the V3 API, and it was using a simpler approach that didn't return nested documents. Now, in V4, we need to show nested documents. To centralize the code base for building the People queries, I've migrated the frontend and V3 queries to use build_full_join_es_queries, which is the same approach used in RECAP and Opinions. The difference is that for the frontend and V3, the number of inner hits to return is 0, while in V4 it is 1000.
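
To illustrate the difference in inner hits between clients, here is a rough sketch of a has_child clause built as a plain dict. It is not the real build_full_join_es_queries code: the helper name, the lookup dict, and the child relation name "position" are all assumptions.

```python
from typing import Any

# Inner-hit limits as described above; the dict name is hypothetical.
INNER_HITS_BY_CLIENT = {"frontend": 0, "v3": 0, "v4": 1000}


def has_child_clause(child_query: dict[str, Any], client: str) -> dict[str, Any]:
    """Wrap a child-level query in a has_child clause with inner_hits."""
    return {
        "has_child": {
            "type": "position",  # assumed child relation name
            "query": child_query,
            "inner_hits": {"size": INNER_HITS_BY_CLIENT[client]},
        }
    }


match_all = {"match_all": {}}
v4_clause = has_child_clause(match_all, "v4")
v3_clause = has_child_clause(match_all, "v3")
print(v4_clause["has_child"]["inner_hits"])
```

The same clause shape serves every client; only the inner_hits size changes, which is the point of centralizing the query building.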

Sorting

The supported sorting keys for People are the same as those in the frontend:

"score desc"
"name_reverse asc"
"dob desc,name_reverse asc"
"dob asc,name_reverse asc"
"dod desc,name_reverse asc"

Because dob and dod can be None, it was necessary to apply the same approach as in RECAP (a custom function score) as a workaround to sort documents by these fields and use them as the search_after param.

Also, note that the dob and dod sorting keys have a default secondary sorting key, name_reverse asc. This means that in the V4 API, the sorting looks like:

1. function score for dob or dod
2. name_reverse asc
3. id desc (as the tiebreaker key)
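
The resulting order can be illustrated in plain Python. This sketch does not reproduce the Elasticsearch function score; it only emulates the three-level ordering for "dob desc,name_reverse asc". The sample records are made up, and placing people without a dob last is an assumption, not something stated in this PR.

```python
from datetime import date
from typing import Optional


def dob_desc_sort_key(person: dict) -> tuple:
    """Emulate 'dob desc,name_reverse asc' plus the id desc tiebreaker."""
    dob: Optional[date] = person["dob"]
    # Assumption: people with no dob are placed after everyone who has one.
    missing = dob is None
    # Negating the ordinal gives descending date order under an ascending sort.
    dob_rank = 0 if dob is None else -dob.toordinal()
    return (missing, dob_rank, person["name_reverse"], -person["id"])


# Made-up sample records for illustration only.
people = [
    {"id": 1, "name_reverse": "aguilar frank", "dob": None},
    {"id": 2, "name_reverse": "smith ann", "dob": date(1950, 3, 1)},
    {"id": 3, "name_reverse": "jones bob", "dob": date(1960, 7, 4)},
    {"id": 4, "name_reverse": "jones bob", "dob": date(1960, 7, 4)},
]
ordered = [p["id"] for p in sorted(people, key=dob_desc_sort_key)]
print(ordered)  # [4, 3, 2, 1]
```

Records 3 and 4 tie on both dob and name_reverse, so the id desc tiebreaker decides their order, which is what keeps search_after pagination deterministic.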

Highlighting

As in the other search types, highlighting is disabled by default. When enabled by passing highlight=on, the HL fields are the same as in the frontend:

name
dob_city
dob_state_id
school
political_affiliation

All of them are parent-level fields.

Empty list fields

I noticed that empty list fields in the people_vectors index were being indexed as None:

prepare_political_affiliation
prepare_alias
prepare_aba_rating
prepare_school
prepare_races
prepare_alias_ids
prepare_political_affiliation_id

In other search types, we display empty list fields as []. So I fixed the indexing to index them as [] when empty. This will be corrected in the next re-index of people_vectors.

However, I also found that on partial updates that involve a list field, the field is re-indexed as None even though it's explicitly passed an empty list. The issue is described here: elastic/elasticsearch-dsl-py#1819

Once the fix is released, we can update the client.

In the meantime, as a workaround, we're using the NoneToListField to display these fields as empty lists instead of None.
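
The idea behind the workaround can be sketched as follows. This is a hypothetical stand-in written without any framework dependency, not the NoneToListField implementation from this PR; the class name and method are assumptions.

```python
from typing import Any, Optional


class NoneToListSketch:
    """Hypothetical stand-in for the NoneToListField idea; not the PR's code."""

    def to_representation(self, value: Optional[list[Any]]) -> list[Any]:
        # A list field that was re-indexed as None is rendered as [].
        return [] if value is None else list(value)


field = NoneToListSketch()
print(field.to_representation(None))
print(field.to_representation(["Democratic"]))
```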

Let me know what you think.

mlissner

This all sounds and looks good to me at a skim. @ERosendo, over to you for a full review.

Thank you both!

ERosendo

The code looks good. I tested using different filter combinations and it worked properly. There's just one minor suggestion for refactoring the get_child_top_hits_limit method. After that, I think we can merge this PR 👍

ERosendo · 2024-05-16T20:46:47Z

cl/lib/elasticsearch_utils.py

 def get_child_top_hits_limit(
-    search_params: QueryDict | CleanData, search_type: str
+    search_params: QueryDict | CleanData,
+    search_type: str,
+    api_version: Literal["v3", "v4"] | None = None,


I believe we can combine the match-case statements. They seem to share similar logic.

Sure! I've applied the suggestion, thanks!

albertisfu · 2024-05-17T02:34:12Z

Thanks, @ERosendo, I've applied your suggestion.

While working on that, I noticed a bug in the V4 RECAP Search API related to get_search_query, where match-all queries are built for nested search types. When HL was disabled and a match-all query was performed, the snippet was still being retrieved from ES using the HL no_match_size feature. As a result, these queries didn't get the performance boost of disabling HL completely. So I refactored the method to use build_has_child_query to build all the has_child queries with the same properties, which allows HL to be fully disabled in the V4 API, with the snippet retrieved from the DB instead.

Additionally, I noticed that the rd type in the frontend (where it is not supported) was throwing a 500 error instead of failing gracefully. So, I applied a fix to show the search error page instead.

If everything looks good, this can be merged. However, before we proceed, we need to apply this setting in production; it is required so that the index accepts an inner hits size of 1000 for positions.

PUT /people_vectors/_settings

{
  "index": {
    "max_inner_result_window": 1000
  }
}

mlissner · 2024-05-17T05:40:23Z

I applied the setting in prod yesterday, so if you are both happy, let's merge!

ERosendo · 2024-05-17T17:05:39Z

@mlissner The latest commit successfully resolved the issue identified by @albertisfu in their comment. Everything is working properly now.

albertisfu added 4 commits May 3, 2024 14:57

fix(api): Split People Search API tests into their own class.

d6f6f29

- Make common tests async

fix(api): Introduced V4 People Search API

fdf617d

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

83a6491

…v4-people-search-api

fix(api): Refactored People Search queries to use the build_full_join…

72951de

…_es_queries approach. - Improved People serializers. - Fixed a bug related to empty lists values after partial updates.

albertisfu force-pushed the 3033-develop-v4-people-search-api branch from b866c26 to 72951de Compare May 6, 2024 19:31

albertisfu mentioned this pull request May 8, 2024

4029 Fix V4 Search API Serializer errors and other bugs #4033

Merged

albertisfu added 7 commits May 13, 2024 19:03

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

2ed601e

…v4-people-search-api

fix(api): Solved merge conflicts after updating branch.

2650a4a

Merge branch 'main' into 3033-develop-v4-people-search-api

2bddc62

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

cfb83c3

…v4-people-search-api

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

52543af

…v4-people-search-api

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

31dd40e

…v4-people-search-api

fix(api): Added more V4 People Search API tests and fixed the behavio…

c1aa1d3

…r accordingly

albertisfu marked this pull request as ready for review May 15, 2024 22:43

albertisfu requested a review from mlissner May 15, 2024 22:43

mlissner approved these changes May 15, 2024

albertisfu added 2 commits May 15, 2024 19:56

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

9557838

…v4-people-search-api

fix(api): Updated branch and solved merge conflicts

8f06bd9

albertisfu mentioned this pull request May 16, 2024

3033 Introduced V4 Opinion Search API #4007

Merged

Base automatically changed from 3033-develop-v4-opinions-search-api to main May 16, 2024 15:56

albertisfu added 2 commits May 16, 2024 09:57

Merge branch '3033-develop-v4-opinions-search-api' into 3033-develop-…

f301d2b

…v4-people-search-api

Merge branch 'main' into 3033-develop-v4-people-search-api

77a16ba

ERosendo approved these changes May 16, 2024

fix(api): Applied suggestion and fixed the bug related to get_search_…

4e5acf0

…query. - The plain_text was not being merged from the database when HL was disabled in the V4 RECAP Search API.

mlissner merged commit 72bdb60 into main May 17, 2024
13 checks passed

mlissner deleted the 3033-develop-v4-people-search-api branch May 17, 2024 17:07
