-
-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3033 Introduced V4 People Search API #4021
Conversation
- Make common tests async
…v4-people-search-api
…_es_queries approach. - Improved People serializers. - Fixed a bug related to empty lists values after partial updates.
b866c26
to
72951de
Compare
…v4-people-search-api
…v4-people-search-api
…v4-people-search-api
…v4-people-search-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all sounds and looks good to me at a skim. @ERosendo, do you for full review.
Thank you both!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good. I tested using different filter combinations and it worked properly. There's just one minor suggestion for refactoring the get_child_top_hits_limit
method. After that, I think we can merge this PR 👍
def get_child_top_hits_limit( | ||
search_params: QueryDict | CleanData, search_type: str | ||
search_params: QueryDict | CleanData, | ||
search_type: str, | ||
api_version: Literal["v3", "v4"] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can combine the match-case statements. They seem to share similar logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I've applied the suggestion, thanks!
…query. - The plain_text was not being merged from the database when HL was disabled in the V4 RECAP Search API.
thanks, @ERosendo I've applied your suggestion. While working on that, I noticed a bug in the V4 RECAP Search API, related to Additionally, I noticed that the If everything seems good, this can be merged. However, before we proceed, we need to apply this setting in production, which is required to accept the maximum number of positions set to 1000.
|
I applied the setting in prod yesterday, so if you are both happy, let's merge! |
@mlissner The latest commit successfully resolved the issue identified by @albertisfu in their comment. Everything is working properly now |
This PR introduces the
PEOPLE
search type to the V4 Search API (Judges).The object's structure looks as follows:
It displays
PersonDocument
as the main document with their nestedPositionDocument
.As in RECAP and Opinions, due to
PersonDocument
fields being indexed intoPositionDocument
, if a query only involves aPersonDocument
field, all the Person Positions will be matched. This also happens with match-all queries, so that each Person will show all their positions. To ensure all of them are shown, theinner_hits
size is set to 1000.By default, the max inner hits that can be queried is 100, so we'd need to update this setting in the
people_vectors
index to 1000 before merging this PR:If the query matches a position field specifically, only the positions that match the query will be displayed within the Person as nested objects.
Originally, the People search on the frontend and the V3 API was not using the same query approach as other parent-child documents like RECAP and Opinions. This was because People search was not required to show nested documents in the frontend or the V3 API, and it was using a simpler approach that didn't return nested documents. Now, in V4, we need to show nested documents. To centralize the code base for building the People queries, I've migrated the frontend and V3 queries to use
build_full_join_es_queries
, which is the same approach used in RECAP and Opinions. The difference is that for the frontend and V3, the number of inner hits to return is 0, while in V4 it is 1000.Sorting
The supported sorting keys for People are the same as those in the fronted:
Due to
dob
anddod
dates can beNone
, it was necessary to apply the same approach (custom function score) as in RECAP as a workaround to sort documents by these fields and use them as thesearch_after
param.Also, we can notice that the
dob
anddod
sorting keys by default have a secondary sorting key, which isname_reverse asc
. This means that in the V4 API, the sorting looks like:1° function score for
dob
ordod
2°
name_reverse asc
3°
id desc
(as the tiebreaker key)Highlighting
As in the other search types, highlighting is disabled by default. When enabled by passing
highlight=on
, the HL fields are the same as in the frontend:name
dob_city
dob_state_id
school
political_affiliation
All of them parent-level fields.
Empty list fields
I noticed that empty list fields in the
people_vectors
index were being indexed asNone
:In other search types, we display empty list fields as
[]
. So I fixed the indexing to index them as[]
when empty. This will be corrected in the next re-index ofpeople_vectors
.However, I also found that on partial updates that involve a list field, the field is re-indexed as
None
even though it's explicitly passed an empty list. The issue is described here: elastic/elasticsearch-dsl-py#1819Once the fix is released, we can update the client.
In the meantime, as a workaround, we're using the
NoneToListField
to display these fields as empty lists instead ofNone
.Let me know what do you think.