New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
search with nested sort results in 0 results #2480
Comments
Hello @cjdoumas, and thanks for the report. Sorting within nested objects requires special syntax, please see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html#nested-sorting. Do you get results when using Also note that using version 8.x of the client against a 7.x cluster can cause issues. (But probably not in this specific case.) Clients are forward-compatible, though, so the opposite is OK. |
So, I tried that syntax, and have had either empty or not sorted results. Honestly, though I am pretty sure I am not using it correctly. So, here are 4 records:
So, for the end result, I want to sort by 3 vars in the following order: _common.entryID, _common.entryGroup, and _common.entryNumber. However, first, I am just trying to sort on _common.entryNumber to figure out the nested syntax. The code below does not throw an error, but results in 0 results. I am certain I am misunderstanding something.
So, first step is getting the nested to work for just one var, then to get it to work with all three in that order. (Afterwards, I'd need to then incorporate the search_after argument to be able datasets beyond 10,000 entries) Am I providing what you need? |
Note, I have also formatted the sort_instructions without the all encompassing quotes, and that also returned 0 results, as such:
|
Any updates on this issue? I'm also not familiar with submitting issues in Github, so if just let me know if I need to be patient or there is something else I should be doing. |
Sorry for the delay, you've been more than patient enough. Your code wasn't really complete enough as I did not get the mapping and had to guess about them. I believe the issue was that sort instruction should not be a string and that your query also needs to take into account that We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of the client. from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
print(client.info()["version"]) Next we define the mappings and index your 4 sample docs: client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
index="my_index",
mappings={
"properties": {
"_common": {
"type": "nested",
"properties": {
"eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
"entryId": {"type": "keyword"},
"entryGroup": {"type": "integer"},
"entryNumber": {"type": "integer"},
"name": {"type": "keyword"},
},
}
}
},
)
for doc in [
{
"_index": "my_index",
"_type": "_doc",
"_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
"_source": {
"_common": {
"eventTime": "2024-03-22 10:31:45.125",
"entryId": "tfg4435",
"entryGroup": 505,
"entryNumber": 58,
"name": "ecd_1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
"_source": {
"_common": {
"eventTime": "2024-03-22 09:26:33.122",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 7,
"name": "tmx1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
"_source": {
"_common": {
"eventTime": "2024-03-22 09:55:16.654",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 11,
"name": "ecd_1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
"_source": {
"_common": {
"eventTime": "2024-03-22 08:59:03.258",
"entryId": "l57fhvg",
"entryGroup": 77,
"entryNumber": 3,
"name": "ecd_1234",
}
},
},
]:
client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"]) And finally we execute the query, modifying import json
query_model = {
"nested": {
"path": "_common",
"query": {
"range": {
"_common.eventTime": {
"gte": "2024-03-22 09:00:00.000",
"lte": "now",
}
}
},
}
}
# Set the sort instructions
sort_instructions = {
"_common.entryNumber": {
"order": "asc",
"nested": {
"path": "_common",
},
}
}
index_data = client.search(
index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2)) Which prints 3 out of 4 docs, sorted by {
"took": 79,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my_index",
"_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 09:26:33.122",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 7,
"name": "tmx1234"
}
},
"sort": [
7
]
},
{
"_index": "my_index",
"_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 09:55:16.654",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 11,
"name": "ecd_1234"
}
},
"sort": [
11
]
},
{
"_index": "my_index",
"_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 10:31:45.125",
"entryId": "tfg4435",
"entryGroup": 505,
"entryNumber": 58,
"name": "ecd_1234"
}
},
"sort": [
58
]
}
]
}
} Now you can take that code and use |
So, maybe I just don't understand the mapping or know how to find the mapping. I tried your syntax, however that was returning 0 values still. However, by just simply removing the nested argument, it seemed to work.
I thought that the period and structure of the record meant that entryNumber is nested under _common. However, does the fact that the above code worked imply that is is not nested, but something else? |
Oh, when you mentioned nested I assumed you meant the nested data type: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html. Sorry for the misleading response. I will try to take a look later taking this new information into account. |
So, as mentioned in the previous comment, "nested" is a very specific Elasticsearch term. In the context, of Elasticsearch, what you have is not "nested", it's simply an object field, which is the default for arrays if you don't specify mappings: https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/ Here's the complete working example from above with object fields. We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of elasticsearch-py, the client. from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
print(client.info()["version"]) Let's setup the index from scratch, define mappings to be explicit, and then index the docs you sent me: client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
index="my_index",
mappings={
"properties": {
"_common": {
"properties": {
"eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
"entryId": {"type": "keyword"},
"entryGroup": {"type": "integer"},
"entryNumber": {"type": "integer"},
"name": {"type": "keyword"},
},
}
}
},
)
for doc in [
{
"_index": "my_index",
"_type": "_doc",
"_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
"_source": {
"_common": {
"eventTime": "2024-03-22 10:31:45.125",
"entryId": "tfg4435",
"entryGroup": 505,
"entryNumber": 58,
"name": "ecd_1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
"_source": {
"_common": {
"eventTime": "2024-03-22 09:26:33.122",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 7,
"name": "tmx1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
"_source": {
"_common": {
"eventTime": "2024-03-22 09:55:16.654",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 11,
"name": "ecd_1234",
}
},
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
"_source": {
"_common": {
"eventTime": "2024-03-22 08:59:03.258",
"entryId": "l57fhvg",
"entryGroup": 77,
"entryNumber": 3,
"name": "ecd_1234",
}
},
},
]:
client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"]) And, finally, let's make the query: import json
query_model = {
"range": {
"_common.eventTime": {
"gte": "2024-03-22 09:00:00.000",
"lte": "now",
}
}
}
# Set the sort instructions
sort_instructions = {
"_common.entryNumber": {
"order": "asc",
}
}
index_data = client.search(
index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2)) Which still prints 3 out of 4 docs, sorted by entryNumber: {
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 09:26:33.122",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 7,
"name": "tmx1234"
}
},
"sort": [
7
]
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 09:55:16.654",
"entryId": "tfg4435",
"entryGroup": 215,
"entryNumber": 11,
"name": "ecd_1234"
}
},
"sort": [
11
]
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
"_score": null,
"_source": {
"_common": {
"eventTime": "2024-03-22 10:31:45.125",
"entryId": "tfg4435",
"entryGroup": 505,
"entryNumber": 58,
"name": "ecd_1234"
}
},
"sort": [
58
]
}
]
}
} Again, you can take that code and use |
Elasticsearch version (
7.17
):elasticsearch-py
version (8.12.1
):Description of the problem including expected versus actual behavior:
I cannot retrieve results when using a nested field in the sort argument of the search function.
I want to pull a large amount of data from Elasticsearch. Based on my understanding of the documentation, the most recent recommendation is to use the search function, sort by a unique field, then update the search_after argument after each pull to set up the next search until complete. This appears to work when using the _id field. However, this also results in a warning that says sorting by _id is deprecated and will go away, and to use a unique identifier or combination of identifiers within the body. There is a unique combination of fields that are nested within a parent. However, when I try to use a nested sort option, I receive 0 results.
I understand there is likely a by including this in the body, but I wanted to use the function as intended instead.
Steps to reproduce:
This is how I am sorting by id:
This get results and the following warning:
ElasticsearchWarning: Loading the fielddata on the _id field is deprecated and will be removed in future versions. If you require sorting or aggregating on this field you should also include the id in the body of your documents, and map this field as a keyword field that has [doc_values] enabled
Here I am trying to sort by a nested field:
And that has 0 results. Note that there is no error, just no results. I have tried a bunch of different fields, but none of the nested ones work, only the parents.
What I would eventually like to build to is something like the following:
The text was updated successfully, but these errors were encountered: