[DOCS] Add retrievers overview #107959
Merged
131 changes: 131 additions & 0 deletions
docs/reference/search/search-your-data/retrievers-overview.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
[[retrievers-overview]] | ||
== Retrievers | ||
|
||
// Will move to a top level "Retrievers and reranking" section once reranking is live | ||
|
||
preview::[] | ||
|
||
A retriever is an abstraction that was added to the Search API in *8.14.0*. | ||
This abstraction enables the configuration of multi-stage retrieval | ||
pipelines within a single `_search` call. This simplifies your search | ||
application logic, because you no longer need to configure complex searches via | ||
multiple {es} calls or implement additional client-side logic to | ||
combine results from different queries. | ||
|
||
This document provides a general overview of the retriever abstraction. | ||
For implementation details, including notable restrictions, check out the | ||
<<retriever,reference documentation>> in the `_search` API docs. | ||
|
||
[discrete] | ||
[[retrievers-overview-types]] | ||
=== Retriever types | ||
|
||
Retrievers come in various types, each tailored for different search operations. | ||
The following retrievers are currently available: | ||
|
||
* <<standard-retriever,*Standard Retriever*>>. Returns top documents from a | ||
traditional https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl.html[query]. | ||
Mimics a traditional query but in the context of a retriever framework. This | ||
ensures backward compatibility as existing `_search` requests remain supported. | ||
That way you can transition to the new abstraction at your own pace without | ||
mixing syntaxes. | ||
* <<knn-retriever,*kNN Retriever*>>. Returns top documents from a <<search-api-knn,kNN search>>, | ||
in the context of a retriever framework. | ||
* <<rrf-retriever,*RRF Retriever*>>. Combines and ranks multiple retrievers, such as standard and kNN retrievers, using | ||
the reciprocal rank fusion (RRF) algorithm. Allows you to combine multiple result sets | ||
with different relevance indicators into a single result set. | ||
An RRF retriever is a *compound retriever*: its `filter` element is | ||
propagated to its sub retrievers. | ||
+ | ||
When a compound retriever is part of the retriever tree, some elements are | ||
restricted, and sub retrievers may not use them. | ||
See the <<rrf-using-multiple-standard-retrievers,RRF documentation>> for detailed | ||
examples and information on how to use the RRF retriever. | ||
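
To make the fusion step concrete, here is a minimal Python sketch of reciprocal rank fusion over two ranked lists of document IDs. It illustrates the general algorithm only and is not the {es} implementation. The function name `rrf_fuse`, the sample document IDs, the tie-breaking rule, and the rank constant of `60` are assumptions chosen for this example.

```python
from collections import defaultdict


def rrf_fuse(result_lists, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    scores = defaultdict(float)
    for hits in result_lists:
        # Ranks are 1-based: the top hit of each list has rank 1.
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID (an assumption).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top hits from a standard retriever and a kNN retriever.
lexical_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]

fused = rrf_fuse([lexical_hits, vector_hits])
```

Documents that appear near the top of several lists accumulate the highest fused scores, which is why RRF can combine result sets whose raw scores are not directly comparable.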
|
||
[NOTE] | ||
==== | ||
Stay tuned for more retriever types in future releases! | ||
==== | ||
|
||
[discrete] | ||
=== What makes retrievers useful? | ||
|
||
|
||
Here's an overview of what makes retrievers useful and how they differ from | ||
regular queries. | ||
|
||
. *Simplified user experience*. Retrievers let you configure an entire | ||
retrieval pipeline in a single API call. Backward compatibility is | ||
maintained because traditional query elements are | ||
automatically translated to the appropriate retriever. | ||
. *Structured retrieval*. Retrievers provide a more structured way to define search | ||
operations. They allow searches to be described using a "retriever tree", a | ||
hierarchical structure that clarifies the sequence and logic of operations, | ||
making complex searches more understandable and manageable. | ||
. *Composability and flexibility*. Retrievers enable flexible composability, | ||
allowing you to build pipelines and seamlessly integrate different retrieval | ||
strategies into these pipelines. Retrievers make it easy to test out different | ||
retrieval strategy combinations. | ||
. *Compound operations*. A retriever can have sub retrievers. This | ||
allows complex nested searches where the results of one retriever feed into | ||
another, supporting sophisticated querying strategies that might involve | ||
multiple stages or criteria. | ||
. *Retrieval as a first-class concept*. Unlike | ||
traditional queries, where the query is a part of a larger search API call, | ||
retrievers are designed as standalone entities that can be combined or used in | ||
isolation. This enables a more modular and flexible approach to constructing | ||
searches. | ||
. *Enhanced control over document scoring and ranking*. Retrievers | ||
allow for more explicit control over how documents are scored and filtered. For | ||
instance, you can specify minimum score thresholds, apply complex filters | ||
without affecting scoring, and use parameters like `terminate_after` for | ||
performance optimizations. | ||
. *Integration with existing {es} functionalities*. Although | ||
retrievers can replace existing `_search` API syntax (such as the | ||
`query` and `knn` elements), they are designed to integrate seamlessly with features like | ||
pagination (`search_after`) and sorting. They also maintain compatibility with | ||
aggregation operations by treating the combination of all leaf retrievers as | ||
`should` clauses in a boolean query. | ||
. *Cleaner separation of concerns*. When using compound retrievers, only the | ||
query element is allowed, which enforces a cleaner separation of concerns | ||
and prevents the complexity that might arise from overly nested or | ||
interdependent configurations. | ||
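
As a rough illustration of the scoring and filtering controls mentioned in the list above, the following Python sketch builds the body of a `_search` request that uses a single standard retriever. The field names (`title`, `status`) and all values are hypothetical; the request is only constructed here, not sent.

```python
# Body of a hypothetical `_search` request using one standard retriever.
# Field names and values are assumptions made up for this sketch.
search_body = {
    "retriever": {
        "standard": {
            # Scored part of the search.
            "query": {"match": {"title": "retrievers"}},
            # Narrows the result set without contributing to the score.
            "filter": {"term": {"status": "published"}},
            # Drops hits whose score falls below this threshold.
            "min_score": 0.5,
            # Stops collecting after this many documents per shard.
            "terminate_after": 1000,
        }
    }
}

standard_options = sorted(search_body["retriever"]["standard"])
```

You could pass a dictionary like this as the request body through the client of your choice. Check the retriever reference documentation for which of these elements are restricted when the retriever tree contains a compound retriever.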
|
||
[discrete] | ||
[[retrievers-overview-example]] | ||
=== Example: Before and after | ||
|
||
The following example demonstrates how using retrievers can | ||
simplify building and testing complex search pipelines. | ||
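
One way to picture the difference is the Python sketch below, which builds request bodies as plain dictionaries. It assumes a hypothetical index with a `title` text field and a `title_vector` dense vector field. The "before" variant issues two separate `_search` requests whose hits your application would have to merge itself. The "after" variant expresses the same intent as one retriever tree.

```python
# Hypothetical query parts; field names and vector values are assumptions.
text_query = {"match": {"title": "retrievers"}}
knn_search = {
    "field": "title_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 50,
}

# Before: two `_search` request bodies. The application sends both and
# combines the two hit lists with its own client-side logic.
before_requests = [
    {"query": text_query},
    {"knn": knn_search},
]

# After: a single `_search` request body. The RRF retriever combines the
# results of its sub retrievers on the server side.
after_request = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": text_query}},
                {"knn": knn_search},
            ]
        }
    }
}

sub_retriever_types = [
    next(iter(sub)) for sub in after_request["retriever"]["rrf"]["retrievers"]
]
```

Swapping a retrieval strategy in or out then means editing one entry in the `retrievers` list instead of changing application code.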
|
||
// TODO: Add concrete example(s) sourced from the hive mind | ||
|
||
[discrete] | ||
[[retrievers-overview-glossary]] | ||
=== Glossary | ||
// TODO: Probably remove this, is it useful? | ||
|
||
Here are some important terms: | ||
|
||
* *Retrieval Pipeline*. Defines the entire retrieval and ranking logic to | ||
produce top hits. | ||
* *Compound Retriever*. Builds on one or more retrievers, | ||
enhancing document retrieval and ranking logic. | ||
* *Combiners*. Compound retrievers that merge top hits | ||
from multiple sub-retrievers. | ||
//* NOT YET *Rerankers*. Special compound retrievers that reorder hits and may adjust the number of hits, with distinctions between first-stage and second-stage rerankers. | ||
|
||
[discrete] | ||
[[retrievers-overview-play-in-search]] | ||
=== Retrievers in action | ||
|
||
//Playground will be renamed | ||
|
||
|
||
The Search [Playground] builds Elasticsearch queries using the retriever abstraction. | ||
It automatically detects the fields and types in your index and builds a retriever tree based on your selections. | ||
|
||
You can use the [Playground] to experiment with different retriever configurations and see how they affect search results. | ||
|
||
Refer to the {kibana-ref}/playground.html[[Playground] documentation] for more information. | ||
|
||
|
||
|
||
|
I think "standard" is misleading here - the RRF retriever can nest multiple retrievers, including standard and kNN (and possibly later a reranker as well).
I suggest either removing "standard" or replacing it with "first-stage" (but then you'd need to clarify what a first stage retriever is 😉 )
That's a good one for the glossary, let me know if there's other terms you think should be in there :)