[DOCS] Add retrievers overview (elastic#107959)
leemthompo committed May 7, 2024
1 parent 2382fb5 commit 5848d24
Showing 2 changed files with 209 additions and 1 deletion.
207 changes: 207 additions & 0 deletions docs/reference/search/search-your-data/retrievers-overview.asciidoc
@@ -0,0 +1,207 @@
[[retrievers-overview]]
== Retrievers

// Will move to a top level "Retrievers and reranking" section once reranking is live

preview::[]

A retriever is an abstraction that was added to the Search API in *8.14.0*.
This abstraction enables the configuration of multi-stage retrieval
pipelines within a single `_search` call. This simplifies your search
application logic, because you no longer need to configure complex searches via
multiple {es} calls or implement additional client-side logic to
combine results from different queries.

This document provides a general overview of the retriever abstraction.
For implementation details, including notable restrictions, check out the
<<retriever,reference documentation>> in the `_search` API docs.

[discrete]
[[retrievers-overview-types]]
=== Retriever types

Retrievers come in various types, each tailored for different search operations.
The following retrievers are currently available:

* <<standard-retriever,*Standard Retriever*>>. Returns top documents from a
traditional https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl.html[query].
Mimics a traditional query but in the context of a retriever framework. This
ensures backward compatibility as existing `_search` requests remain supported.
That way you can transition to the new abstraction at your own pace without
mixing syntaxes.
* <<knn-retriever,*kNN Retriever*>>. Returns top documents from a <<search-api-knn,kNN search>>,
in the context of a retriever framework.
* <<rrf-retriever,*RRF Retriever*>>. Combines and ranks multiple first-stage retrievers using
the reciprocal rank fusion (RRF) algorithm. Allows you to combine multiple result sets
with different relevance indicators into a single result set.
An RRF retriever is a *compound retriever*, and its `filter` element is
propagated to its sub-retrievers.
+
Sub-retrievers may not use elements that are restricted when a compound
retriever is part of the retriever tree.
See the <<rrf-using-multiple-standard-retrievers,RRF documentation>> for detailed
examples and information on how to use the RRF retriever.

[NOTE]
====
Stay tuned for more retriever types in future releases!
====
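
To make the reciprocal rank fusion step concrete, the following Python sketch fuses two ranked lists of document IDs.
It is an illustration only, not {es}'s implementation: the function name `rrf_fuse`, the document IDs, and the `rank_constant` default of 60 are assumptions chosen for this example.
Each document's fused score is the sum of `1 / (rank_constant + rank)` over every list in which it appears, with ranks starting at 1.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Illustrative sketch only; the name and default value are assumptions.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # Ranks are 1-based: the top hit of each list has rank 1.
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a lexical and a semantic retriever.
lexical = ["doc_a", "doc_b", "doc_c"]
semantic = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([lexical, semantic])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

In this sketch `doc_b` ranks first because it sits near the top of both lists. Only rank positions are used, so the two retrievers' raw scores never need to be comparable.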

[discrete]
=== What makes retrievers useful?

Here's an overview of what makes retrievers useful and how they differ from
regular queries.

. *Simplified user experience*. Retrievers simplify the user experience by
allowing entire retrieval pipelines to be configured in a single API call. This
maintains backward compatibility with traditional query elements by
automatically translating them to the appropriate retriever.
. *Structured retrieval*. Retrievers provide a more structured way to define search
operations. They allow searches to be described using a "retriever tree", a
hierarchical structure that clarifies the sequence and logic of operations,
making complex searches more understandable and manageable.
. *Composability and flexibility*. Retrievers enable flexible composability,
allowing you to build pipelines and seamlessly integrate different retrieval
strategies into these pipelines. Retrievers make it easy to test out different
retrieval strategy combinations.
. *Compound operations*. A retriever can have sub-retrievers. This
allows complex nested searches where the results of one retriever feed into
another, supporting sophisticated querying strategies that might involve
multiple stages or criteria.
. *Retrieval as a first-class concept*. Unlike
traditional queries, where the query is a part of a larger search API call,
retrievers are designed as standalone entities that can be combined or used in
isolation. This enables a more modular and flexible approach to constructing
searches.
. *Enhanced control over document scoring and ranking*. Retrievers
allow for more explicit control over how documents are scored and filtered. For
instance, you can specify minimum score thresholds, apply complex filters
without affecting scoring, and use parameters like `terminate_after` for
performance optimizations.
. *Integration with existing {es} functionalities*. Even though
retrievers can be used instead of existing `_search` API syntax (like the
`query` and `knn` elements), they are designed to integrate seamlessly with features like
pagination (`search_after`) and sorting. They also maintain compatibility with
aggregation operations by treating the combination of all leaf retrievers as
`should` clauses in a boolean query.
. *Cleaner separation of concerns*. When using compound retrievers, only the
query element is allowed, which enforces a cleaner separation of concerns
and prevents the complexity that might arise from overly nested or
interdependent configurations.

[discrete]
[[retrievers-overview-example]]
=== Example

The following example demonstrates how retrievers
simplify the composition of queries for RRF ranking.

[source,js]
----
GET example-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "text_expansion": {
                "vector.tokens": {
                  "model_id": ".elser_model_2",
                  "model_text": "What blue shoes are on sale?"
                }
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "match": {
                "text": "blue shoes sale"
              }
            }
          }
        }
      ]
    }
  }
}
----
//NOTCONSOLE

This example demonstrates how you can combine different
retrieval strategies into a single `retriever` pipeline.

Compare this to the equivalent RRF request that uses the `sub_searches` approach:

.*Expand* for example
[%collapsible]
==============
[source,js]
----
GET example-index/_search
{
  "sub_searches": [
    {
      "query": {
        "match": {
          "text": "blue shoes sale"
        }
      }
    },
    {
      "query": {
        "text_expansion": {
          "vector.tokens": {
            "model_id": ".elser_model_2",
            "model_text": "What blue shoes are on sale?"
          }
        }
      }
    }
  ],
  "rank": {
    "rrf": {
      "window_size": 50,
      "rank_constant": 20
    }
  }
}
----
//NOTCONSOLE
==============
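
The structural difference between the two request styles can be sketched in Python.
The helper below, `sub_searches_to_retriever`, is hypothetical and not part of any {es} client: it wraps each `sub_searches` query in a `standard` retriever and nests them under an `rrf` retriever.
As a deliberate simplification it carries over only `rank_constant`; the window-size setting is left out, so check the reference documentation for the parameter name that applies to your version.

```python
import json


def sub_searches_to_retriever(body):
    """Rewrite a `sub_searches` + `rank.rrf` body as an `rrf` retriever body.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # Each sub search becomes one `standard` retriever wrapping its query.
    retrievers = [
        {"standard": {"query": sub_search["query"]}}
        for sub_search in body["sub_searches"]
    ]
    rrf = {"retrievers": retrievers}
    # Only `rank_constant` is copied; the window-size setting is skipped on purpose.
    rank_rrf = body.get("rank", {}).get("rrf", {})
    if "rank_constant" in rank_rrf:
        rrf["rank_constant"] = rank_rrf["rank_constant"]
    return {"retriever": {"rrf": rrf}}


legacy_body = {
    "sub_searches": [
        {"query": {"match": {"text": "blue shoes sale"}}},
        {
            "query": {
                "text_expansion": {
                    "vector.tokens": {
                        "model_id": ".elser_model_2",
                        "model_text": "What blue shoes are on sale?",
                    }
                }
            }
        },
    ],
    "rank": {"rrf": {"window_size": 50, "rank_constant": 20}},
}

retriever_body = sub_searches_to_retriever(legacy_body)
print(json.dumps(retriever_body, indent=2))
```

The ranking configuration moves from a separate top-level `rank` element into the same tree as the queries it ranks, which is what makes the retriever form easier to nest and extend.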

[discrete]
[[retrievers-overview-glossary]]
=== Glossary

Here are some important terms:

* *Retrieval Pipeline*. Defines the entire retrieval and ranking logic to
produce top hits.
* *Retriever Tree*. A hierarchical structure that defines how retrievers interact.
* *First-stage Retriever*. Returns an initial set of candidate documents.
* *Compound Retriever*. Builds on one or more retrievers,
enhancing document retrieval and ranking logic.
* *Combiners*. Compound retrievers that merge top hits
from multiple sub-retrievers.
//* NOT YET *Rerankers*. Special compound retrievers that reorder hits and may adjust the number of hits, with distinctions between first-stage and second-stage rerankers.
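
The terms above can be illustrated with a short Python sketch that walks a retriever tree and labels each node.
The function `classify_retrievers`, the field name `vector`, and the query vector values are hypothetical.
The sketch also assumes that a retriever counts as compound exactly when it has a `retrievers` list of sub-retrievers, which holds for the `rrf` retriever described here.

```python
def classify_retrievers(node, depth=0):
    """Yield (depth, type, role) for each retriever in a retriever tree.

    Assumption for this sketch: a retriever is "compound" when it has a
    `retrievers` list of sub-retrievers, and "first-stage" otherwise.
    """
    # Each retriever object has exactly one key naming its type.
    retriever_type, config = next(iter(node.items()))
    children = config.get("retrievers", [])
    role = "compound" if children else "first-stage"
    yield depth, retriever_type, role
    for child in children:
        yield from classify_retrievers(child, depth + 1)


# A retriever tree: one combiner (rrf) over two first-stage retrievers.
tree = {
    "rrf": {
        "retrievers": [
            {"standard": {"query": {"match": {"text": "blue shoes sale"}}}},
            {
                "knn": {
                    "field": "vector",  # hypothetical dense vector field
                    "query_vector": [0.1, 0.2, 0.3],
                    "k": 10,
                    "num_candidates": 50,
                }
            },
        ]
    }
}

for depth, retriever_type, role in classify_retrievers(tree):
    print("  " * depth + f"{retriever_type} ({role})")
```

The walk reports the `rrf` node as compound at depth 0 and the `standard` and `knn` nodes as first-stage retrievers at depth 1.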

[discrete]
[[retrievers-overview-play-in-search]]
=== Retrievers in action

The Search Playground builds Elasticsearch queries using the retriever abstraction.
It automatically detects the fields and types in your index and builds a retriever tree based on your selections.

You can use the Playground to experiment with different retriever configurations and see how they affect search results.

Refer to the {kibana-ref}/playground.html[Playground documentation] for more information.
// Content coming in https://github.com/elastic/kibana/pull/182692



@@ -43,10 +43,11 @@ DSL, with a simplified user experience. Create search applications based on your
results directly in the Kibana Search UI.

include::search-api.asciidoc[]
include::search-application-overview.asciidoc[]
include::knn-search.asciidoc[]
include::semantic-search.asciidoc[]
include::retrievers-overview.asciidoc[]
include::learning-to-rank.asciidoc[]
include::search-across-clusters.asciidoc[]
include::search-with-synonyms.asciidoc[]
include::search-application-overview.asciidoc[]
include::behavioral-analytics/behavioral-analytics-overview.asciidoc[]
