Commit

Merge remote-tracking branch 'origin/main' into lucene_snapshot
elasticsearchmachine committed May 9, 2024
2 parents f194fd5 + f5b356d commit a2c947e
Showing 142 changed files with 9,872 additions and 1,818 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/106820.yaml
@@ -0,0 +1,5 @@
pr: 106820
summary: Add a capabilities API to check node and cluster capabilities
area: Infra/REST API
type: feature
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/107891.yaml
@@ -0,0 +1,6 @@
pr: 107891
summary: Fix `startOffset` must be non-negative error in XLMRoBERTa tokenizer
area: Machine Learning
type: bug
issues:
- 104626
6 changes: 6 additions & 0 deletions docs/changelog/108238.yaml
@@ -0,0 +1,6 @@
pr: 108238
summary: "Nativeaccess: try to load all located libsystemds"
area: Infra/Core
type: bug
issues:
- 107878
5 changes: 5 additions & 0 deletions docs/changelog/108300.yaml
@@ -0,0 +1,5 @@
pr: 108300
summary: "ESQL: Add more time span units"
area: ES|QL
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/108431.yaml
@@ -0,0 +1,5 @@
pr: 108431
summary: "ESQL: Disable quoting in FROM command"
area: ES|QL
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/108444.yaml
@@ -0,0 +1,5 @@
pr: 108444
summary: "Apm-data: ignore malformed fields, and too many dynamic fields"
area: Data streams
type: enhancement
issues: []
73 changes: 9 additions & 64 deletions docs/internal/DistributedArchitectureGuide.md
@@ -10,70 +10,7 @@

### ActionListener

Callbacks are used extensively throughout Elasticsearch because they enable us to write asynchronous and nonblocking code, i.e. code which
doesn't necessarily compute a result straight away but also doesn't block the calling thread waiting for the result to become available.
They support several useful control flows:

- They can be completed immediately on the calling thread.
- They can be completed concurrently on a different thread.
- They can be stored in a data structure and completed later on when the system reaches a particular state.
- Most commonly, they can be passed on to other methods that themselves require a callback.
- They can be wrapped in another callback which modifies the behaviour of the original callback, perhaps adding some extra code to run
before or after completion, before passing them on.
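
As a sketch of the last point, a wrapping callback might look like this (the `LoggingListener` class is illustrative, not from the codebase):

```java
import org.elasticsearch.action.ActionListener;

// Wraps another listener, running some extra code on completion before passing the result on.
final class LoggingListener<T> implements ActionListener<T> {
    private final ActionListener<T> delegate;

    LoggingListener(ActionListener<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onResponse(T response) {
        System.out.println("completed successfully");   // extra behaviour added by the wrapper
        delegate.onResponse(response);                   // then hand the result to the original callback
    }

    @Override
    public void onFailure(Exception e) {
        System.out.println("completed exceptionally: " + e);
        delegate.onFailure(e);
    }
}
```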

`ActionListener` is a general-purpose callback interface that is used extensively across the Elasticsearch codebase. `ActionListener` is
used pretty much everywhere that needs to perform some asynchronous and nonblocking computation. The uniformity makes it easier to compose
parts of the system together without needing to build adapters to convert back and forth between different kinds of callback. It also makes
it easier to develop the skills needed to read and understand all the asynchronous code, although this definitely takes practice and is
certainly not easy in an absolute sense. Finally, it has allowed us to build a rich library for working with `ActionListener` instances
themselves, creating new instances out of existing ones and completing them in interesting ways. See for instance:

- all the static methods on [ActionListener](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/ActionListener.java) itself
- [`ThreadedActionListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/ThreadedActionListener.java) for forking work elsewhere
- [`RefCountingListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/RefCountingListener.java) for running work in parallel
- [`SubscribableListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java) for constructing flexible workflows

Callback-based asynchronous code can easily call regular synchronous code, but synchronous code cannot run callback-based asynchronous code
without blocking the calling thread until the callback is called back. This blocking is at best undesirable (threads are too expensive to
waste with unnecessary blocking) and at worst outright broken (the blocking can lead to deadlock). Unfortunately this means that most of our
code ends up having to be written with callbacks, simply because it's ultimately calling into some other code that takes a callback. The
entry points for all Elasticsearch APIs are callback-based (e.g. REST APIs all start at
[`org.elasticsearch.rest.BaseRestHandler#prepareRequest`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java#L158-L171),
and transport APIs all start at
[`org.elasticsearch.action.support.TransportAction#doExecute`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/TransportAction.java#L65))
and the whole system fundamentally works in terms of an event loop (a `io.netty.channel.EventLoop`) which processes network events via
callbacks.

`ActionListener` is not an _ad-hoc_ invention. Formally speaking, it is our implementation of the general concept of a continuation in the
sense of [_continuation-passing style_](https://en.wikipedia.org/wiki/Continuation-passing_style) (CPS): an extra argument to a function
which defines how to continue the computation when the result is available. This is in contrast to _direct style_ which is the more usual
style of calling methods that return values directly back to the caller so they can continue executing as normal. There are essentially two
ways that computation can continue in Java (it can return a value or it can throw an exception), which is why `ActionListener` has both an
`onResponse()` and an `onFailure()` method.
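
For example, here is a minimal sketch of a method written in this style (the `ClusterNameReader` class and its helper are illustrative, not taken from the codebase):

```java
import org.elasticsearch.action.ActionListener;

class ClusterNameReader {
    // Hypothetical asynchronous method in continuation-passing style: instead of returning a
    // String to the caller, it continues the computation by completing the listener exactly once.
    static void fetchClusterName(ActionListener<String> listener) {
        String name = readLocalClusterName(); // stand-in for some real work
        if (name != null) {
            listener.onResponse(name);                                                     // continue by "returning a value"
        } else {
            listener.onFailure(new IllegalStateException("cluster name not available"));   // continue by "throwing"
        }
    }

    // Hypothetical synchronous helper.
    static String readLocalClusterName() {
        return "my-cluster";
    }

    public static void main(String[] args) {
        // The caller supplies the continuation instead of blocking on a return value.
        fetchClusterName(new ActionListener<String>() {
            @Override
            public void onResponse(String name) {
                System.out.println("cluster name is " + name);
            }

            @Override
            public void onFailure(Exception e) {
                System.out.println("failed: " + e);
            }
        });
    }
}
```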

CPS is strictly more expressive than direct style: direct code can be mechanically translated into continuation-passing style, but CPS also
enables all sorts of other useful control structures such as forking work onto separate threads, possibly to be executed in parallel,
perhaps even across multiple nodes, or possibly collecting a list of continuations all waiting for the same condition to be satisfied before
proceeding (e.g.
[`SubscribableListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java)
amongst many others). Some languages have first-class support for continuations (e.g. the `async` and `await` primitives in C#) allowing the
programmer to write code in direct style away from those exotic control structures, but Java does not. That's why we have to manipulate all
the callbacks ourselves.
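
For instance, here is a sketch of the "collect continuations waiting for the same condition" pattern, assuming `SubscribableListener` exposes the `addListener` and `onResponse` methods used below (the class name and messages are illustrative):

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.support.SubscribableListener;

class RecoveryGate {
    public static void main(String[] args) {
        SubscribableListener<Void> clusterRecovered = new SubscribableListener<>();

        // Two independent pieces of work both wait for the same condition to be satisfied.
        clusterRecovered.addListener(ActionListener.wrap(
            ignored -> System.out.println("starting periodic maintenance"),
            e -> System.out.println("giving up: " + e)));
        clusterRecovered.addListener(ActionListener.wrap(
            ignored -> System.out.println("starting snapshot cleanup"),
            e -> System.out.println("giving up: " + e)));

        // Later, when the system reaches the desired state, completing the listener once
        // completes every subscriber that was stored above.
        clusterRecovered.onResponse(null);
    }
}
```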

Strictly speaking, CPS requires that a computation _only_ continues by calling the continuation. In Elasticsearch, this means that
asynchronous methods must have `void` return type and may not throw any exceptions. This is mostly the case in our code as written today,
and is a good guiding principle, but we don't enforce void exceptionless methods and there are some deviations from this rule. In
particular, it's not uncommon to permit some methods to throw an exception, using things like
[`ActionListener#run`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/ActionListener.java#L381-L390)
(or an equivalent `try ... catch ...` block) further up the stack to handle it. Some methods also take (and may complete) an
`ActionListener` parameter, but still return a value separately for other local synchronous work.
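
For example, a sketch of such a deviation, with the throwing code bridged back to the listener by a `try ... catch ...` block (the `MappingLoader` class and its helper are hypothetical):

```java
import org.elasticsearch.action.ActionListener;

class MappingLoader {
    // The helper below may throw; the try/catch (equivalent in spirit to ActionListener#run)
    // routes the exception into the listener's onFailure() continuation instead of letting it escape.
    static void loadMapping(String index, ActionListener<String> listener) {
        final String mapping;
        try {
            mapping = readMappingOrThrow(index);
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        listener.onResponse(mapping);
    }

    // Hypothetical synchronous helper that may throw.
    static String readMappingOrThrow(String index) throws Exception {
        if (index == null || index.isEmpty()) {
            throw new IllegalArgumentException("missing index name");
        }
        return "{\"properties\":{}}";
    }
}
```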

This pattern is often used in the transport action layer via the
[ChannelActionListener](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/ChannelActionListener.java)
class, which wraps a `TransportChannel` produced by the transport layer. `TransportChannel` implementations can hold a reference to a Netty
channel with which to pass the response back to the network caller. Netty has a many-to-one association of network callers to channels, so a
call taking a long time generally won't hog resources: it's cheap. A transport action can take hours to respond and that's alright, barring
caller timeouts.
See the [Javadocs for `ActionListener`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionListener.java).

(TODO: add useful starter references and explanations for a range of Listener classes. Reference the Netty section.)

@@ -133,6 +70,14 @@ are only used for internode operations/communications.

### Work Queues

### RestClient

The `RestClient` is primarily used in testing, to send requests against cluster nodes in the same format as users would. There
are some uses of `RestClient`, via `RestClientBuilder`, in the production code. For example, remote reindex uses the `RestClient`
internally as the REST client to the remote Elasticsearch cluster, taking advantage of the compatibility of `RestClient` requests
with much older Elasticsearch versions. The `RestClient` is also used externally by the `Java API Client` to communicate with
Elasticsearch.
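
For example, a minimal sketch of typical test-style usage, built via `RestClientBuilder` (the host, port and endpoint are placeholders):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

class RestClientExample {
    public static void main(String[] args) throws Exception {
        // Build a low-level client pointing at a single node and send a plain REST request.
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Response response = client.performRequest(new Request("GET", "/_cluster/health"));
            System.out.println(response.getStatusLine());
        }
    }
}
```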

# Cluster Coordination

(Sketch of important classes? Might inform more sections to add for details.)
2 changes: 2 additions & 0 deletions docs/reference/alias.asciidoc
@@ -358,6 +358,8 @@ POST _aliases
----
// TEST[s/^/PUT my-index-2099.05.06-000001\n/]

NOTE: Filters are only applied when using the <<query-dsl,Query DSL>>, and are not applied when <<docs-get,retrieving a document by ID>>.

[discrete]
[[alias-routing]]
=== Routing
17 changes: 9 additions & 8 deletions docs/reference/esql/esql-syntax.asciidoc
@@ -160,14 +160,15 @@ Datetime intervals and timespans can be expressed using timespan literals.
Timespan literals are a combination of a number and a qualifier. These
qualifiers are supported:

* `millisecond`/`milliseconds`
* `second`/`seconds`
* `minute`/`minutes`
* `hour`/`hours`
* `day`/`days`
* `week`/`weeks`
* `month`/`months`
* `year`/`years`
* `millisecond`/`milliseconds`/`ms`
* `second`/`seconds`/`sec`/`s`
* `minute`/`minutes`/`min`
* `hour`/`hours`/`h`
* `day`/`days`/`d`
* `week`/`weeks`/`w`
* `month`/`months`/`mo`
* `quarter`/`quarters`/`q`
* `year`/`years`/`yr`/`y`

Timespan literals are not whitespace sensitive. These expressions are all valid:

18 changes: 9 additions & 9 deletions docs/reference/high-availability/cluster-design.asciidoc
@@ -7,14 +7,14 @@ nodes to take over their responsibilities, an {es} cluster can continue
operating normally if some of its nodes are unavailable or disconnected.

There is a limit to how small a resilient cluster can be. All {es} clusters
require:
require the following components to function:

- One <<modules-discovery-quorums,elected master node>> node
- At least one node for each <<modules-node,role>>.
- At least one copy of every <<scalability,shard>>.
- One <<modules-discovery-quorums,elected master node>>
- At least one node for each <<modules-node,role>>
- At least one copy of every <<scalability,shard>>

A resilient cluster requires redundancy for every required cluster component.
This means a resilient cluster must have:
This means a resilient cluster must have the following components:

- At least three master-eligible nodes
- At least two nodes of each role
@@ -375,11 +375,11 @@ The cluster will be resilient to the loss of any zone as long as:
- There are at least two zones containing data nodes.
- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
has at least one replica of each shard, in addition to the primary.
- Shard allocation awareness is configured to avoid concentrating all copies of
a shard within a single zone.
- <<shard-allocation-awareness,Shard allocation awareness>> is configured to
avoid concentrating all copies of a shard within a single zone.
- The cluster has at least three master-eligible nodes. At least two of these
nodes are not voting-only master-eligible nodes, and they are spread evenly
across at least three zones.
nodes are not <<voting-only-node,voting-only master-eligible nodes>>,
and they are spread evenly across at least three zones.
- Clients are configured to send their requests to nodes in more than one zone
or are configured to use a load balancer that balances the requests across an
appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such
37 changes: 29 additions & 8 deletions docs/reference/modules/cluster/allocation_awareness.asciidoc
@@ -5,7 +5,7 @@ You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimise the risk of losing all shard copies in the event of a failure.
minimize the risk of losing all shard copies in the event of a failure.

When shard allocation awareness is enabled with the
<<dynamic-cluster-setting,dynamic>>
@@ -19,22 +19,27 @@ allocated in each location. If the number of nodes in each location is
unbalanced and there are a lot of replicas, replica shards might be left
unassigned.

TIP: Learn more about <<high-availability-cluster-design-large-clusters,designing resilient clusters>>.

[[enabling-awareness]]
===== Enabling shard allocation awareness

To enable shard allocation awareness:

. Specify the location of each node with a custom node attribute. For example,
if you want Elasticsearch to distribute shards across different racks, you might
set an awareness attribute called `rack_id` in each node's `elasticsearch.yml`
config file.
. Specify the location of each node with a custom node attribute. For example,
if you want Elasticsearch to distribute shards across different racks, you might
use an awareness attribute called `rack_id`.
+
You can set custom attributes in two ways:

- By editing the `elasticsearch.yml` config file:
+
[source,yaml]
--------------------------------------------------------
node.attr.rack_id: rack_one
--------------------------------------------------------
+
You can also set custom attributes when you start a node:
- Using the `-E` command line argument when you start a node:
+
[source,sh]
--------------------------------------------------------
@@ -56,17 +61,33 @@ cluster.routing.allocation.awareness.attributes: rack_id <1>
+
You can also use the
<<cluster-update-settings,cluster-update-settings>> API to set or update
a cluster's awareness attributes.
a cluster's awareness attributes:
+
[source,console]
--------------------------------------------------
PUT /_cluster/settings
{
"persistent" : {
"cluster.routing.allocation.awareness.attributes" : "rack_id"
}
}
--------------------------------------------------

With this example configuration, if you start two nodes with
`node.attr.rack_id` set to `rack_one` and create an index with 5 primary
shards and 1 replica of each primary, all primaries and replicas are
allocated across the two nodes.

.All primaries and replicas allocated across two nodes in the same rack
image::images/shard-allocation/shard-allocation-awareness-one-rack.png[All primaries and replicas are allocated across two nodes in the same rack]

If you add two nodes with `node.attr.rack_id` set to `rack_two`,
{es} moves shards to the new nodes, ensuring (if possible)
that no two copies of the same shard are in the same rack.

.Primaries and replicas allocated across four nodes in two racks, with no two copies of the same shard in the same rack
image::images/shard-allocation/shard-allocation-awareness-two-racks.png[Primaries and replicas are allocated across four nodes in two racks with no two copies of the same shard in the same rack]

If `rack_two` fails and takes down both its nodes, by default {es}
allocates the lost shard copies to nodes in `rack_one`. To prevent multiple
copies of a particular shard from being allocated in the same location, you can
4 changes: 2 additions & 2 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -1062,8 +1062,8 @@ end::stats[]

tag::stored_fields[]
`stored_fields`::
(Optional, Boolean) If `true`, retrieves the document fields stored in the
index rather than the document `_source`. Defaults to `false`.
(Optional, string)
A comma-separated list of <<mapping-store,`stored fields`>> to include in the response.
end::stored_fields[]

tag::sync[]
