Move conceptual docs about ActionListener (#107875)
This information is more discoverable as the class-level javadocs for
`ActionListener` itself rather than hidden away in a separate Markdown
file. Also this way the links all stay up to date.
DaveCTurner committed May 9, 2024
1 parent fa2f813 commit 864543b
Showing 2 changed files with 82 additions and 68 deletions.
65 changes: 1 addition & 64 deletions docs/internal/DistributedArchitectureGuide.md
@@ -10,70 +10,7 @@

### ActionListener

Callbacks are used extensively throughout Elasticsearch because they enable us to write asynchronous and nonblocking code, i.e. code which
doesn't necessarily compute a result straight away but also doesn't block the calling thread waiting for the result to become available.
They support several useful control flows:

- They can be completed immediately on the calling thread.
- They can be completed concurrently on a different thread.
- They can be stored in a data structure and completed later on when the system reaches a particular state.
- Most commonly, they can be passed on to other methods that themselves require a callback.
- They can be wrapped in another callback which modifies the behaviour of the original callback, perhaps adding some extra code to run
before or after completion, before passing them on.
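
As a rough illustration of the control flows listed above, here is a minimal, self-contained sketch. `Listener` is a hypothetical stand-in for `ActionListener` (it is not the real interface), and `ControlFlows` and all of its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ActionListener so the sketch compiles on its own.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

// Illustrative only: none of these names exist in Elasticsearch.
class ControlFlows {
    // Callbacks parked here until markReady() is called.
    private final List<Listener<String>> waiting = new ArrayList<>();

    // Completed immediately on the calling thread.
    static void completeNow(Listener<String> listener) {
        listener.onResponse("now");
    }

    // Completed concurrently on a different thread.
    static Thread completeElsewhere(Listener<String> listener) {
        Thread thread = new Thread(() -> listener.onResponse("elsewhere"));
        thread.start();
        return thread;
    }

    // Stored in a data structure, to be completed later.
    synchronized void completeWhenReady(Listener<String> listener) {
        waiting.add(listener);
    }

    // The system has reached the awaited state: complete everything stored so far.
    void markReady() {
        final List<Listener<String>> toComplete;
        synchronized (this) {
            toComplete = new ArrayList<>(waiting);
            waiting.clear();
        }
        // Complete outside the lock so callbacks cannot deadlock against it.
        for (Listener<String> listener : toComplete) {
            listener.onResponse("ready");
        }
    }

    // Passed on to another method that itself requires a callback.
    static void passOn(Listener<String> listener) {
        completeNow(listener);
    }

    // Wrapped in another callback that modifies the response before delegating.
    static Listener<String> upperCasing(Listener<String> delegate) {
        return new Listener<String>() {
            @Override
            public void onResponse(String response) {
                delegate.onResponse(response.toUpperCase());
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }

    // Runs each flow in turn and returns the recorded responses in order.
    static List<String> demo() {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        Listener<String> recorder = new Listener<String>() {
            @Override
            public void onResponse(String response) {
                results.add(response);
            }

            @Override
            public void onFailure(Exception e) {
                results.add("failed: " + e.getMessage());
            }
        };
        completeNow(recorder);
        Thread thread = completeElsewhere(recorder);
        try {
            thread.join(); // only the demo waits here; production code would not block
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        ControlFlows flows = new ControlFlows();
        flows.completeWhenReady(recorder);
        flows.markReady();
        passOn(upperCasing(recorder));
        return new ArrayList<>(results);
    }
}
```

The callers of `completeNow`, `completeElsewhere` and `completeWhenReady` all look the same: each hands over a listener and returns, whichever thread eventually completes it.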

`ActionListener` is a general-purpose callback interface that is used extensively across the Elasticsearch codebase. `ActionListener` is
used pretty much everywhere that needs to perform some asynchronous and nonblocking computation. The uniformity makes it easier to compose
parts of the system together without needing to build adapters to convert back and forth between different kinds of callback. It also makes
it easier to develop the skills needed to read and understand all the asynchronous code, although this definitely takes practice and is
certainly not easy in an absolute sense. Finally, it has allowed us to build a rich library for working with `ActionListener` instances
themselves, creating new instances out of existing ones and completing them in interesting ways. See for instance:

- all the static methods on [ActionListener](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/ActionListener.java) itself
- [`ThreadedActionListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/ThreadedActionListener.java) for forking work elsewhere
- [`RefCountingListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/RefCountingListener.java) for running work in parallel
- [`SubscribableListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java) for constructing flexible workflows

Callback-based asynchronous code can easily call regular synchronous code, but synchronous code cannot run callback-based asynchronous code
without blocking the calling thread until the callback is called back. This blocking is at best undesirable (threads are too expensive to
waste with unnecessary blocking) and at worst outright broken (the blocking can lead to deadlock). Unfortunately this means that most of our
code ends up having to be written with callbacks, simply because it's ultimately calling into some other code that takes a callback. The
entry points for all Elasticsearch APIs are callback-based (e.g. REST APIs all start at
[`org.elasticsearch.rest.BaseRestHandler#prepareRequest`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java#L158-L171),
and transport APIs all start at
[`org.elasticsearch.action.support.TransportAction#doExecute`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/TransportAction.java#L65))
and the whole system fundamentally works in terms of an event loop (an `io.netty.channel.EventLoop`) which processes network events via
callbacks.
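
The asymmetry described above can be sketched roughly as follows. This is an assumption-laden illustration, not Elasticsearch code: `Listener` is a hypothetical stand-in for `ActionListener`, and `Bridging` and its methods are invented names. The asynchronous method calls the synchronous one freely, whereas the synchronous wrapper has no option but to park its thread on a latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for ActionListener so the sketch compiles on its own.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

// Illustrative only: none of these names exist in Elasticsearch.
class Bridging {
    // Ordinary synchronous code.
    static int lengthSync(String s) {
        return s.length();
    }

    // Asynchronous code can call synchronous code without any ceremony.
    static void lengthAsync(String s, Listener<Integer> listener) {
        new Thread(() -> listener.onResponse(lengthSync(s))).start();
    }

    // Synchronous code can only use the asynchronous method by blocking the
    // calling thread until the callback has been called.
    static int lengthBlocking(String s) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<Integer> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();
        lengthAsync(s, new Listener<Integer>() {
            @Override
            public void onResponse(Integer response) {
                result.set(response);
                done.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                failure.set(e);
                done.countDown();
            }
        });
        try {
            done.await(); // the calling thread does nothing useful while it waits
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException(failure.get());
        }
        return result.get();
    }
}
```

If the thread parked in `done.await()` were the only thread able to run the callback (for example the sole worker of a single-threaded pool), the wait would never end, which is the deadlock case mentioned above.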

`ActionListener` is not an _ad-hoc_ invention. Formally speaking, it is our implementation of the general concept of a continuation in the
sense of [_continuation-passing style_](https://en.wikipedia.org/wiki/Continuation-passing_style) (CPS): an extra argument to a function
which defines how to continue the computation when the result is available. This is in contrast to _direct style_ which is the more usual
style of calling methods that return values directly back to the caller so they can continue executing as normal. There are essentially two
ways that computation can continue in Java (it can return a value or it can throw an exception) which is why `ActionListener` has both an
`onResponse()` and an `onFailure()` method.
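
To make the contrast concrete, here is a small sketch of the same computation in direct style and in continuation-passing style. `Listener` is again a hypothetical stand-in for `ActionListener`, and `Styles` and its methods are invented for illustration. The returned value maps onto `onResponse()` and the thrown exception maps onto `onFailure()`.

```java
// Hypothetical stand-in for ActionListener so the sketch compiles on its own.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

// Illustrative only: none of these names exist in Elasticsearch.
class Styles {
    // Direct style: the result is returned, or an exception is thrown, to the caller.
    static int parseDirect(String s) {
        return Integer.parseInt(s);
    }

    // Continuation-passing style: void return type, and both outcomes go to the listener.
    static void parseCps(String s, Listener<Integer> listener) {
        final int value;
        try {
            value = parseDirect(s);
        } catch (NumberFormatException e) {
            listener.onFailure(e);
            return;
        }
        // Called outside the try block so that an exception thrown by onResponse
        // is not mistaken for a parse failure and reported via onFailure as well.
        listener.onResponse(value);
    }

    // Small helper showing which continuation was taken for a given input.
    static String describe(String input) {
        StringBuilder outcome = new StringBuilder();
        parseCps(input, new Listener<Integer>() {
            @Override
            public void onResponse(Integer response) {
                outcome.append("response:").append(response);
            }

            @Override
            public void onFailure(Exception e) {
                outcome.append("failure:").append(e.getClass().getSimpleName());
            }
        });
        return outcome.toString();
    }
}
```

With these definitions, `Styles.describe("42")` takes the `onResponse` path and `Styles.describe("x")` takes the `onFailure` path.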

CPS is strictly more expressive than direct style: direct code can be mechanically translated into continuation-passing style, but CPS also
enables all sorts of other useful control structures such as forking work onto separate threads, possibly to be executed in parallel,
perhaps even across multiple nodes, or possibly collecting a list of continuations all waiting for the same condition to be satisfied before
proceeding (e.g.
[`SubscribableListener`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java)
amongst many others). Some languages have first-class support for continuations (e.g. the `async` and `await` primitives in C#) allowing the
programmer to write code in direct style away from those exotic control structures, but Java does not. That's why we have to manipulate all
the callbacks ourselves.

Strictly speaking, CPS requires that a computation _only_ continues by calling the continuation. In Elasticsearch, this means that
asynchronous methods must have `void` return type and may not throw any exceptions. This is mostly the case in our code as written today,
and is a good guiding principle, but we don't enforce void exceptionless methods and there are some deviations from this rule. In
particular, it's not uncommon to permit some methods to throw an exception, using things like
[`ActionListener#run`](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/ActionListener.java#L381-L390)
(or an equivalent `try ... catch ...` block) further up the stack to handle it. Some methods also take (and may complete) an
`ActionListener` parameter, but still return a value separately for other local synchronous work.
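
The `try ... catch ...` approach mentioned above might look something like the following sketch. It is similar in spirit to `ActionListener#run` but is not its actual implementation: `Listener`, `CheckedAction`, and `Run` are hypothetical names introduced only for this example.

```java
// Hypothetical stand-in for ActionListener so the sketch compiles on its own.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

// Hypothetical functional interface: an action that may throw while using the listener.
interface CheckedAction<T> {
    void accept(Listener<T> listener) throws Exception;
}

// Illustrative only: none of these names exist in Elasticsearch.
class Run {
    // Runs the action and routes any thrown Exception to the listener, so callers
    // further up the stack observe every failure through onFailure.
    // Errors are deliberately not caught.
    static <T> void run(Listener<T> listener, CheckedAction<T> action) {
        try {
            action.accept(listener);
        } catch (Exception e) {
            listener.onFailure(e);
        }
    }

    // Small helper showing both outcomes of an action that may throw.
    static String outcome(boolean shouldThrow) {
        StringBuilder out = new StringBuilder();
        run(new Listener<String>() {
            @Override
            public void onResponse(String response) {
                out.append("response:").append(response);
            }

            @Override
            public void onFailure(Exception e) {
                out.append("failure:").append(e.getMessage());
            }
        }, listener -> {
            if (shouldThrow) {
                throw new IllegalStateException("boom");
            }
            listener.onResponse("ok");
        });
        return out.toString();
    }
}
```

One caveat of this sketch: an action that completes the listener and then throws would cause the listener to be completed twice, so the action should throw only before handing the listener on.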

This pattern is often used in the transport action layer with the use of the
[ChannelActionListener](https://github.com/elastic/elasticsearch/blob/v8.12.2/server/src/main/java/org/elasticsearch/action/support/ChannelActionListener.java)
class, which wraps a `TransportChannel` produced by the transport layer. `TransportChannel` implementations can hold a reference to a Netty
channel with which to pass the response back to the network caller. Netty has a many-to-one association of network callers to channels, so a
call taking a long time generally won't hog resources: it's cheap. A transport action can take hours to respond and that's alright, barring
caller timeouts.
See the [Javadocs for `ActionListener`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionListener.java).

(TODO: add useful starter references and explanations for a range of Listener classes. Reference the Netty section.)

85 changes: 81 additions & 4 deletions server/src/main/java/org/elasticsearch/action/ActionListener.java
@@ -31,17 +31,94 @@
import static org.elasticsearch.action.ActionListenerImplementations.safeOnFailure;

/**
* A listener for action responses or failures.
* <p>
* Callbacks are used extensively throughout Elasticsearch because they enable us to write asynchronous and nonblocking code, i.e. code
* which doesn't necessarily compute a result straight away but also doesn't block the calling thread waiting for the result to become
* available. They support several useful control flows:
* </p>
* <ul>
* <li>They can be completed immediately on the calling thread.</li>
* <li>They can be completed concurrently on a different thread.</li>
* <li>They can be stored in a data structure and completed later on when the system reaches a particular state.</li>
* <li>Most commonly, they can be passed on to other methods that themselves require a callback.</li>
* <li>They can be wrapped in another callback which modifies the behaviour of the original callback, perhaps adding some extra code to run
* before or after completion, before passing them on.</li>
* </ul>
* <p>
* {@link ActionListener} is a general-purpose callback interface that is used extensively across the Elasticsearch codebase. {@link
* ActionListener} is used pretty much everywhere that needs to perform some asynchronous and nonblocking computation. The uniformity makes
* it easier to compose parts of the system together without needing to build adapters to convert back and forth between different kinds of
* callback. It also makes it easier to develop the skills needed to read and understand all the asynchronous code, although this definitely
* takes practice and is certainly not easy in an absolute sense. Finally, it has allowed us to build a rich library for working with {@link
* ActionListener} instances themselves, creating new instances out of existing ones and completing them in interesting ways. See for
* instance:
* </p>
* <ul>
* <li>All the static methods on {@link ActionListener} itself.</li>
* <li>{@link org.elasticsearch.action.support.ThreadedActionListener} for forking work elsewhere.</li>
* <li>{@link org.elasticsearch.action.support.RefCountingListener} for running work in parallel.</li>
* <li>{@link org.elasticsearch.action.support.SubscribableListener} for constructing flexible workflows.</li>
* </ul>
* <p>
* Callback-based asynchronous code can easily call regular synchronous code, but synchronous code cannot run callback-based asynchronous
* code without blocking the calling thread until the callback is called back. This blocking is at best undesirable (threads are too
* expensive to waste with unnecessary blocking) and at worst outright broken (the blocking can lead to deadlock). Unfortunately this means
* that most of our code ends up having to be written with callbacks, simply because it's ultimately calling into some other code that takes
* a callback. The entry points for all Elasticsearch APIs are callback-based (e.g. REST APIs all start at {@link
* org.elasticsearch.rest.BaseRestHandler}{@code #prepareRequest} and transport APIs all start at {@link
* org.elasticsearch.action.support.TransportAction}{@code #doExecute}) and the whole system fundamentally works in terms of an event loop
* (an {@code io.netty.channel.EventLoop}) which processes network events via callbacks.
* </p>
* <p>
* {@link ActionListener} is not an <i>ad-hoc</i> invention. Formally speaking, it is our implementation of the general concept of a
* continuation in the sense of <a href="https://en.wikipedia.org/wiki/Continuation-passing_style"><i>continuation-passing style</i></a>
* (CPS): an extra argument to a function which defines how to continue the computation when the result is available. This is in contrast to
* <i>direct style</i> which is the more usual style of calling methods that return values directly back to the caller so they can continue
* executing as normal. There are essentially two ways that computation can continue in Java (it can return a value or it can throw an
* exception) which is why {@link ActionListener} has both an {@link #onResponse} and an {@link #onFailure} method.
* </p>
* <p>
* CPS is strictly more expressive than direct style: direct code can be mechanically translated into continuation-passing style, but CPS
* also enables all sorts of other useful control structures such as forking work onto separate threads, possibly to be executed in
* parallel, perhaps even across multiple nodes, or possibly collecting a list of continuations all waiting for the same condition to be
* satisfied before proceeding (e.g. {@link org.elasticsearch.action.support.SubscribableListener} amongst many others). Some languages have
* first-class support for continuations (e.g. the {@code async} and {@code await} primitives in C#) allowing the programmer to write code
* in direct style away from those exotic control structures, but Java does not. That's why we have to manipulate all the callbacks
* ourselves.
* </p>
* <p>
* Strictly speaking, CPS requires that a computation <i>only</i> continues by calling the continuation. In Elasticsearch, this means that
* asynchronous methods must have {@code void} return type and may not throw any exceptions. This is mostly the case in our code as written
* today, and is a good guiding principle, but we don't enforce void exceptionless methods and there are some deviations from this rule. In
* particular, it's not uncommon to permit some methods to throw an exception, using things like {@link ActionListener#run} (or an
* equivalent {@code try ... catch ...} block) further up the stack to handle it. Some methods also take (and may complete) an {@link
* ActionListener} parameter, but still return a value separately for other local synchronous work.
* </p>
* <p>
* This pattern is often used in the transport action layer with the use of the {@link
* org.elasticsearch.action.support.ChannelActionListener} class, which wraps a {@link org.elasticsearch.transport.TransportChannel}
* produced by the transport layer. {@link org.elasticsearch.transport.TransportChannel} implementations can hold a reference to a Netty
* channel with which to pass the response back to the network caller. Netty has a many-to-one association of network callers to channels,
* so a call taking a long time generally won't hog resources: it's cheap. A transport action can take hours to respond and that's alright,
* barring caller timeouts.
* </p>
* <p>
* Note that we explicitly avoid {@link java.util.concurrent.CompletableFuture} and other similar mechanisms as much as possible. They
* can achieve the same goals as {@link ActionListener}, but can also easily be misused in various ways that lead to severe bugs. In
* particular, futures support blocking while waiting for a result, but this is almost never appropriate in Elasticsearch's production code
* where threads are such a precious resource. Moreover if something throws an {@link Error} then the JVM should exit pretty much straight
* away, but {@link java.util.concurrent.CompletableFuture} can catch an {@link Error} which delays the JVM exit until its result is
* observed. This may be much later, or possibly even never. It's not possible to introduce such bugs when using {@link ActionListener}.
* </p>
*/
public interface ActionListener<Response> {
/**
* Handle action response. This response may constitute a failure or a
* success but it is up to the listener to make that decision.
* Complete this listener with a successful (or at least, non-exceptional) response.
*/
void onResponse(Response response);

/**
* A failure caused by an exception at some phase of the task.
* Complete this listener with an exceptional response.
*/
void onFailure(Exception e);
