Proto-RFC: Futures support in Laminar #290

Open
james7132 opened this issue May 31, 2021 · 8 comments

@james7132
Collaborator

james7132 commented May 31, 2021

I've been working on a port of GGPO to Rust: backroll-rs, and built a transport abstraction layer, backroll-transport, that I think generalizes very well and may be better served as a part of Laminar than as its own isolated crate.

Motivation

Provide a reliable UDP implementation that works well in async executors for games (e.g. bevy_tasks) and works over any I/O layer implementation: raw UDP, WebRTC, or proprietary sockets like the ones provided by Steamworks' ISteamNetworkingSockets or Epic Online Services. Laminar already has a fairly well-defined reliability model built atop a raw std::net::UdpSocket, so adding this kind of abstraction can easily leverage the existing types and implementations in Laminar's codebase.

Design

The cornerstone of this design is heavy use of a bidirectional version of async-channel to pass messages between independent tasks on a futures executor. This serves three roles: a representation of the connection state (is the channel still open or not), a buffered conduit for messages, and a point of abstraction. Channels are generic only over the type of message they pass, so there is no need to create nested structs with generic parameters to connect two distinct layers of the network stack (a la ConnectionManager).
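To illustrate the idea, here is a minimal, synchronous sketch of such a bidirectional channel built by crossing two one-way std::sync::mpsc channels. (backroll-transport builds the real thing on async-channel; the `Endpoint` name and API below are illustrative, not the actual crate's.)

```rust
use std::sync::mpsc::{channel, Receiver, RecvError, SendError, Sender};

// One endpoint of a bidirectional channel: a sender feeding the remote's
// receiver, and a receiver fed by the remote's sender.
pub struct Endpoint<T> {
    tx: Sender<T>,
    rx: Receiver<T>,
}

impl<T> Endpoint<T> {
    // Sending fails once the remote endpoint has been dropped, so the
    // channel state doubles as the connection state.
    pub fn send(&self, msg: T) -> Result<(), SendError<T>> {
        self.tx.send(msg)
    }

    pub fn recv(&self) -> Result<T, RecvError> {
        self.rx.recv()
    }
}

// Build the pair by crossing two one-way channels.
pub fn bidirectional<T>() -> (Endpoint<T>, Endpoint<T>) {
    let (a_tx, b_rx) = channel();
    let (b_tx, a_rx) = channel();
    (
        Endpoint { tx: a_tx, rx: a_rx },
        Endpoint { tx: b_tx, rx: b_rx },
    )
}
```

Dropping either endpoint closes the connection from the other side's point of view, which is exactly the "channel state as connection state" property described above.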

A specialization of this channel, known in backroll_transport as a Peer, is a newtype around BidirectionalAsyncChannel that deals solely in Box<[u8]>: raw binary packets. By default, Peers are assumed to pass only unreliable, unordered packets: a direct UDP analogue. This has already been implemented in backroll_transport, and is used as the main abstraction point for the actual I/O layer (see backroll_transport_udp).

To help manage multiple ongoing connections, backroll_transport has a Peers<T>, a newtype wrapper around DashMap<T, Peer>, which only exposes active, open connections by removing non-connected peers from public access. Dropping the entire Peers<T> struct also closes all associated Peers.
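A simplified sketch of that pruning behavior, with a plain HashMap standing in for DashMap and a Peer reduced to just a shared liveness flag (the real Peer wraps a channel whose closed state plays this role):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in Peer: a shared flag marks whether the connection is open.
// (In backroll-transport, the underlying channel's open/closed state
// serves this purpose directly.)
#[derive(Clone)]
pub struct Peer {
    open: Arc<AtomicBool>,
}

impl Peer {
    pub fn new() -> Peer {
        Peer { open: Arc::new(AtomicBool::new(true)) }
    }
    pub fn disconnect(&self) {
        self.open.store(false, Ordering::SeqCst);
    }
    pub fn is_connected(&self) -> bool {
        self.open.load(Ordering::SeqCst)
    }
}

// Peers<T> only ever hands out still-connected peers, lazily pruning
// dead entries on lookup.
pub struct Peers<T: Eq + Hash> {
    map: HashMap<T, Peer>,
}

impl<T: Eq + Hash> Peers<T> {
    pub fn new() -> Self {
        Peers { map: HashMap::new() }
    }

    pub fn insert(&mut self, id: T, peer: Peer) {
        self.map.insert(id, peer);
    }

    // Returns the peer only if its connection is still open, removing
    // the entry from the map otherwise.
    pub fn get(&mut self, id: &T) -> Option<Peer> {
        if let Some(peer) = self.map.get(id) {
            if peer.is_connected() {
                return Some(peer.clone());
            }
            self.map.remove(id);
        }
        None
    }
}
```

The key property is that a disconnected peer simply vanishes from the public interface; callers never observe half-dead connections.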

By using infinitely looping tasks around these bidirectional channels, it's possible to create a graph of interconnected tasks that constitutes the connection itself. Below is an example of one such connection (each box is an independent task; all outgoing and incoming arrows are separate multiple-producer, multiple-consumer channels):

[Diagram: a graph of interconnected tasks forming a single connection; boxes are tasks, arrows are MPMC channels.]

The main game logic regularly polls the top level streams to get notifications about updated connection state and uses them to update the game state. All other tasks run independently of the game loop and continue to run as more I/O is performed. Any additional logic for reliability, ordering, or sequencing is performed inline in this graph of tasks. A current implementation is not available, but it may not be too much work to refactor a good number of Laminar's connection state machine components to work with this kind of design. It may be possible to provide newtyped ReliablePeer, OrderedPeer, etc. that wrap an existing Peer to provide those connection properties as needed. Alternatively, if an I/O layer implementation supports reliable, ordered, or sequenced connections, it can return its own newtyped peer (see Steam's ISteamNetworkingSockets) rather than relying on Laminar's. This also allows the connection stack to be only as long as needed: unreliable packets do not require the overhead of checking packet reliability headers.
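As a sketch of what one of those newtyped wrappers might add, here is the per-connection bookkeeping a sequencing layer could keep: stamp outgoing packets with a counter and discard incoming packets older than the latest seen. (Illustrative only; Laminar's actual sequencing differs, and real code would compare sequence numbers with wraparound-aware arithmetic.)

```rust
// Hypothetical state for a SequencedPeer-style wrapper.
struct Sequencer {
    next_out: u16,
    latest_in: Option<u16>,
}

impl Sequencer {
    fn new() -> Self {
        Sequencer { next_out: 0, latest_in: None }
    }

    // Stamp an outgoing packet with the next sequence number.
    fn stamp(&mut self) -> u16 {
        let seq = self.next_out;
        self.next_out = self.next_out.wrapping_add(1);
        seq
    }

    // Accept an incoming packet only if it is newer than anything seen.
    // NOTE: this naive `<=` comparison ignores u16 wraparound; a real
    // implementation must handle sequence number rollover.
    fn accept(&mut self, seq: u16) -> bool {
        match self.latest_in {
            Some(latest) if seq <= latest => false,
            _ => {
                self.latest_in = Some(seq);
                true
            }
        }
    }
}
```

A SequencedPeer would run this logic in its forwarding task, dropping stale packets before they ever reach the game loop.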

These individual tasks can be very simple, and encapsulate the running connection state well. For example, the heartbeat/keep-alive task for keeping the connection open in backroll is written simply as the following:

async fn heartbeat(self, interval: Duration) {
    // send() fails once the remote endpoint closes, which ends the
    // loop (and with it, the task) automatically.
    while let Ok(()) = self.send(MessageData::KeepAlive) {
        debug!("Sent keep alive packet");
        Delay::new(interval).await;
    }
}

As these tasks terminate when their associated channels are closed, disconnecting from a remote peer is as simple as closing the associated channels. The tasks then terminate in a cascading fashion as each channel closes, eventually removing the connection from the I/O layer's Peers<T> tracker.

One additional benefit is that the I/O layer can be abstracted without replacing the actual types used for communication. It does not matter whether UDP, WebRTC, or the proprietary sockets for Steam or Epic Online Services are used as the I/O implementation; they all return Peer. There is one caveat: since a peer is considered valid only so long as the connection is open, implementations with an initial handshake to establish a connection may need to return impl Future<Output=Result<Peer, ...>> instead.

Pros

  • Bidirectional channels are a very close approximation of how two-way operations on connections behave.
  • async-channel types implement Stream and Sink, which makes it easy to use combinators to handle transformative logic.
  • As futures only poll when awoken, there is little to no overhead while connections are inactive. This may improve scalability and enable support for higher active connection counts per process (e.g. MMO servers).
  • This integrates well with existing Rust game engines like Bevy, which come with their own async executors.
  • This removes the need to manually poll each ConnectionManager every so often, outside of potentially flushing disconnected peers from Peers<T>.
  • If the underlying tasks are exposed as part of Laminar's public API, it may be possible to let users of the library inject logic at a lower level (e.g. packet-level encryption/compression).
  • Having an abstraction using Peer allows unit testing with an opaque in-process connection rather than requiring a generic fake socket implementation. FakeSocket can be replaced with a single task that connects two distinct Peers while emulating packet loss and latency.
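The lossy in-process link mentioned in the last point can be surprisingly small. A synchronous sketch (the real version would be an async task between two Peers, and would also inject latency; the `lossy_forward` name and drop predicate are illustrative):

```rust
use std::sync::mpsc::{Receiver, Sender};

// Forward packets from rx to tx, dropping any packet for which the
// caller-supplied predicate returns true. Terminates once the input
// channel closes, cascading the shutdown downstream when tx is dropped.
fn lossy_forward<T>(
    rx: Receiver<T>,
    tx: Sender<T>,
    mut drop_packet: impl FnMut(&T) -> bool,
) {
    for packet in rx {
        if !drop_packet(&packet) {
            let _ = tx.send(packet);
        }
    }
}
```

Tests can then wire two channel pairs through this function to simulate an unreliable network without any socket at all.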

Cons

  • A futures executor is required.
  • Setting up a runtime environment that supports this may require additional boilerplate.
  • Error handling may not be very clear: beyond sending events instead of raw binary packets, errors may only surface through logging.
  • Unit tests for both Laminar and consumers of the API will need to run async tests.

Alternatives considered

Potentially this could be put under a separate async module enabled via feature flag (see how redis-rs handles the sync/async split) instead of a ground up rewrite.

For generalizing across different socket types, allowing DatagramSocket to be generically implemented over any network remote ID, not just SocketAddr, would be very useful.

@ValorZard

The main problem with async stuff is that it doesn't really work on the Web and with WebRTC. You would need to use wasm_bindgen to turn the Rust futures into the Javascript equivalent.

@james7132
Collaborator Author

A quick cursory glance seems to show that async-global-executor's and async-std's executors support running in wasm without problems. Both derive from async-executor, which I believe bevy is currently using for bevy_tasks, so this might be a generic question of which futures runtime is used. That said, this change should be agnostic to the actual executor it runs on.

@TimonPost
Owner

TimonPost commented Jun 2, 2021

@james7132 thanks for the write-up. At this point in time, laminar is not actively maintained, and the amethyst community is also in a low phase. Perhaps I can give you permissions for this repository so that you can work on it and continue the work in progress. Let me know if you're open to that?

So one question: do you think it is still possible to support the non-async version at the same time? I ask this since the amethyst engine is currently not async, and making laminar async will likely introduce some issues. @erlend-sh said that if so, it would not be a huge deal since amethyst isn't really active anymore. I would add that although amethyst isn't active, we should keep other engines and users in mind as well. What do you think?

@james7132
Collaborator Author

I am indeed open to it.

Re: maintaining the non-async version, would a synchronous public-facing API that uses an async runtime underneath be OK? Or would the idea be to keep the two fundamentally separated (a la redis-rs's split between async and sync code)?

async-channel exposes a try_recv and try_send option, which we can wrap to create synchronous polls. The underlying I/O will still be async, and to my knowledge, there isn't a platform Rust runs on that isn't currently supported well by using either bevy_tasks or a raw async-executor.
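A sketch of what such a synchronous facade could look like, using std's sync_channel to stand in for async-channel (whose try_send/try_recv behave similarly; the `SyncPeer` and `sync_pair` names are made up for illustration):

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// A synchronous facade over non-blocking channel operations; a game
// loop can call poll() once per tick without ever blocking on I/O.
struct SyncPeer {
    tx: SyncSender<Box<[u8]>>,
    rx: Receiver<Box<[u8]>>,
}

impl SyncPeer {
    // Non-blocking send; false means the buffer is full or closed.
    fn try_send(&self, data: Box<[u8]>) -> bool {
        self.tx.try_send(data).is_ok()
    }

    // Drain every packet currently buffered without blocking.
    fn poll(&self) -> Vec<Box<[u8]>> {
        let mut out = Vec::new();
        while let Ok(packet) = self.rx.try_recv() {
            out.push(packet);
        }
        out
    }
}

// Build a connected pair of facades by crossing two bounded channels.
fn sync_pair(capacity: usize) -> (SyncPeer, SyncPeer) {
    let (a_tx, b_rx) = sync_channel(capacity);
    let (b_tx, a_rx) = sync_channel(capacity);
    (
        SyncPeer { tx: a_tx, rx: a_rx },
        SyncPeer { tx: b_tx, rx: b_rx },
    )
}
```

In the real design, the receiving half of each pair would be owned by async tasks doing the actual I/O; only the facade is exposed to synchronous callers.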

@TimonPost
Owner

TimonPost commented Jun 2, 2021

Excellent, I invited you as a maintainer.

My initial response: there have been some people requesting WebRTC support in laminar, and we have held the opinion that laminar should not be a higher-level protocol that works at the application layer. Meaning that it should not provide a general abstraction over various protocols, but is to be a pure UDP-based transport layer protocol according to the specs of Gaffer on Games (similar to backroll_transport_udp). Laminar defines its own packet headers and assumes incoming data comes from laminar, not from any other protocol.

That aside, given that this abstraction over various protocols has been requested many times, the internal API has already abstracted away UDP, and the current design generally limits laminar's usability, I would be for the change as proposed. At the same time, we do need to keep in mind that users may be relying on laminar's current standpoint as a transport layer protocol rather than an application layer one.

I do see a good case for supporting async in laminar. We currently use a sloppy poll solution that was made with a potential replacement by async polling in mind. Mio and futures support is requested often, but, because of amethyst, was often neglected. I like the channel idea; laminar already works with standard library channels for arguments similar to the ones you provided (not async, of course). I have one remark about Box<[u8]>: does this mean we heap-allocate each incoming packet?

So for the multiple ongoing connections, do you mean 'opening up peers to various kinds of protocols (webrtc/epic games/steam, etc...)'?

All in all, I like the proposal at a high level; on the lower-level implications of how to rewrite laminar I am a bit unsure (since I have been out of the running for a year). This will probably have to be figured out while integrating async support and the proposed changes.

A question I like to have your opinion on as well, how do you think integrating secure connections will work with this model? This has been a huge question with big implications for the current design. If we are going to redesign anyways, I would like to bring this concern along as well.

@james7132
Collaborator Author

I have one remark about Box<[u8]>: does this mean we heap-allocate each incoming packet?

This is currently the case, and I can see that it might impact performance. However, I don't see a sane way to pass non-static references/slices in an async environment. Someone who has a better understanding of futures and lifetimes may need to check if that is doable.

So for the multiple ongoing connections, do you mean 'opening up peers to various kinds of protocols (webrtc/epic games/steam, etc...)'?

It's more of a ConnectionManager analogue. It manages multiple connections from the local machine to some mapped remote network identity, keyed by the generic parameter. You use it to store and control the Peers associated with a specific network identity. For example, in backroll_transport_udp, the UdpManager has a Peers<SocketAddr> field that manages the local-remote connection status of all of the associated channels. For a hypothetical Steam transport implementation, it would likely have a Peers<SteamNetworkingIdentity> field for tracking those kinds of connections. It removes disconnected Peer channels from its public interface, so you can use it to forward packets to open connections without needing to explicitly forward disconnect events from the game.

A question I like to have your opinion on as well, how do you think integrating secure connections will work with this model? This has been a huge question with big implications for the current design. If we are going to redesign anyways, I would like to bring this concern along as well.

One of the hopes that I have for this design is that the entire packet pipeline can be restructured easily: you simply add a task in between two others to inject functionality (e.g. packet serialization -> adding packet headers can become packet serialization -> packet encryption -> adding packet headers, where each -> is a channel). That said, the pipeline should be kept terse, since channels do have a locking mechanism and that can kill performance, but general fan-in/fan-out task structures should be much easier to write.
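A synchronous sketch of such an injected stage: a generic transform sitting between two channels, with XOR-against-a-key standing in for real encryption (the XOR is a placeholder, not a cipher, and `transform_stage` is an illustrative name, not a proposed API):

```rust
use std::sync::mpsc::{Receiver, Sender};

// A pipeline stage: pull packets from upstream, apply a transform,
// and push the result downstream. Terminates when either side closes,
// cascading the shutdown through the pipeline.
fn transform_stage(
    rx: Receiver<Box<[u8]>>,
    tx: Sender<Box<[u8]>>,
    f: impl Fn(Box<[u8]>) -> Box<[u8]>,
) {
    for packet in rx {
        if tx.send(f(packet)).is_err() {
            break; // downstream closed
        }
    }
}

// Placeholder "encryption": XOR every byte with a fixed key.
fn xor_key(mut packet: Box<[u8]>, key: u8) -> Box<[u8]> {
    for byte in packet.iter_mut() {
        *byte ^= key;
    }
    packet
}
```

Inserting or removing a stage is then purely a matter of how the channels are wired together at connection setup; no other stage needs to change.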

@james7132
Collaborator Author

james7132 commented Jun 2, 2021

That aside, given that this abstraction over various protocols has been requested many times, the internal API has already abstracted away UDP, and the current design generally limits laminar's usability, I would be for the change as proposed. At the same time, we do need to keep in mind that users may be relying on laminar's current standpoint as a transport layer protocol rather than an application layer one.

One more thing I should note: I don't think this would make Laminar an application level protocol, but rather an abstraction over transport layer protocols. It provides a common interface for achieving the same transport level goals (getting packets from point A to B with varying levels of delivery guarantee), while giving developers an extensible framework for the pipeline. The aforementioned encryption method doesn't need to live within Laminar, though it could; developers could make their own Peer-based tasks that expose the same interface: a Peer (or some newtype wrapper around one).

@TimonPost
Owner

TimonPost commented Jun 3, 2021

Good, yea, we originally intended amethyst_network for that purpose, as an application-layer crate using the various transport protocols. However, amethyst_network is limited to ECS. So I do like this change: we keep the intentions for laminar the same (provide a way to send packets with different guarantees), only the way it works will be different and more extensible.

Yea, passing data through an async pipeline with channels is a challenge due to the nature of futures. Anyhow, perhaps boxing isn't that bad, since we pass a pointer over the channels instead of copying the packet into the channel.
