From b6157e02acb8de9efa9adaa31d747f47dfb23001 Mon Sep 17 00:00:00 2001
From: Max Inden
Date: Thu, 28 Jul 2022 14:44:39 +0900
Subject: [PATCH 01/11] docs/coding-guidelines: Add document

Add document outlining a set of coding guidelines followed and to be
followed across the rust-libp2p code base.
---
 docs/coding-guidelines.md | 269 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 269 insertions(+)
 create mode 100644 docs/coding-guidelines.md

diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md
new file mode 100644
index 00000000000..e6d78b8ec34
--- /dev/null
+++ b/docs/coding-guidelines.md
@@ -0,0 +1,269 @@
+# Coding Guidelines
+
+
+**Table of Contents**
+
+- [Coding Guidelines](#coding-guidelines)
+  - [Hierarchical State Machines](#hierarchical-state-machines)
+    - [Conventions for `poll` implementations](#conventions-for-poll-implementations)
+    - [Prioritize local work over new work from a remote](#prioritize-local-work-over-new-work-from-a-remote)
+  - [Bound everything](#bound-everything)
+    - [Channels](#channels)
+    - [Local queues](#local-queues)
+    - [Further reading](#further-reading)
+  - [No premature optimizations](#no-premature-optimizations)
+  - [Keep things sequential unless proven to be slow](#keep-things-sequential-unless-proven-to-be-slow)
+  - [Use `async/await` for sequential execution only](#use-asyncawait-for-sequential-execution-only)
+  - [Don't communicate by sharing memory; share memory by communicating.](#dont-communicate-by-sharing-memory-share-memory-by-communicating)
+    - [Further Reading](#further-reading)
+
+
+
+Below is a set of coding guidelines followed across the rust-libp2p code base.
+
+## Hierarchical State Machines
+
+If you squint, rust-libp2p is just a big hierarchy of [state
+machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass
+events down to their children and children pass events up to their parents.
+
+
+ Code + + ``` + @startuml + Swarm <|-- RootBehaviour + Swarm <|-- ConnectionPool + Swarm <|-- Transport + RootBehaviour <|-- PingBehaviour + RootBehaviour <|-- IdentifyBehaviour + RootBehaviour <|-- KademliaBehaviour + + Swarm : poll() + RootBehaviour : poll() + ConnectionPool : poll() + Transport : poll() + PingBehaviour : poll() + IdentifyBehaviour : poll() + KademliaBehaviour : poll() + @enduml + ``` +
+
+```
+            ,------.
+            |Swarm |
+            |------|
+            |poll()|
+            `------'
+               |
+               |
+ ,-------------.  ,--------------.  ,---------.
+ |RootBehaviour|  |ConnectionPool|  |Transport|
+ |-------------|  |--------------|  |---------|
+ |poll()       |  |poll()        |  |poll()   |
+ `-------------'  `--------------'  `---------'
+        |
+,-------------.  ,-----------------.  ,-----------------.
+|PingBehaviour|  |IdentifyBehaviour|  |KademliaBehaviour|
+|-------------|  |-----------------|  |-----------------|
+|poll()       |  |poll()           |  |poll()           |
+`-------------'  `-----------------'  `-----------------'
+```
+
+Using hierarchical state machines is a deliberate choice throughout the
+rust-libp2p code base. It makes reasoning about control and data flow easy and
+works well with Rust's `Future` model. The archetecture pattern of
+hierarchical state machines should be used wherever possible.
+
+### Conventions for `poll` implementations
+
+The `poll` method of a single state machine can be complex especially when that
+state machine itself `poll`s many child state machines. The patterns shown below
+have proven useful and should be followed across the code base.
+
+``` rust
+fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Event> {
+    loop {
+        match self.child_1.poll(cx) {
+            // The child made progress.
+            Poll::Ready(_) => {
+                // Either return an event to the parent:
+                return Poll::Ready(todo!());
+                // or `continue`, thus polling `child_1` again. `child_1` can potentially make more progress:
+                continue
+                // but NEVER move to the next child if the current child made progress. Given
+                // that the current child might be able to make more progress, it did not yet
+                // register the waker in order for the root task to be woken up later on. Moving
+                // on to the next child might result in the larger `Future` task to stall as it
+                // assumes that there is no more progress to be made.
+            }
+
+            // The child did not make progress. It has registered the waker for a
+            // later wake up. Proceed with the other childs.
+            Poll::Ready(_) => {
+            }
+
+        match self.child_2.poll(cx) {
+            // As above.
+        }
+
+        match self.child_3.poll(cx) {
+            // As above.
+        }
+
+        // ...
+
+        // None of the child state machines can make any more progress. Each registered
+        // the waker in order for the root `Future` task to be woken up again.
+        return Poll::Pending
+    }
+}
+```
+
+### Prioritize local work over new work from a remote
+
+When handling multiple work streams, prioritize local work items over
+accepting new work items from a remote. Take the following state machine as an
+example, reading and writing from a socket, returning results to its parent:
+
+``` rust
+struct SomeStateMachine {
+    socket: Socket,
+    events_to_return_to_parent: VecDeque<Event>,
+    messages_to_send_on_socket: VecDeque<Message>,
+}
+
+impl Stream for SomeStateMachine {
+    type Item = Event;
+
+    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
+        // First priority is returning local finished work.
+        if let Some(event) = events_to_return_to_parent.pop_front() {
+            return Poll::Ready(Some(event));
+        }
+
+        // Second priority is finishing local work, i.e. sending on the socket.
+        if let Poll::Ready(()) = socket.poll_ready(cx) {
+            todo!("Send messages")
+        }
+
+        // Last priority is accepting new work, i.e. reading from the socket.
+        if let Some(work_item) = socket.poll_next(cx) {
+            todo!("Start work on new work item")
+        }
+    }
+}
+```
+
+This prioritization provides:
+- Low memory footprint as local queues (here `events_to_return_to_parent`) stay small.
+- Low latency as accepted local work is not stuck in queues.
+- DoS defense as a remote does not control the size of the local queue, nor can it starve local work with its remote work.
+
+## Bound everything
+
+The concept of unboundedness is an illusion. Use bounded mechanisms to prevent
+unbounded memory growth and high latencies.
+
+### Channels
+
+When using channels (e.g. `futures::channel::mpsc` or `std::sync::mpsc`)
+always use the bounded variant, never use the unbounded variant. When using a
+bounded channel, a slow consumer eventually slows down a fast producer once
+the channel bound is reached, ideally granting the slow consumer more system
+resources, e.g. CPU time, keeping queues small and thus latencies low. When
+using an unbounded channel a fast producer continues being a fast producer,
+growing the channel buffer indefinitely, increasing latency until the illusion
+of unboundedness breaks and the system runs out of memory.
+
+One may use an unbounded channel if one enforces backpressure through an
+out-of-band mechanism, e.g. the consumer granting the producer send-tokens
+through a side-channel.
+
+### Local queues
+
+What applies to channels shared across potentially concurrent actors (e.g.
+future tasks or OS threads) applies equally to queues owned by a single actor.
+E.g. reading events from a socket into a `Vec<Event>` without some mechanism
+bounding the size of that `Vec<Event>` can again lead to unbounded memory
+growth and high latencies.
+
+Note that rust-libp2p fails at this guideline, i.e. still has many unbounded
+local queues.
+
+### Further reading
+
+- https://en.wikipedia.org/wiki/Bufferbloat
+- https://apenwarr.ca/log/20170814
+- https://twitter.com/peterbourgon/status/1212800031406739456
+
+## No premature optimizations
+
+Optimizations that add complexity need to be accompanied by a proof of their
+effectiveness.
+
+This also applies to increasing buffer or channel sizes, as the downside of
+such pseudo optimizations is increased memory footprint and latency.
+
+## Keep things sequential unless proven to be slow
+
+Concurrency adds complexity. Concurrency adds overhead due to synchronization.
+Thus unless proven to be a bottleneck, don't make things concurrent. As an example
+the hierarchical `NetworkBehaviour` state machine, run sequentially. It is easy
+to debug as it runs sequentially. Thus far there has been no proof that
+shows a speed up when running it concurrently.
+
+## Use `async/await` for sequential execution only
+
+Using `async/await` does not allow accessing methods on the object being
+`await`ed unless paired with some synchronization mechanism like an
+`Arc<Mutex<T>>`.
+
+Using `async/await` for sequential execution makes things significantly simpler.
+
+Example: Read and once done write from/to a socket. Use `async/await`.
+
+``` rust
+socket.read_exact(&mut read_buf).await;
+socket.write(&write_buf).await;
+```
+
+Example: Read and concurrently write from/to a socket. Use `poll`.
+
+``` rust
+loop {
+    match socket.poll_read(cx, &mut read_buf) {
+        Poll::Ready(_) => {
+            todo!();
+            continue;
+        }
+        Poll::Pending => {}
+    }
+    match socket.poll_write(cx, &write_buf) {
+        Poll::Ready(_) => {
+            todo!();
+            continue;
+        }
+        Poll::Pending => {}
+    }
+
+    return Poll::Pending;
+}
+```
+
+When providing `async` methods, make it explicit whether it is safe to cancel
+the resulting `Future`, i.e. whether it is safe to drop the `Future` returned
+by the `async` method.
+
+## Don't communicate by sharing memory; share memory by communicating.
+
+The majority of rust-libp2p's code base follows the above Golang philosophy, e.g.
using channels instead of mutexes. This pattern enforces single ownership +over data, which works well with Rust's ownership model and makes reasoning +about data flow easier. + +### Further Reading + +- https://go.dev/blog/codelab-share From 7714a30ffb34d5a1be811e5ca78c348403dd2a33 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 1 Aug 2022 16:13:17 +0900 Subject: [PATCH 02/11] docs/coding-guidelines: Document benefit controlling poll order --- docs/coding-guidelines.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index e6d78b8ec34..0a5224abd80 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -73,9 +73,10 @@ events down to their children and children pass events up to their parents. ``` Using hierarchical state machines is a deliberate choice throughout the -rust-libp2p code base. It makes reasoning about control and data flow easy and -works well with Rust's `Future` model. The archetecture pattern of -hierarchical state machines should be used wherever possible. +rust-libp2p code base. It makes reasoning about control and data flow easy. It +works well with Rust's `Future` model. It allows fine-grain control e.g. on the +order child state machines are polled. The archetecture pattern of hierarchical +state machines should be used wherever possible. ### Conventions for `poll` implementations From 3b07e71c4c8b9711e69b5def205f7f336bc95d89 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 1 Aug 2022 16:25:31 +0900 Subject: [PATCH 03/11] docs/coding-guidelines: Document favoring iteration over recursion --- docs/coding-guidelines.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index 0a5224abd80..b2ef6e0dbd0 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -16,6 +16,8 @@ - [Use `async/await` for sequential execution only](#use-asyncawait-for-sequential-execution-only) - [Don't communicate by sharing memory; share memory by communicating.](#dont-communicate-by-sharing-memory-share-memory-by-communicating) - [Further Reading](#further-reading) + - [Use iteration not recursion](#use-iteration-not-recursion) + - [Further Reading](#further-reading-1) @@ -268,3 +270,14 @@ about data flow easier. ### Further Reading - https://go.dev/blog/codelab-share + +## Use iteration not recursion + +Rust does not support tail call optimization, thus using recursion may grow the +stack potentially unboundedly. Instead use iteration e.g. via `loop` or `for`. + +### Further Reading + +- https://en.wikipedia.org/wiki/Tail_call +- https://stackoverflow.com/questions/65948553/why-is-recursion-not-suggested-in-rust +- https://stackoverflow.com/questions/59257543/when-is-tail-recursion-guaranteed-in-rust From 2b40afa41eeb2f9cb5d8891e88128162274f3db3 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 4 Aug 2022 16:01:11 +0900 Subject: [PATCH 04/11] docs/coding-guidelines: Fix typo Ready to Pending --- docs/coding-guidelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index b2ef6e0dbd0..99d317dcdd3 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -105,7 +105,7 @@ fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll{ // The child did not make progress. It has registered the waker for a // later wake up. Proceed with the other childs. 
- Poll::Ready(_) => { + Poll::Pending(_) => { } match self.child_2.poll(cx) { From 995fd8cdbc7422d9d9ddc1d43cbee698f36104c0 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Thu, 4 Aug 2022 16:08:32 +0900 Subject: [PATCH 05/11] docs/coding-guidelines: Loop on work prioritization --- docs/coding-guidelines.md | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index 99d317dcdd3..eb4a151bdcb 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -142,19 +142,27 @@ impl Stream for SomeStateMachine { type Item = Event; fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll> { - // First priority is returning local finished work. - if let Some(event) = events_to_return_to_parent.pop_front() { - return Poll::Ready(Some(event)); - } + loop { + // First priority is returning local finished work. + if let Some(event) = events_to_return_to_parent.pop_front() { + return Poll::Ready(Some(event)); + } - // Second priority is finishing local work, i.e. sending on the socket. - if let Poll::Ready(()) = socket.poll_ready(cx) { - todo!("Send messages") - } + // Second priority is finishing local work, i.e. sending on the socket. + if let Poll::Ready(()) = socket.poll_ready(cx) { + todo!("Send messages") + continue // Go back to the top. One might be able to send more. + } + + // Last priority is accepting new work, i.e. reading from the socket. + if let Poll::Ready(work_item) = socket.poll_next(cx) { + todo!("Start work on new item") + continue // Go back to the top. There might be more progress to be made. + } - // Last priority is accepting new work, i.e. reading from the socket. - if let Some(work_item) = socket.poll_next(cx) { - todo!("Start work on new work item") + // At this point in time, there is no more progress to be made. Return + // `Pending` and be woken up later. + return Poll::Pending; } } } From ed4d6330487e5a0e8bf8dc61bc9e14367b32c5c8 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Sat, 13 Aug 2022 09:57:21 +0900 Subject: [PATCH 06/11] docs/coding-guidelines: Fill out entire control flow on example --- docs/coding-guidelines.md | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index eb4a151bdcb..cb5dd69ee03 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -94,7 +94,8 @@ fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll{ Poll::Ready(_) => { // Either return an event to the parent: return Poll::Ready(todo!()); - // or `continue`, thus polling `child_1` again. `child_1` can potentially make more progress: + // or `continue`, thus polling `child_1` again. `child_1` can potentially make more progress. Try to exhaust + // it before moving on to the next child. continue // but NEVER move to the next child if the current child made progress. Given // that the current child might be able to make more progress, it did not yet @@ -104,20 +105,29 @@ fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll{ } // The child did not make progress. It has registered the waker for a - // later wake up. Proceed with the other childs. - Poll::Pending(_) => { + // later wake up. Proceed with the other children. + Poll::Pending(_) => {} } match self.child_2.poll(cx) { - // As above. + Poll::Ready(child_2_event) => { + // Events can be dispatched from one child to the other. 
+ self.child_1.handle_event(child_2_event); + + // Either `continue` thus polling `child_1` again, or `return Poll::Ready` with a result to the parent. + todo!() + } + Poll::Pending(_) => {} } match self.child_3.poll(cx) { - // As above. + Poll::Ready(__) => { + // Either `continue` thus polling `child_1` again, or `return Poll::Ready` with a result to the parent. + todo!() + } + Poll::Pending(_) => {} } - // ... - // None of the child state machines can make any more progress. Each registered // the waker in order for the root `Future` task to be woken up again. return Poll::Pending From 0ef322a27252bd53d87205419969fcdcefdceade Mon Sep 17 00:00:00 2001 From: Max Inden Date: Sat, 13 Aug 2022 10:06:46 +0900 Subject: [PATCH 07/11] docs/coding-guidelines: Document hierarchical state machine downsides --- docs/coding-guidelines.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index cb5dd69ee03..b45dacc7854 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -75,10 +75,16 @@ events down to their children and children pass events up to their parents. ``` Using hierarchical state machines is a deliberate choice throughout the -rust-libp2p code base. It makes reasoning about control and data flow easy. It +rust-libp2p code base. It makes reasoning about control and data flow simple. It works well with Rust's `Future` model. It allows fine-grain control e.g. on the -order child state machines are polled. The archetecture pattern of hierarchical -state machines should be used wherever possible. +order child state machines are polled. + +The above comes with downsides. It feels more verbose. The mix of control flow (`loop`, `return`, +`break`, `continue`) in `poll` functions together with the asynchronous and thus decoupled +communication via events can be very hard to understand. Both are a form of complexity that we are +trading for correctness and performance which aligns with Rust's and rust-libp2p's goals. + +The archetecture pattern of hierarchical state machines should be used wherever possible. ### Conventions for `poll` implementations From 412f03e8760d2895ac48558c96a2885ea37edc72 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Sat, 13 Aug 2022 10:14:17 +0900 Subject: [PATCH 08/11] docs/coding-guidelines: Reword async/await --- docs/coding-guidelines.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index b45dacc7854..89db52f3510 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -238,17 +238,16 @@ such pseudo optimizations is increased memory footprint and latency. Concurrency adds complexity. Concurrency adds overhead due to synchronization. Thus unless proven to be a bottleneck, don't make things concurrent. As an example -the hierarchical `NetworkBehaviour` state machine, run sequentially. It is easy +the hierarchical `NetworkBehaviour` state machine runs sequentially. It is easy to debug as it runs sequentially. Thus far there has been no proof that shows a speed up when running it concurrently. ## Use `async/await` for sequential execution only -Using `async/await` does not allow accesing methods on the object being -`await`ed unless paired with some synchronization mechanism like an -`Arc>`. - Using `async/await` for sequential execution makes things significantly simpler. 
Though unfortunately using `async/await` does not allow accessing methods on the
+object being `await`ed unless paired with some synchronization mechanism like an
+`Arc<Mutex<T>>`.
 
 Example: Read and once done write from/to a socket. Use `async/await`.

From 87c3bb8f20adedf71b114db6d94d47874ecc5473 Mon Sep 17 00:00:00 2001
From: Max Inden
Date: Sat, 13 Aug 2022 10:20:19 +0900
Subject: [PATCH 09/11] docs/coding-guidelines: Replace ASCII with svg

---
 docs/architecture.svg     | 39 +++++++++++++++++++++++++++++++++++++++
 docs/coding-guidelines.md | 25 +++----------------------
 2 files changed, 42 insertions(+), 22 deletions(-)
 create mode 100644 docs/architecture.svg

diff --git a/docs/architecture.svg b/docs/architecture.svg
new file mode 100644
index 00000000000..354aaf3dda8
--- /dev/null
+++ b/docs/architecture.svg
@@ -0,0 +1,39 @@
+Swarmpoll()RootBehaviourpoll()ConnectionPoolpoll()Transportpoll()PingBehaviourpoll()IdentifyBehaviourpoll()KademliaBehaviourpoll()
\ No newline at end of file
diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md
index 89db52f3510..2d2e1be11ba 100644
--- a/docs/coding-guidelines.md
+++ b/docs/coding-guidelines.md
@@ -30,8 +30,10 @@ If you squint, rust-libp2p is just a big hierarchy of [state
 machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass
 events down to their children and children pass events up to their parents.
 
+![Archetecture](architecture.svg)
+
- Code + Reproduce diagram ``` @startuml @@ -53,27 +55,6 @@ events down to their children and children pass events up to their parents. ```
-``` - ,------. - |Swarm | - |------| - |poll()| - `------' - | - | - ,-------------. ,--------------. ,---------. - |RootBehaviour| |ConnectionPool| |Transport| - |-------------| |--------------| |---------| - |poll() | |poll() | |poll() | - `-------------' `--------------' `---------' - | -,-------------. ,-----------------. ,-----------------. -|PingBehaviour| |IdentifyBehaviour| |KademliaBehaviour| -|-------------| |-----------------| |-----------------| -|poll() | |poll() | |poll() | -`-------------' `-----------------' `-----------------' -``` - Using hierarchical state machines is a deliberate choice throughout the rust-libp2p code base. It makes reasoning about control and data flow simple. It works well with Rust's `Future` model. It allows fine-grain control e.g. on the From 69ea476897dcbe5da673dd18d558e88e15233b7c Mon Sep 17 00:00:00 2001 From: Max Inden Date: Sat, 13 Aug 2022 11:30:23 +0900 Subject: [PATCH 10/11] docs/coding-guidelines: Fix typo --- docs/coding-guidelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index 2d2e1be11ba..c6da500d2e5 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -30,7 +30,7 @@ If you sqint, rust-libp2p is just a big hierarchy of [state machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass events down to their children and children pass events up to their parents. -![Archetecture](architecture.svg) +![Architecture](architecture.svg)
Reproduce diagram From eb0b75d8a9aaf9a280cc9b38a0bb901724695195 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 17 Aug 2022 06:51:05 +0200 Subject: [PATCH 11/11] Update docs/coding-guidelines.md Co-authored-by: Thomas Eizinger --- docs/coding-guidelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/coding-guidelines.md b/docs/coding-guidelines.md index c6da500d2e5..a208dd2768c 100644 --- a/docs/coding-guidelines.md +++ b/docs/coding-guidelines.md @@ -65,7 +65,7 @@ The above comes with downsides. It feels more verbose. The mix of control flow ( communication via events can be very hard to understand. Both are a form of complexity that we are trading for correctness and performance which aligns with Rust's and rust-libp2p's goals. -The archetecture pattern of hierarchical state machines should be used wherever possible. +The architecture pattern of hierarchical state machines should be used wherever possible. ### Conventions for `poll` implementations
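
Beyond the examples already embedded in the guidelines above, a few of the rules lend themselves to short stand-alone sketches. The bounded-channel rule, for instance, maps directly onto the `futures::channel::mpsc` API the text references. The following is a minimal sketch, not taken from the rust-libp2p code base, assuming the `futures` crate with its default features and an arbitrary finite producer:

``` rust
use futures::channel::mpsc;
use futures::{SinkExt, StreamExt};

async fn run() {
    // A bounded channel: once the buffer is full, `send` does not resolve
    // until the consumer has caught up, so the producer is slowed down to the
    // consumer's pace instead of growing an unbounded queue.
    let (mut tx, mut rx) = mpsc::channel::<u64>(32);

    let producer = async move {
        for item in 0..1_000u64 {
            if tx.send(item).await.is_err() {
                break; // The receiver was dropped.
            }
        }
        // `tx` is dropped here, which ends the stream below.
    };

    let consumer = async move {
        while let Some(item) = rx.next().await {
            // Handle `item`, e.g. hand it to a parent state machine.
            let _ = item;
        }
    };

    futures::join!(producer, consumer);
}
```

The same backpressure is lost with `mpsc::unbounded`, where `unbounded_send` always succeeds immediately and the buffer can grow without limit.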
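
The local-queues rule can be sketched in the same spirit. `Request`, `Response` and the handler below are hypothetical and only illustrate the shape of the bound; the point is that no new work is pulled off the wire while locally queued results have not been drained:

``` rust
use std::collections::VecDeque;

const MAX_QUEUED_RESPONSES: usize = 16;

// Hypothetical protocol types, not taken from the code base.
struct Request(u64);
struct Response(u64);

struct Handler {
    queued_responses: VecDeque<Response>,
}

impl Handler {
    // The caller (e.g. a `poll` implementation) only reads another request
    // from the socket while this returns `true`, bounding `queued_responses`.
    fn can_accept_request(&self) -> bool {
        self.queued_responses.len() < MAX_QUEUED_RESPONSES
    }

    fn on_request(&mut self, request: Request) {
        debug_assert!(self.can_accept_request());
        self.queued_responses.push_back(Response(request.0));
    }

    // Drained with first priority, in line with the `poll` conventions above.
    fn next_response(&mut self) -> Option<Response> {
        self.queued_responses.pop_front()
    }
}
```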
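
The remark on documenting cancellation safety of `async` methods is easiest to follow with a doc comment in front of a concrete method. `Connection` and `send_message` are made up for this sketch; only the `AsyncWrite`/`AsyncWriteExt` traits of the `futures` crate are assumed:

``` rust
use futures::{AsyncWrite, AsyncWriteExt};

struct Connection<W> {
    socket: W,
}

impl<W: AsyncWrite + Unpin> Connection<W> {
    /// Sends `message` on the underlying socket.
    ///
    /// Cancellation safety: this method is NOT safe to cancel. Dropping the
    /// returned `Future` before it resolves may leave a partially written
    /// message on the wire, after which the connection should be closed.
    async fn send_message(&mut self, message: &[u8]) -> std::io::Result<()> {
        self.socket.write_all(message).await?;
        self.socket.flush().await
    }
}
```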
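
"Share memory by communicating" likewise has a canonical shape in Rust: a single task owns the state and everyone else talks to it through a (bounded) channel, instead of wrapping the state in an `Arc<Mutex<_>>`. The `Command` enum and the map below are hypothetical:

``` rust
use std::collections::HashMap;

use futures::channel::{mpsc, oneshot};
use futures::StreamExt;

enum Command {
    Insert { key: String, value: u64 },
    Get { key: String, response: oneshot::Sender<Option<u64>> },
}

// The worker task is the single owner of `store`. Callers hold clones of the
// bounded `mpsc::Sender<Command>` created alongside this receiver.
async fn worker(mut commands: mpsc::Receiver<Command>) {
    let mut store = HashMap::new();

    while let Some(command) = commands.next().await {
        match command {
            Command::Insert { key, value } => {
                store.insert(key, value);
            }
            Command::Get { key, response } => {
                let _ = response.send(store.get(&key).copied());
            }
        }
    }
    // All senders dropped: the worker ends and `store` is freed with it.
}
```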
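
Finally, the iteration-over-recursion rule, shown on a hypothetical cons list: the recursive length computation needs one stack frame per element and can overflow on a long list, while the iterative version runs in constant stack space:

``` rust
enum List {
    Cons(u64, Box<List>),
    Nil,
}

// Recursive: one stack frame per element, with no guaranteed tail-call
// optimization in Rust.
fn len_recursive(list: &List) -> usize {
    match list {
        List::Nil => 0,
        List::Cons(_, rest) => 1 + len_recursive(rest),
    }
}

// Iterative: the same result in constant stack space.
fn len_iterative(mut list: &List) -> usize {
    let mut len = 0;
    loop {
        match list {
            List::Nil => return len,
            List::Cons(_, rest) => {
                len += 1;
                // Step to the tail and keep looping.
                list = rest.as_ref();
            }
        }
    }
}
```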