docs/coding-guidelines: Add document #2780

Merged (14 commits) on Aug 17, 2022
269 changes: 269 additions & 0 deletions docs/coding-guidelines.md
# Coding Guidelines

<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**
- [Coding Guidelines](#coding-guidelines)
- [Hierarchical State Machines](#hierarchical-state-machines)
- [Conventions for `poll` implementations](#conventions-for-poll-implementations)
- [Prioritize local work over new work from a remote](#prioritize-local-work-over-new-work-from-a-remote)
- [Bound everything](#bound-everything)
- [Channels](#channels)
- [Local queues](#local-queues)
- [Further reading](#further-reading)
- [No premature optimizations](#no-premature-optimizations)
- [Keep things sequential unless proven to be slow](#keep-things-sequential-unless-proven-to-be-slow)
- [Use `async/await` for sequential execution only](#use-asyncawait-for-sequential-execution-only)
- [Don't communicate by sharing memory; share memory by communicating.](#dont-communicate-by-sharing-memory-share-memory-by-communicating)
- [Further Reading](#further-reading-1)

<!-- markdown-toc end -->


Below is a set of coding guidelines followed across the rust-libp2p code base.

## Hierarchical State Machines

If you squint, rust-libp2p is just a big hierarchy of [state
machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass
events down to their children and children pass events up to their parents.

<details>
<summary>Code</summary>

```
@startuml
Swarm <|-- RootBehaviour
Swarm <|-- ConnectionPool
Swarm <|-- Transport
RootBehaviour <|-- PingBehaviour
RootBehaviour <|-- IdentifyBehaviour
RootBehaviour <|-- KademliaBehaviour

Swarm : poll()
RootBehaviour : poll()
ConnectionPool : poll()
Transport : poll()
PingBehaviour : poll()
IdentifyBehaviour : poll()
KademliaBehaviour : poll()
@enduml
```
</details>

```
,------.
|Swarm |
|------|
|poll()|
`------'
|
|
,-------------. ,--------------. ,---------.
|RootBehaviour| |ConnectionPool| |Transport|
|-------------| |--------------| |---------|
|poll() | |poll() | |poll() |
`-------------' `--------------' `---------'
|
,-------------. ,-----------------. ,-----------------.
|PingBehaviour| |IdentifyBehaviour| |KademliaBehaviour|
|-------------| |-----------------| |-----------------|
|poll() | |poll() | |poll() |
`-------------' `-----------------' `-----------------'
```

Using hierarchical state machines is a deliberate choice throughout the
rust-libp2p code base. It makes reasoning about control and data flow easy and
works well with Rust's `Future` model. The architecture pattern of
hierarchical state machines should be used wherever possible.

This design is not free of downsides: it can feel more verbose, and the mix of
control flow (`loop`, `return`, `break`, `continue`) in `poll` functions
together with the asynchronous and thus decoupled communication via events can
be hard to understand. Both are a form of complexity that we trade for
correctness and performance.
### Conventions for `poll` implementations

The `poll` method of a single state machine can be complex especially when that
state machine itself `poll`s many child state machines. The patterns shown below
have proven useful and should be followed across the code base.

``` rust
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    loop {
        match self.child_1.poll(cx) {
            // The child made progress.
            Poll::Ready(_) => {
                // Either return an event to the parent:
                return Poll::Ready(todo!());
                // or `continue`, thus polling `child_1` again. `child_1` can
                // potentially make more progress:
                continue
                // but NEVER move on to the next child if the current child made
                // progress. Given that the current child might be able to make
                // more progress, it did not yet register the waker in order for
                // the root task to be woken up later on. Moving on to the next
                // child might result in the larger `Future` task stalling, as it
                // assumes that there is no more progress to be made.
            }

            // The child did not make progress. It has registered the waker for
            // a later wake-up. Proceed with the other children.
            Poll::Pending => {}
        }

        match self.child_2.poll(cx) {
            // As above. Note that a `continue` after progress on `child_2`
            // goes back to the top of the `loop`, polling `child_1` again:
            // we loop over all children from the top rather than exhausting
            // one child before moving on.
        }

        match self.child_3.poll(cx) {
            // As above.
        }

        // ...

        // None of the child state machines can make any more progress. Each
        // registered the waker in order for the root `Future` task to be woken
        // up again.
        return Poll::Pending
    }
}
```
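To make the interaction between `continue` and events dispatched between children concrete, here is a self-contained, simplified sketch (all names are hypothetical; children are modeled as plain queues rather than real `Future`s). After `child_2` produces an item destined for `child_1`, the parent `continue`s back to the top of the loop, so `child_1` is polled first again instead of `child_2` being exhausted:

```rust
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical parent with two "children": `child_2_items` produces numbers
// that are dispatched to `child_1` (`to_child_1`), which sums them up.
struct Parent {
    to_child_1: Vec<u32>,
    child_1_sum: u32,
    child_2_items: Vec<u32>,
}

impl Parent {
    fn poll(&mut self, _cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            // child_1: drain work that was dispatched to it.
            if let Some(n) = self.to_child_1.pop() {
                self.child_1_sum += n;
                continue; // child_1 may have more work: back to the top.
            }
            // child_2: new items are dispatched to child_1; then `continue`
            // restarts the loop so child_1 is polled again first.
            if let Some(n) = self.child_2_items.pop() {
                self.to_child_1.push(n);
                continue;
            }
            // No child can make progress.
            return Poll::Ready(self.child_1_sum);
        }
    }
}

// Minimal no-op waker so the sketch runs without an async runtime.
fn noop_waker() -> Waker {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

The `continue` after dispatching from `child_2` is the crucial detail: it loops over the entire hierarchy from the top rather than exhausting one child until it returns `Pending`.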

### Prioritize local work over new work from a remote

When handling multiple work streams, prioritize local work items over
accepting new work items from a remote. Take the following state machine as an
example, reading and writing from a socket and returning results to its parent:

``` rust
struct SomeStateMachine {
    socket: Socket,
    events_to_return_to_parent: VecDeque<Event>,
    messages_to_send_on_socket: VecDeque<Message>,
}

impl Stream for SomeStateMachine {
    type Item = Event;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // First priority is returning local finished work.
        if let Some(event) = self.events_to_return_to_parent.pop_front() {
            return Poll::Ready(Some(event));
        }

        // Second priority is finishing local work, i.e. sending on the socket.
        if let Poll::Ready(()) = self.socket.poll_ready(cx) {
            todo!("Send messages")
        }

        // Last priority is accepting new work, i.e. reading from the socket.
        if let Poll::Ready(Some(work_item)) = self.socket.poll_next(cx) {
            todo!("Start work on new work item")
        }

        Poll::Pending
    }
}
```

This prioritization provides:
- Low memory footprint, as local queues (here `events_to_return_to_parent`) stay small.
- Low latency, as accepted local work is not stuck in queues.
- DoS defense, as a remote neither controls the size of the local queue nor starves local work with its remote work.

## Bound everything

The concept of unboundedness is an illusion. Use bounded mechanisms to prevent
unbounded memory growth and high latencies.

### Channels

When using channels (e.g. `futures::channel::mpsc` or `std::sync::mpsc`),
always use the bounded variant; never use the unbounded variant. When using a
bounded channel, a slow consumer eventually slows down a fast producer once
the channel bound is reached, ideally granting the slow consumer more system
resources, e.g. CPU time, keeping queues small and thus latencies low. When
using an unbounded channel, a fast producer continues being a fast producer,
growing the channel buffer indefinitely and increasing latency until the
illusion of unboundedness breaks and the system runs out of memory.
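The slowdown a bounded channel imposes can be demonstrated with the standard library alone. A minimal sketch (names are illustrative, not rust-libp2p API): `std::sync::mpsc::sync_channel` blocks the sender whenever the channel already holds its capacity in undelivered items, so the producer can never run ahead of the consumer by more than the bound.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends `n` items through a channel bounded to `cap` and returns what the
/// consumer received. `send` blocks while `cap` undelivered items are
/// buffered, so a fast producer is throttled to the consumer's pace.
fn produce_and_consume(n: u32, cap: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(cap);

    let producer = thread::spawn(move || {
        for i in 0..n {
            // Blocks once the buffer holds `cap` items: backpressure.
            tx.send(i).unwrap();
        }
        // Dropping `tx` here ends the consumer's iteration below.
    });

    let received: Vec<u32> = rx.into_iter().collect();
    producer.join().unwrap();
    received
}
```

The asynchronous equivalents (e.g. `futures::channel::mpsc::channel`) behave analogously, returning `Poll::Pending` to the producer instead of blocking a thread.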

One may use an unbounded channel if one enforces backpressure through an
out-of-band mechanism, e.g. the consumer granting the producer send-tokens
through a side-channel.
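The token side-channel idea can be sketched with std channels only (all names here are hypothetical; a real implementation would use async primitives): the consumer pre-grants a fixed number of send-tokens, the producer must take a token before each send on the otherwise unbounded data channel, and the consumer returns a token after processing each item. The data channel then never holds more items than tokens exist.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

/// Enforces a bound of `cap` in-flight items over an *unbounded* data
/// channel by requiring a token (granted out-of-band) per send.
fn bounded_by_tokens(n: u32, cap: usize) -> Vec<u32> {
    let (data_tx, data_rx) = channel::<u32>(); // unbounded data channel
    let (token_tx, token_rx) = sync_channel::<()>(cap); // token side-channel

    // Initial grant: `cap` tokens, i.e. at most `cap` items in flight.
    for _ in 0..cap {
        token_tx.send(()).unwrap();
    }

    let producer = thread::spawn(move || {
        for i in 0..n {
            token_rx.recv().unwrap(); // wait for a grant before sending
            data_tx.send(i).unwrap();
        }
    });

    let mut received = Vec::new();
    for item in data_rx {
        received.push(item);
        // Return the token, granting the producer another send. Errors
        // (producer already finished) are irrelevant here.
        let _ = token_tx.send(());
    }
    producer.join().unwrap();
    received
}
```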

### Local queues

What applies to channels shared across potentially concurrent actors (e.g.
future tasks or OS threads) applies equally to queues owned by a single actor
only. E.g. reading events from a socket into a `Vec<Event>` without some
mechanism bounding the size of that `Vec<Event>` can again lead to unbounded
memory growth and high latencies.

Note that rust-libp2p fails at this guideline, i.e. still has many unbounded
local queues.
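One way to bound a local queue is to make the bound explicit and refuse new work once it is reached, forcing the caller to stop reading from the socket and thereby propagating backpressure to the remote. A hypothetical sketch (not an existing rust-libp2p type):

```rust
use std::collections::VecDeque;

/// A local queue with an explicit bound. `push` fails when full; the
/// caller must back off (e.g. stop polling the socket for new work)
/// instead of letting the queue grow without limit.
struct BoundedQueue<T> {
    items: VecDeque<T>,
    bound: usize,
}

impl<T> BoundedQueue<T> {
    fn new(bound: usize) -> Self {
        Self { items: VecDeque::new(), bound }
    }

    fn has_capacity(&self) -> bool {
        self.items.len() < self.bound
    }

    fn push(&mut self, item: T) -> Result<(), T> {
        if self.has_capacity() {
            self.items.push_back(item);
            Ok(())
        } else {
            Err(item) // hand the item back; do not grow unboundedly
        }
    }

    fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```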

### Further reading

- https://en.wikipedia.org/wiki/Bufferbloat
- https://apenwarr.ca/log/20170814
- https://twitter.com/peterbourgon/status/1212800031406739456

## No premature optimizations

Optimizations that add complexity need to be accompanied with a proof of their
effectiveness.

This also applies to increasing buffer or channel sizes: the downside of such
pseudo-optimizations is increased memory footprint and latency.

## Keep things sequential unless proven to be slow

Concurrency adds complexity. Concurrency adds overhead due to synchronization.
Thus, unless proven to be a bottleneck, don't make things concurrent. As an
example, the hierarchical `NetworkBehaviour` state machine runs sequentially.
Running sequentially makes it easy to debug, and thus far there has been no
proof that running it concurrently yields a speed-up.

## Use `async/await` for sequential execution only

Using `async/await` does not allow accessing methods on the object being
`await`ed unless paired with some synchronization mechanism like an
`Arc<Mutex<_>>`.

Using `async/await` for sequential execution makes things significantly simpler.

Example: Read and once done write from/to a socket. Use `async/await`.

``` rust
socket.read_exact(&mut read_buf).await;
socket.write(&write_buf).await;
```

Example: Read and concurrently write from/to a socket. Use `poll`.

``` rust
loop {
    match socket.poll_read(cx, &mut read_buf) {
        Poll::Ready(_) => {
            todo!();
            continue;
        }
        Poll::Pending => {}
    }

    match socket.poll_write(cx, &write_buf) {
        Poll::Ready(_) => {
            todo!();
            continue;
        }
        Poll::Pending => {}
    }

    return Poll::Pending;
}
```

When providing `async` methods, make it explicit whether it is safe to cancel
the resulting `Future`, i.e. whether it is safe to drop the `Future` returned
by the `async` method.

## Don't communicate by sharing memory; share memory by communicating.

The majority of rust-libp2p's code base follows the above Golang philosophy,
e.g. using channels instead of mutexes. This pattern enforces single ownership
over data, which works well with Rust's ownership model and makes reasoning
about data flow easier.
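As a toy illustration of the philosophy (hypothetical names, std-only): instead of several threads mutating a counter behind an `Arc<Mutex<_>>`, one thread owns the state and everyone else communicates changes to it over a channel. Single ownership makes it impossible to observe the state mid-update.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Applies `increments` increments to a counter owned by exactly one
/// thread; other code communicates deltas over a channel instead of
/// locking shared memory.
fn count_via_channel(increments: u32) -> u32 {
    let (tx, rx) = channel::<u32>();

    // The owning "actor": the only place the counter lives.
    let owner = thread::spawn(move || {
        let mut counter = 0;
        for delta in rx {
            counter += delta;
        }
        counter // returned once all senders are gone
    });

    for _ in 0..increments {
        tx.send(1).unwrap();
    }
    drop(tx); // closing the channel lets the owner finish

    owner.join().unwrap()
}
```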

### Further Reading

- https://go.dev/blog/codelab-share