docs/coding-guidelines: Add document #2780

Merged (14 commits) on Aug 17, 2022
269 changes: 269 additions & 0 deletions docs/coding-guidelines.md
# Coding Guidelines

<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**
- [Coding Guidelines](#coding-guidelines)
- [Hierarchical State Machines](#hierarchical-state-machines)
- [Conventions for `poll` implementations](#conventions-for-poll-implementations)
- [Prioritize local work over new work from a remote](#prioritize-local-work-over-new-work-from-a-remote)
- [Bound everything](#bound-everything)
- [Channels](#channels)
- [Local queues](#local-queues)
- [Further reading](#further-reading)
- [No premature optimizations](#no-premature-optimizations)
- [Keep things sequential unless proven to be slow](#keep-things-sequential-unless-proven-to-be-slow)
- [Use `async/await` for sequential execution only](#use-asyncawait-for-sequential-execution-only)
- [Don't communicate by sharing memory; share memory by communicating.](#dont-communicate-by-sharing-memory-share-memory-by-communicating)
- [Further Reading](#further-reading-1)

<!-- markdown-toc end -->


Below is a set of coding guidelines followed across the rust-libp2p code base.

## Hierarchical State Machines

If you squint, rust-libp2p is just a big hierarchy of [state
machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass
events down to their children and children pass events up to their parents.

<details>
<summary>Code</summary>

```
@startuml
Swarm <|-- RootBehaviour
Swarm <|-- ConnectionPool
Swarm <|-- Transport
RootBehaviour <|-- PingBehaviour
RootBehaviour <|-- IdentifyBehaviour
RootBehaviour <|-- KademliaBehaviour

Swarm : poll()
RootBehaviour : poll()
ConnectionPool : poll()
Transport : poll()
PingBehaviour : poll()
IdentifyBehaviour : poll()
KademliaBehaviour : poll()
@enduml
```
</details>

```
,------.
|Swarm |
|------|
|poll()|
`------'
|
|
,-------------. ,--------------. ,---------.
|RootBehaviour| |ConnectionPool| |Transport|
|-------------| |--------------| |---------|
|poll() | |poll() | |poll() |
`-------------' `--------------' `---------'
|
,-------------. ,-----------------. ,-----------------.
|PingBehaviour| |IdentifyBehaviour| |KademliaBehaviour|
|-------------| |-----------------| |-----------------|
|poll() | |poll() | |poll() |
`-------------' `-----------------' `-----------------'
```

Using hierarchical state machines is a deliberate choice throughout the
rust-libp2p code base. It makes reasoning about control and data flow easy and
works well with Rust's `Future` model. The architecture pattern of
hierarchical state machines should be used wherever possible.

This design is not free of downsides: it can feel more verbose, and the mix of
control flow (`loop`, `return`, `break`, `continue`) in `poll` functions
together with the asynchronous and thus decoupled communication via events can
be hard to understand. Both are a form of complexity that we trade for
correctness and performance.
### Conventions for `poll` implementations

The `poll` method of a single state machine can be complex especially when that
state machine itself `poll`s many child state machines. The patterns shown below
have proven useful and should be followed across the code base.

``` rust
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    loop {
        match self.child_1.poll(cx) {
            // The child made progress.
            Poll::Ready(_) => {
                // Either return an event to the parent:
                return Poll::Ready(todo!());
                // or `continue`, thus polling `child_1` again. `child_1` can
                // potentially make more progress:
                continue
                // but NEVER move on to the next child if the current child made
                // progress. Given that the current child might be able to make
                // more progress, it did not yet register the waker in order for
                // the root task to be woken up later on. Moving on to the next
                // child might result in the larger `Future` task stalling, as it
                // assumes that there is no more progress to be made.
            }

            // The child did not make progress. It has registered the waker for
            // a later wake-up. Proceed with the other children.
            Poll::Pending => {}
        }

        match self.child_2.poll(cx) {
            // As above. Note that a `continue` after progress on `child_2`
            // goes back to the top of the `loop`, polling `child_1` again:
            // we loop over all children from the top rather than exhausting
            // one child before moving on.
        }

        match self.child_3.poll(cx) {
            // As above.
        }

        // ...

        // None of the child state machines can make any more progress. Each
        // registered the waker in order for the root `Future` task to be woken
        // up again.
        return Poll::Pending
    }
}
```
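To make the interaction between `continue` and events dispatched between children concrete, here is a self-contained, simplified sketch (all names are hypothetical; children are modeled as plain queues rather than real `Future`s). After `child_2` produces an item destined for `child_1`, the parent `continue`s back to the top of the loop, so `child_1` is polled first again instead of `child_2` being exhausted:

```rust
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical parent with two "children": `child_2_items` produces numbers
// that are dispatched to `child_1` (`to_child_1`), which sums them up.
struct Parent {
    to_child_1: Vec<u32>,
    child_1_sum: u32,
    child_2_items: Vec<u32>,
}

impl Parent {
    fn poll(&mut self, _cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            // child_1: drain work that was dispatched to it.
            if let Some(n) = self.to_child_1.pop() {
                self.child_1_sum += n;
                continue; // child_1 may have more work: back to the top.
            }
            // child_2: new items are dispatched to child_1; then `continue`
            // restarts the loop so child_1 is polled again first.
            if let Some(n) = self.child_2_items.pop() {
                self.to_child_1.push(n);
                continue;
            }
            // No child can make progress.
            return Poll::Ready(self.child_1_sum);
        }
    }
}

// Minimal no-op waker so the sketch runs without an async runtime.
fn noop_waker() -> Waker {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

The `continue` after dispatching from `child_2` is the crucial detail: it loops over the entire hierarchy from the top rather than exhausting one child until it returns `Pending`.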

### Prioritize local work over new work from a remote

When handling multiple work streams, prioritize local work items over
accepting new work items from a remote. Take the following state machine as an
example, reading and writing from a socket and returning results to its parent:

``` rust
struct SomeStateMachine {
    socket: Socket,
    events_to_return_to_parent: VecDeque<Event>,
    messages_to_send_on_socket: VecDeque<Message>,
}

impl Stream for SomeStateMachine {
    type Item = Event;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // First priority is returning local finished work.
        if let Some(event) = self.events_to_return_to_parent.pop_front() {
            return Poll::Ready(Some(event));
        }

        // Second priority is finishing local work, i.e. sending on the socket.
        if let Poll::Ready(()) = self.socket.poll_ready(cx) {
            todo!("Send messages")
        }

        // Last priority is accepting new work, i.e. reading from the socket.
        if let Poll::Ready(Some(work_item)) = self.socket.poll_next(cx) {
            todo!("Start work on new work item")
        }

        Poll::Pending
    }
}
```

This prioritization provides:
- Low memory footprint, as local queues (here `events_to_return_to_parent`) stay small.
- Low latency, as accepted local work is not stuck in queues.
- DoS defense, as a remote neither controls the size of the local queue nor starves local work with its remote work.

## Bound everything

The concept of unboundedness is an illusion. Use bounded mechanisms to prevent
unbounded memory growth and high latencies.

### Channels

When using channels (e.g. `futures::channel::mpsc` or `std::sync::mpsc`),
always use the bounded variant; never use the unbounded variant. When using a
bounded channel, a slow consumer eventually slows down a fast producer once
the channel bound is reached, ideally granting the slow consumer more system
resources, e.g. CPU time, keeping queues small and thus latencies low. When
using an unbounded channel, a fast producer continues being a fast producer,
growing the channel buffer indefinitely and increasing latency until the
illusion of unboundedness breaks and the system runs out of memory.
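The slowdown a bounded channel imposes can be demonstrated with the standard library alone. A minimal sketch (names are illustrative, not rust-libp2p API): `std::sync::mpsc::sync_channel` blocks the sender whenever the channel already holds its capacity in undelivered items, so the producer can never run ahead of the consumer by more than the bound.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends `n` items through a channel bounded to `cap` and returns what the
/// consumer received. `send` blocks while `cap` undelivered items are
/// buffered, so a fast producer is throttled to the consumer's pace.
fn produce_and_consume(n: u32, cap: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(cap);

    let producer = thread::spawn(move || {
        for i in 0..n {
            // Blocks once the buffer holds `cap` items: backpressure.
            tx.send(i).unwrap();
        }
        // Dropping `tx` here ends the consumer's iteration below.
    });

    let received: Vec<u32> = rx.into_iter().collect();
    producer.join().unwrap();
    received
}
```

The asynchronous equivalents (e.g. `futures::channel::mpsc::channel`) behave analogously, returning `Poll::Pending` to the producer instead of blocking a thread.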

One may use an unbounded channel if one enforces backpressure through an
out-of-band mechanism, e.g. the consumer granting the producer send-tokens
through a side-channel.
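The token side-channel idea can be sketched with std channels only (all names here are hypothetical; a real implementation would use async primitives): the consumer pre-grants a fixed number of send-tokens, the producer must take a token before each send on the otherwise unbounded data channel, and the consumer returns a token after processing each item. The data channel then never holds more items than tokens exist.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

/// Enforces a bound of `cap` in-flight items over an *unbounded* data
/// channel by requiring a token (granted out-of-band) per send.
fn bounded_by_tokens(n: u32, cap: usize) -> Vec<u32> {
    let (data_tx, data_rx) = channel::<u32>(); // unbounded data channel
    let (token_tx, token_rx) = sync_channel::<()>(cap); // token side-channel

    // Initial grant: `cap` tokens, i.e. at most `cap` items in flight.
    for _ in 0..cap {
        token_tx.send(()).unwrap();
    }

    let producer = thread::spawn(move || {
        for i in 0..n {
            token_rx.recv().unwrap(); // wait for a grant before sending
            data_tx.send(i).unwrap();
        }
    });

    let mut received = Vec::new();
    for item in data_rx {
        received.push(item);
        // Return the token, granting the producer another send. Errors
        // (producer already finished) are irrelevant here.
        let _ = token_tx.send(());
    }
    producer.join().unwrap();
    received
}
```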

### Local queues

What applies to channels shared across potentially concurrent actors (e.g.
future tasks or OS threads) applies equally to queues owned by a single actor
only. E.g. reading events from a socket into a `Vec<Event>` without some
mechanism bounding the size of that `Vec<Event>` can again lead to unbounded
memory growth and high latencies.

Note that rust-libp2p fails at this guideline, i.e. still has many unbounded
local queues.
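One way to bound a local queue is to make the bound explicit and refuse new work once it is reached, forcing the caller to stop reading from the socket and thereby propagating backpressure to the remote. A hypothetical sketch (not an existing rust-libp2p type):

```rust
use std::collections::VecDeque;

/// A local queue with an explicit bound. `push` fails when full; the
/// caller must back off (e.g. stop polling the socket for new work)
/// instead of letting the queue grow without limit.
struct BoundedQueue<T> {
    items: VecDeque<T>,
    bound: usize,
}

impl<T> BoundedQueue<T> {
    fn new(bound: usize) -> Self {
        Self { items: VecDeque::new(), bound }
    }

    fn has_capacity(&self) -> bool {
        self.items.len() < self.bound
    }

    fn push(&mut self, item: T) -> Result<(), T> {
        if self.has_capacity() {
            self.items.push_back(item);
            Ok(())
        } else {
            Err(item) // hand the item back; do not grow unboundedly
        }
    }

    fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```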

### Further reading

- https://en.wikipedia.org/wiki/Bufferbloat
- https://apenwarr.ca/log/20170814
- https://twitter.com/peterbourgon/status/1212800031406739456

## No premature optimizations

Optimizations that add complexity need to be accompanied with a proof of their
effectiveness.

This also applies to increasing buffer or channel sizes: the downside of such
pseudo-optimizations is increased memory footprint and latency.

## Keep things sequential unless proven to be slow

Concurrency adds complexity. Concurrency adds overhead due to synchronization.
Thus, unless proven to be a bottleneck, don't make things concurrent. As an
example, the hierarchical `NetworkBehaviour` state machine runs sequentially.
Running sequentially makes it easy to debug, and thus far there has been no
proof that running it concurrently yields a speed-up.

## Use `async/await` for sequential execution only

Using `async/await` does not allow accessing methods on the object being
`await`ed unless paired with some synchronization mechanism like an
`Arc<Mutex<_>>`.

Using `async/await` for sequential execution makes things significantly simpler.

Example: Read and once done write from/to a socket. Use `async/await`.

``` rust
socket.read_exact(&mut read_buf).await;
socket.write(&write_buf).await;
```

Example: Read and concurrently write from/to a socket. Use `poll`.

``` rust
loop {
    match socket.poll_read(cx, &mut read_buf) {
        Poll::Ready(_) => {
            todo!();
            continue;
        }
        Poll::Pending => {}
    }

    match socket.poll_write(cx, &write_buf) {
        Poll::Ready(_) => {
            todo!();
            continue;
        }
        Poll::Pending => {}
    }

    return Poll::Pending;
}
```

When providing `async` methods, make it explicit whether it is safe to cancel
the resulting `Future`, i.e. whether it is safe to drop the `Future` returned
by the `async` method.

## Don't communicate by sharing memory; share memory by communicating.

The majority of rust-libp2p's code base follows the above Golang philosophy,
e.g. using channels instead of mutexes. This pattern enforces single ownership
over data, which works well with Rust's ownership model and makes reasoning
about data flow easier.
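As a toy illustration of the philosophy (hypothetical names, std-only): instead of several threads mutating a counter behind an `Arc<Mutex<_>>`, one thread owns the state and everyone else communicates changes to it over a channel. Single ownership makes it impossible to observe the state mid-update.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Applies `increments` increments to a counter owned by exactly one
/// thread; other code communicates deltas over a channel instead of
/// locking shared memory.
fn count_via_channel(increments: u32) -> u32 {
    let (tx, rx) = channel::<u32>();

    // The owning "actor": the only place the counter lives.
    let owner = thread::spawn(move || {
        let mut counter = 0;
        for delta in rx {
            counter += delta;
        }
        counter // returned once all senders are gone
    });

    for _ in 0..increments {
        tx.send(1).unwrap();
    }
    drop(tx); // closing the channel lets the owner finish

    owner.join().unwrap()
}
```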

### Further Reading

- https://go.dev/blog/codelab-share