fix(swarm): NetworkBehaviour macro rewritten to generate more optimal code #5303

Open · wants to merge 60 commits into master
Conversation

jakubDoka

Description

I have rewritten the NetworkBehaviour derive macro to generate more optimal and faster-to-compile code. When using more behaviours (5, 10, 20), I had noticed performance degraded even though I benchmarked the same load. This is related to #5026.

The new macro implementation generates enums and structs for each type implementing the traits, instead of type-level linked lists. In many cases this makes the resulting types more compact (we store just one enum tag, whereas composed Eithers each need to store a tag to make values addressable) and makes the enum dispatch constant-time. This also opened the opportunity to optimize UpgradeInfoIterator and ConnectionHandler into state machines (they now remember where they stopped polling/iterating and skip exhausted sub-handlers/iterators). We could optimize the NetworkBehaviour itself too, but it would require users to put extra fields into the struct (this could be optional for backwards compatibility).
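To illustrate the layout point, here is a minimal sketch (placeholder payload types and names, not the actual generated code): nested Eithers pay a discriminant at every nesting level, while a single flat enum over the same payloads pays one discriminant total.

```rust
// Illustrative sketch only: the payload type (u64) and the names Nested,
// Flat, A, B, C are placeholders, not what the derive macro emits.
#[allow(dead_code)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

// Type-level linked list of three payloads: one tag per nesting level.
type Nested = Either<u64, Either<u64, u64>>;

// Flat enum over the same three payloads: a single tag.
#[allow(dead_code)]
enum Flat {
    A(u64),
    B(u64),
    C(u64),
}

fn main() {
    // Exact sizes depend on rustc's layout decisions, but the flat enum
    // is never larger than the nested composition here.
    let nested = std::mem::size_of::<Nested>();
    let flat = std::mem::size_of::<Flat>();
    println!("nested: {nested} bytes, flat: {flat} bytes");
    assert!(flat <= nested);
}
```

The flat enum also lets a `match` dispatch on one tag instead of walking the Either chain, which is where the constant-time dispatch claim comes from.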

Change checklist

  • I have performed a self-review of my code
  • I have made corresponding changes to the documentation (none)
  • I have added tests that prove my fix is effective or that my feature works (no regressions)
  • A changelog entry has been made in the appropriate crates

jakubDoka and others added 30 commits December 22, 2023 22:51
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
@jakubDoka
Author

Numbers for the new macro

| behaviours | iters | protocols per behaviour | per iter (lower) | per iter (estimate) | per iter (upper) |
|-----------:|------:|------------------------:|-----------------:|--------------------:|-----------------:|
| 1  | 10000 | 10   | 5.0441 ns | 5.0892 ns | 5.1472 ns |
| 1  | 10000 | 100  | 7.8285 ns | 8.0366 ns | 8.3113 ns |
| 1  | 10000 | 1000 | 19.439 ms | 19.568 ms | 19.731 ms |
| 5  | 10000 | 2    | 6.1172 ns | 6.2539 ns | 6.4356 ns |
| 5  | 10000 | 20   | 14.192 ns | 14.835 ns | 15.676 ns |
| 5  | 10000 | 200  | 270.12 ms | 272.66 ms | 275.51 ms |
| 10 | 10000 | 1    | 8.0800 ns | 8.2733 ns | 8.4751 ns |
| 10 | 10000 | 10   | 18.930 ns | 19.989 ns | 21.327 ns |
| 10 | 10000 | 100  | 301.00 ms | 307.12 ms | 313.86 ms |
| 20 | 5000  | 1    | 12.019 ns | 12.222 ns | 12.488 ns |
| 20 | 5000  | 10   | 40.238 ns | 41.747 ns | 43.697 ns |
| 20 | 5000  | 100  | 731.50 ms | 748.87 ms | 767.03 ms |

Compared to #5026.

| behaviours | iters | protocols per behaviour | per iter (lower) | per iter (estimate) | per iter (upper) | change (lower) | change (estimate) | change (upper) | verdict |
|-----------:|------:|------------------------:|-----------------:|--------------------:|-----------------:|---------------:|------------------:|---------------:|---------|
| 1  | 10000 | 10   | 6.9099 ns | 7.0400 ns | 7.1924 ns | +19.863% | +51.795% | +95.414% | Performance has regressed. |
| 1  | 10000 | 100  | 11.301 ns | 11.701 ns | 12.220 ns | +3.6304% | +61.992% | +156.01% | No change in performance detected. |
| 1  | 10000 | 1000 | 45.102 ms | 45.699 ms | 46.335 ms | +129.98% | +133.54% | +137.49% | Performance has regressed. |
| 5  | 10000 | 2    | 6.7147 ns | 6.8633 ns | 7.0595 ns | -13.115% | +21.191% | +65.712% | No change in performance detected. |
| 5  | 10000 | 20   | 33.995 ns | 36.069 ns | 38.805 ns | +56.477% | +186.56% | +410.85% | Performance has regressed. |
| 5  | 10000 | 200  | 844.90 ms | 863.90 ms | 883.72 ms | +208.68% | +216.83% | +224.95% | Performance has regressed. |
| 10 | 10000 | 1    | 9.6857 ns | 9.8934 ns | 10.168 ns | -5.8785% | +33.452% | +94.927% | No change in performance detected. |
| 10 | 10000 | 10   | 353.64 ns | 380.66 ns | 415.67 ns | +1326.5% | +2344.9% | +4154.7% | Performance has regressed. |
| 10 | 10000 | 100  | 1.2271 s  | 1.2532 s  | 1.2804 s  | +296.43% | +308.03% | +320.38% | Performance has regressed. |
| 20 | 5000  | 1    | 15.601 ns | 16.091 ns | 16.726 ns | +11.958% | +65.527% | +145.85% | Performance has regressed. |
| 20 | 5000  | 10   | 128.33 µs | 138.88 µs | 152.28 µs | +237805% | +421868% | +752554% | Performance has regressed. |
| 20 | 5000  | 100  | 1.9057 s  | 1.9361 s  | 1.9668 s  | +151.25% | +158.54% | +166.02% | Performance has regressed. |

@thomaseizinger (Contributor) left a comment
Wow, thank you for this!

I don't have the capacity to review this in detail and will defer to @jxs. Overall, performance improvements are always welcome. We don't have many explicit tests that ensure the derive macro works correctly, though. Especially if we start to generate our own poll impls, I am a bit worried that we are introducing a source of very-hard-to-find bugs.

We could offer the new version behind a feature toggle and let users experiment with it instead of re-writing it right away. For example, we can start using it internally for all our tests etc.

What do you think?

@jakubDoka
Author

A feature flag sounds very reasonable (since I want to use this in my project). In that case, I can introduce a required field in the behaviour that stores the state needed to also improve the main poll implementation. The main idea behind the custom poll is simple:

```rust
// previous implementation
#(
    if let std::task::Poll::Ready(event) = self.#fields.poll(cx) {
        return std::task::Poll::Ready(event
            .map_custom(#to_beh::#var_names)
            .map_outbound_open_info(#ooi::#var_names)
            .map_protocol(#ou::#var_names));
    }
)*

// proposed implementation
let mut fuel = #beh_count;
while fuel > 0 {
    // save the poll position to avoid repolling exhausted handlers
    match self.field_index {
        #(#indices => match self.#fields.poll(cx) {
            std::task::Poll::Ready(event) =>
                return std::task::Poll::Ready(event
                    .map_custom(#to_beh::#var_names)
                    .map_outbound_open_info(#ooi::#var_names)
                    .map_protocol(#ou::#var_names)),
            std::task::Poll::Pending => {}
        },)*
        _ => {
            self.field_index = 0;
            continue;
        }
    }
    self.field_index += 1;
    fuel -= 1;
}
```

In each poll:

  1. we return Pending if all handlers return Pending
  2. if a behaviour returns Ready, the next poll will poll it first
  3. if we start polling from a later behaviour, we wrap around if needed

This polling pattern ensures we don't repoll other behaviours while exhausting events from some point in the hierarchy. Hopefully, the compiler is smart enough to use a branch table for the match.

For this to work, I need to maintain an integer between poll calls, hence the extra field.
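The steps above can be sketched as a small standalone state machine. This is a hypothetical illustration, not the generated code: each queue of ready events stands in for a sub-handler, a non-empty queue plays the role of Ready and an empty one of Pending.

```rust
// Hypothetical sketch of the fuel/field_index round-robin described above.
struct RoundRobin {
    // Stand-ins for sub-handlers: pending events per source.
    sources: Vec<Vec<u32>>,
    // Saved position so the next poll resumes where the last one stopped.
    field_index: usize,
}

impl RoundRobin {
    fn poll(&mut self) -> Option<u32> {
        // Fuel guarantees each source is visited at most once per call.
        let mut fuel = self.sources.len();
        while fuel > 0 {
            match self.sources.get_mut(self.field_index) {
                Some(queue) => {
                    if let Some(event) = queue.pop() {
                        // Ready: keep field_index unchanged so this source
                        // is polled first on the next call.
                        return Some(event);
                    }
                }
                None => {
                    // Walked past the last source: wrap around.
                    self.field_index = 0;
                    continue;
                }
            }
            self.field_index += 1;
            fuel -= 1;
        }
        None // every source was pending
    }
}

fn main() {
    let mut rr = RoundRobin {
        sources: vec![vec![1], vec![2]],
        field_index: 0,
    };
    assert_eq!(rr.poll(), Some(1));
    assert_eq!(rr.poll(), Some(2));
    assert_eq!(rr.poll(), None); // both exhausted: one full wrap, then pending
    println!("round-robin drained in order, then pending");
}
```

Note that on Ready the index is deliberately not advanced, which is what lets a busy source be drained without repolling the quiet ones.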


mergify bot commented Apr 15, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

@drHuangMHT
Contributor

I find myself doing this too, lol. My implementation mimics the existing *Select structs (public or hidden). Somehow I managed to make it work, which is funny. Past discussion: #3902. TL;DR: type-based composition is preferred.


mergify bot commented Apr 22, 2024

This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏

Labels: none yet
Projects: none yet
Linked issues: none yet
4 participants