fix(swarm): `NetworkBehaviour` macro rewritten to generate more optimal code #5303
base: master
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Numbers for new macro
Compared to #5026.
Wow, thank you for this!
I don't have the capacity to review this in detail, will defer to @jxs. Overall, performance improvements are always welcome. We don't have many explicit tests that ensure the derive macro works correctly though. Esp. if we start to generate our own `poll` impls, I am a bit worried that we are introducing a source of very-hard-to-find bugs.
We could offer the new version behind a feature toggle and let users experiment with it instead of re-writing it right away. For example, we can start using it internally for all our tests etc.
What do you think?
Feature flag sounds very reasonable (since I want to use this in my project). In that case, I can introduce a required field in the behavior that stores the state needed to also improve the main poll implementation. The main idea behind the custom poll is simple. In each poll:

    // previous implementation
    #(
        if let std::task::Poll::Ready(event) = self.#fields.poll(cx) {
            return std::task::Poll::Ready(event
                .map_custom(#to_beh::#var_names)
                .map_outbound_open_info(#ooi::#var_names)
                .map_protocol(#ou::#var_names));
        }
    )*

    // proposed implementation
    let mut fuel = #beh_count;
    while fuel > 0 {
        // save the poll position to avoid repolling exhausted handlers
        match self.field_index {
            #(#indices => match self.#fields.poll(cx) {
                std::task::Poll::Ready(event) =>
                    return std::task::Poll::Ready(event
                        .map_custom(#to_beh::#var_names)
                        .map_outbound_open_info(#ooi::#var_names)
                        .map_protocol(#ou::#var_names)),
                std::task::Poll::Pending => {}
            },)*
            _ => {
                self.field_index = 0;
                continue;
            }
        }
        self.field_index += 1;
        fuel -= 1;
    }

This polling pattern ensures we don't repoll other behaviors while exhausting events from some point of the hierarchy. Hopefully, the compiler is smart enough to use a branch table for the match. For this to work, I need to maintain an integer between poll calls, thus the extra field.
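To make the effect of the saved index concrete, here is a minimal, self-contained sketch of the same fuel-based loop outside of the macro. It is only an illustration under simplifying assumptions: `Source` and `Composed` are hypothetical stand-ins for sub-behaviours and the derived struct, there is no `Context` or waker, and events are plain integers. It is not the code the macro generates.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for a sub-behaviour: yields queued events, then `Pending`.
/// `polls` counts how often it was polled so the two strategies can be compared.
struct Source {
    queue: VecDeque<u32>,
    polls: usize,
}

impl Source {
    fn new(events: &[u32]) -> Self {
        Source { queue: events.iter().copied().collect(), polls: 0 }
    }

    fn poll(&mut self) -> Poll<u32> {
        self.polls += 1;
        match self.queue.pop_front() {
            Some(event) => Poll::Ready(event),
            None => Poll::Pending,
        }
    }
}

/// Hypothetical composed behaviour carrying the extra `field_index` state.
struct Composed {
    a: Source,
    b: Source,
    c: Source,
    field_index: usize,
}

impl Composed {
    /// Polls each field at most once per call, resuming at the saved position.
    fn poll(&mut self) -> Poll<(usize, u32)> {
        let mut fuel = 3; // number of fields
        while fuel > 0 {
            let index = self.field_index;
            let polled = match index {
                0 => self.a.poll(),
                1 => self.b.poll(),
                2 => self.c.poll(),
                _ => {
                    // Ran past the last field: wrap around without spending fuel.
                    self.field_index = 0;
                    continue;
                }
            };
            if let Poll::Ready(event) = polled {
                // Keep `field_index` unchanged so the next call resumes here.
                return Poll::Ready((index, event));
            }
            self.field_index += 1;
            fuel -= 1;
        }
        Poll::Pending
    }
}

/// Polls a composed value three times where only the last field has events
/// and returns how often each field was polled.
fn demo_poll_counts() -> [usize; 3] {
    let mut composed = Composed {
        a: Source::new(&[]),
        b: Source::new(&[]),
        c: Source::new(&[7, 8]),
        field_index: 0,
    };
    for _ in 0..3 {
        let _ = composed.poll();
    }
    [composed.a.polls, composed.b.polls, composed.c.polls]
}
```

In this setup, `demo_poll_counts()` returns `[2, 2, 3]`. Polling every field from the start on each call, as the previous implementation does, would poll each field three times. The idle fields `a` and `b` are skipped while `c` still has events.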
This pull request has merge conflicts. Could you please resolve them @jakubDoka? 🙏
Find myself doing this too lol. My implementation mimics existing
Description
I have rewritten the `NetworkBehaviour` derive macro to generate more optimal and faster-to-compile code. When using more behaviours (5, 10, 20), I noticed that performance degrades even though I benchmarked the same load. This is related to #5026.

The new macro implementation generates enums and structs for each type implementing the traits instead of type-level linked lists. In many cases, this makes the resulting types more compact (we store just one enum tag, whereas composed `Either`s each need to store tags to make values addressable) and makes the enum dispatch constant. This also opened the opportunity to optimize `UpgradeInfoIterator` and `ConnectionHandler` into state machines (they now remember where they stopped polling/iterating and skip exhausted subhandlers/iterators). We could optimize the `NetworkBehaviour` itself too, but it would require users to put extra fields into the struct (this could be optional for backwards compatibility).

Change checklist