Excessive calls to poll and cloning of Wakers. #4990

Open

jakubDoka opened this issue Dec 8, 2023 · 7 comments

@jakubDoka

Summary

I perfed my protocol and discovered major hot spots within the repo code, mostly ending up in libp2p_swarm::connection::Connection<THandler>::poll and the yamux implementation. From the looks of it, the "nothing to do" paths are constantly being hit, with Arcs dropped and Wakers cloned as a result. I quickly noticed that the code usually tries to poll everything whenever one central Waker triggers.

Expected behavior

Ideally, code that selects over many futures in parallel should branch the main Waker, one per future, to collect information on what actually needs polling.
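
A minimal sketch of what that branching could look like, assuming std's Wake trait and one dirty flag per child future (all names here are hypothetical, not existing libp2p code):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

// Hypothetical sketch: wrap the parent Waker once per child future and
// record which child actually fired, so the parent can skip the rest.
struct ChildWaker {
    parent: Waker,
    needs_poll: Arc<AtomicBool>,
}

impl Wake for ChildWaker {
    fn wake(self: Arc<Self>) {
        self.needs_poll.store(true, Ordering::Release);
        self.parent.wake_by_ref();
    }
}

fn branch(parent: &Waker) -> (Waker, Arc<AtomicBool>) {
    let needs_poll = Arc::new(AtomicBool::new(true));
    let child = Waker::from(Arc::new(ChildWaker {
        parent: parent.clone(),
        needs_poll: needs_poll.clone(),
    }));
    (child, needs_poll)
}

The parent's poll would then check needs_poll.swap(false, Ordering::AcqRel) before polling each child, which is roughly the per-task scheme FuturesUnordered uses internally.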

Actual behavior

Most of the hot instructions are atomic operations inside Waker::clone, and the surrounding code almost always looks like this:

pub fn set_waker(old: &mut Option<Waker>, new: &Waker) {
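    // unconditionally clones, even when the cached Waker would already wake the same task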
    *old = Some(new.clone());
}

The most significant hot spot, though, is something different: here we unconditionally create an Arc for each available protocol as the last thing before breaking the loop. The crux of the problem is that most of the time we call poll only to drop some Arcs and clone Wakers, with no real work getting done.

Relevant log output

No response

Possible Solution

The simplest fix for updating cached wakers is refactoring the function to:

pub fn set_waker(old: &mut Option<Waker>, new: &Waker) {
    if let Some(old) = old {
        // clone_from skips the clone when `old` already wakes the same task as `new`
        old.clone_from(new);
    } else {
        *old = Some(new.clone());
    }
}
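
On toolchains where Waker's Clone::clone_from is not specialized, the same saving can be spelled out by hand with Waker::will_wake (a sketch):

use std::task::Waker;

pub fn set_waker(old: &mut Option<Waker>, new: &Waker) {
    match old {
        // already registered to wake the same task: skip the atomic clone
        Some(old) if old.will_wake(new) => {}
        _ => *old = Some(new.clone()),
    }
}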

Although this only fixes the symptom, it's still a substantial improvement considering how easy it is to implement.

Another simple fix is avoiding the Arc creation in the mentioned section of the code. I managed to pull it off, but not without breaking the API (the set intersection and difference iterators get replaced with SmallVecs in ProtocolsChange events).
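
For illustration only (names and shape hypothetical, not the actual libp2p code), the idea is to diff against a cached set and allocate owned events only when the protocol set actually changed:

use std::collections::HashSet;

// Hypothetical sketch: only materialize owned protocol lists when the
// supported set actually changed since the last poll.
fn protocols_changed(
    cached: &mut HashSet<String>,
    current: &HashSet<String>,
) -> Option<(Vec<String>, Vec<String>)> {
    if cached == current {
        return None; // common case: no change, no allocation
    }
    let added = current.difference(cached).cloned().collect();
    let removed = cached.difference(current).cloned().collect();
    *cached = current.clone();
    Some((added, removed))
}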

Now for the main cause of all this...

Excessive Polling

All the mentioned code sits almost exclusively at the spots where handwritten futures have run out of things to do, and this is also the code that runs all the time. My conclusion is that we poll when we don't need to. Solving this, though, is hard and requires a lot of changes and careful testing, since eliminating excessive polls is painful: the bugs manifest as the program blocking forever.

Version

b7914e4

Would you like to work on fixing this bug?

Yes

@jakubDoka (Author) commented Dec 9, 2023

After some more research... I found out the bottleneck is where it should be; I mistook the instruction for something else. So excessive polling most likely does not happen; consider this issue irrelevant.

@jakubDoka closed this as not planned Dec 9, 2023
@mxinden (Member) commented Dec 9, 2023

Thank you for the research @jakubDoka!

The most significant hot spot, though, is something different: here we unconditionally create an Arc for each available protocol as the last thing before breaking the loop. The crux of the problem is that most of the time we call poll only to drop some Arcs and clone Wakers, with no real work getting done.

This is further discussed in #4284. Help is much appreciated, @jakubDoka.

Ideally, code that selects over many futures in parallel should branch the main Waker, one per future, to collect information on what actually needs polling.

Agreed. I would assume this to be especially powerful within the NetworkBehaviour and ConnectionHandler hierarchies. Again, help is appreciated, though this requires some benchmarking.

@jakubDoka (Author) commented Dec 9, 2023

@mxinden should I make a small PR that fixes #4284? The disadvantage of the fix is that libp2p_swarm::handler::ProtocolsChange changes from a difference iterator to an owned SmallVec, since the iterator is set-specific and borrowed, which prevents me from changing the storage strategy for protocols.

@thomaseizinger (Contributor)

I'd like to see some benchmarks showing that moving to SmallVec actually fixes this. Another problem is that we always call StreamProtocol::try_from_owned, which we can only get rid of by changing what UpgradeInfo returns (or by removing it entirely, see #3268 and related issues).

It is an unfortunate penalty that we pay here, as it scales with N (the number of protocols you plug into the Swarm). I'd really like to fix it. The good news is that it shouldn't matter much (compared to the actual CPU time spent on your protocols) for a small number of protocols. A proper fix for this is hard, I think. Help is much appreciated though! :)

@thomaseizinger (Contributor)

The disadvantage of the fix is that libp2p_swarm::handler::ProtocolsChange changes from a difference iterator to an owned SmallVec, since the iterator is set-specific and borrowed, which prevents me from changing the storage strategy for protocols.

The storage should be an implementation detail. Why is that a breaking change?

@jakubDoka (Author) commented Dec 9, 2023

The disadvantage of the fix is that libp2p_swarm::handler::ProtocolsChange changes from a difference iterator to an owned SmallVec, since the iterator is set-specific and borrowed, which prevents me from changing the storage strategy for protocols.

The storage should be an implementation detail. Why is that a breaking change?

@thomaseizinger because we use hash_set::Difference<StreamProtocol> (in the API), which means I need to store a HashSet<StreamProtocol> to construct it.

@jakubDoka (Author) commented Dec 10, 2023

I discovered quite the opposite problem when experimenting with optimized polling. Without proper waking in my handler, the whole swarm gets stuck because a connection event no longer causes a repoll (with scoped waking). This behaviour is a valid one as documented in the standard library: a Future may be polled more often than needed, but that should not be relied upon (wake if you want to be polled); yet, e.g., kad still relies on this.
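
A minimal sketch of that failure mode (a hypothetical future, assuming a strict parent that only re-polls children whose Waker actually fired):

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hypothetical sketch: this future returns Pending without storing the
// Waker. Under a lenient parent that re-polls everything it still makes
// progress; under scoped waking it is never polled again and hangs.
struct ReliesOnRepoll {
    ready: bool,
}

impl Future for ReliesOnRepoll {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.ready {
            Poll::Ready(())
        } else {
            // Correct code would store _cx.waker() and wake it once
            // `ready` flips; omitting that relies on spurious polls.
            Poll::Pending
        }
    }
}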

@jakubDoka reopened this Dec 22, 2023