rumqttc: Throttling of pending requests can cause starvation on the request side #814

Open
flxo opened this issue Mar 7, 2024 · 6 comments

Comments

@flxo
Contributor

flxo commented Mar 7, 2024

Current Behavior

The delay between pending requests is awaited in parallel with incoming frames. An incoming frame cancels a pending delay, and the delay is then started from scratch on the next call to poll. Cyclic incoming frames with a period shorter than the throttle duration therefore lead to starvation on the request side.
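A minimal, self-contained sketch of the pattern (not rumqttc's actual code; names and durations are made up): the throttle sleep is recreated on every loop iteration, so a packet that arrives before it fires effectively resets it.

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time};

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(8);

    // Simulated broker: a packet every 50 ms, faster than the 100 ms throttle.
    tokio::spawn(async move {
        loop {
            if tx.send(42).await.is_err() {
                break;
            }
            time::sleep(Duration::from_millis(50)).await;
        }
    });

    let throttle = Duration::from_millis(100);
    let mut pending = vec!["pending request"];

    for _ in 0..20 {
        tokio::select! {
            // The sleep is created from scratch on every iteration, so it is
            // cancelled and restarted whenever the rx branch wins the select!.
            _ = time::sleep(throttle), if !pending.is_empty() => {
                println!("sent pending request: {:?}", pending.pop());
            }
            Some(packet) = rx.recv() => {
                println!("incoming packet: {packet}");
            }
        }
    }
    // With packets arriving every 50 ms, the 100 ms sleep never completes and
    // the pending request is never sent: starvation on the request side.
}
```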

Failure Information (for bugs)

This is a bug. I have no traces; it was found by review.

@flxo flxo changed the title Throttling of pending requests can cause starvation on the request side rumqttc: Throttling of pending requests can cause starvation on the request side Mar 7, 2024
@swanandx
Member

swanandx commented Mar 7, 2024

Hey, please check if I understood this correctly:

  • if an incoming packet arrives before the pending throttle elapses (in case there were some pending requests), the next_request future will be cancelled.
  • if incoming packets keep coming at a higher frequency, next_request might never get a chance to complete and thus starves. (I mentioned packet, not frame, because readb resolves only when a whole packet has been read.)

My thoughts / questions:

  • pending_throttle is 0 ms by default and is configurable.
  • Correct me if I'm wrong: if we give priority to pending requests, that would mean we are starving the incoming packets, right?

Due to this, and my limited context, I think this might be a trade-off rather than an issue. If you think otherwise, can you please elaborate on the context?

Thank you so much!

@flxo
Contributor Author

flxo commented Mar 7, 2024

Hey, please check if I understood this correctly:

  • if an incoming packet arrives before the pending throttle elapses (in case there were some pending requests), the next_request future will be cancelled.
  • if incoming packets keep coming at a higher frequency, next_request might never get a chance to complete and thus starves. (I mentioned packet, not frame, because readb resolves only when a whole packet has been read.)

Exactly.

My thoughts / questions:

  • pending_throttle is 0 ms by default and is configurable.
  • Correct me if I'm wrong: if we give priority to pending requests, that would mean we are starving the incoming packets, right?

Both - the rx and the sleep - are polled in the select!, so we cannot starve on the rx path, but incoming packets can (as you wrote above) lead to starvation of the pending requests. Only the requests in the VecDeque are affected, not the requests from the channel.

A scenario where this can happen: there are pending requests and a throttle is configured; the client connects with clean_session set to false and instantly receives publications from the broker.

Due to this, and my limited context, I think this might be a trade-off rather than an issue. If you think otherwise, can you please elaborate on the context?

I think it's just a bug in the implementation. The poll fn doesn't store any context regarding the throttle; the sleep delay is always started from scratch. To solve this I'd create a Stream of requests like this and hold that stream in EventLoop. The code behind the link has the same bug as main: the throttled stream is created anew upon each call of poll.
Getting that stream into EventLoop is probably hard because of lifetime issues. To tackle this, an option could be to spawn a task that loops on poll and communicates the events via a channel. The channel is fed directly from State. This would also avoid the nasty Ok(self.state.events.pop_front().unwrap()) that feels odd in the current implementation.
Such a task would also allow interleaving rx and tx, similar to this. All of this would be a rather big refactoring with a lot of room for errors.
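As a rough, hypothetical sketch of that task-plus-channel shape (EventLoopHandle, Event and the timings are made-up names, not rumqttc API, and the actual network/state handling is elided):

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time};

#[derive(Debug)]
enum Event {
    Incoming(String),
}

struct EventLoopHandle {
    events: mpsc::Receiver<Event>,
}

impl EventLoopHandle {
    fn spawn() -> Self {
        let (tx, events) = mpsc::channel(64);
        tokio::spawn(async move {
            // The task owns the connection, the state and the throttled
            // request stream, so throttle progress survives between events.
            // Every event is pushed into the channel instead of being popped
            // off `state.events` by the caller.
            loop {
                // ... drive network rx/tx and the request stream here ...
                if tx.send(Event::Incoming("publish".into())).await.is_err() {
                    break; // handle dropped, shut the task down
                }
                time::sleep(Duration::from_millis(100)).await;
            }
        });
        Self { events }
    }

    // poll() just awaits the next event from the channel.
    async fn poll(&mut self) -> Option<Event> {
        self.events.recv().await
    }
}

#[tokio::main]
async fn main() {
    let mut eventloop = EventLoopHandle::spawn();
    for _ in 0..3 {
        println!("{:?}", eventloop.poll().await);
    }
}
```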

@de-sh
Member

de-sh commented Mar 8, 2024

Loved the Codec implementation, let's start by implementing that in the first PR? I'll take this up if you are ok with it. We can go about this by dividing up the work.

@flxo
Contributor Author

flxo commented Mar 8, 2024

Loved the Codec implementation, let's start by implementing that in the first PR? I'll take this up if you are ok with it. We can go about this by dividing up the work.

I'm absolutely fine with that, but I think this is a general, architectural decision which must be approved within the rumqtt team.

Some notes:

The branch is probably a good starting point but needs something more:

  • Ensure that enqueued packets are not discarded when the connection is re-established. Framed::into_parts gives access to the buffers. Alternatively reuse the Framed instance from Network.
  • Solving the issue described above is probably more invasive. The stream of pending requests must be stored within EventLoop in order to retain the throttling across calls to poll, as described above.
  • Moving the loop into a task would require even more: introduce a channel for the events (best use a Sink interface with explicit flushing to get fewer task switches). poll would just asynchronously receive on the channel (even better: implement Stream for EventLoop; see the sketch after this list).
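To illustrate the last point: if the event loop runs in its own task and pushes events into an mpsc channel, the receiving half can be exposed as a Stream, e.g. by wrapping it in tokio-stream's ReceiverStream. A minimal sketch with made-up event names, not rumqttc's types:

```rust
use tokio::sync::mpsc;
use tokio_stream::{wrappers::ReceiverStream, StreamExt};

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<&'static str>(8);

    // Stand-in for the event loop task feeding the channel.
    tokio::spawn(async move {
        for ev in ["ConnAck", "Publish", "PubAck"] {
            let _ = tx.send(ev).await;
        }
    });

    // Consumers iterate a Stream of events instead of calling poll() in a loop.
    let mut events = ReceiverStream::new(rx);
    while let Some(event) = events.next().await {
        println!("event: {event}");
    }
}
```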

@de-sh
Member

de-sh commented Mar 8, 2024

I'm absolutely fine with that, but I think this is a general, architectural decision which must be approved within the rumqtt team.

Noted, we will discuss this and get back to you.

Alternatively reuse the Framed instance from Network.

That sounds like a great option: we could just as well re-establish a connection and keep using the buffers from the previous connection, dropping corrupted/incomplete packet bytes.

Moving the loop into a task would require even more: introduce a channel for the events (best use a Sink interface with explicit flushing to get fewer task switches). poll would just asynchronously receive on the channel (even better: implement Stream for EventLoop).

Forgive me if my understanding is naive, but I believe it is better not to spawn a separate task and instead use something like a timer that doesn't get cancelled to handle this. We could ignore it and set it to never fire when nothing is pending?

@flxo
Contributor Author

flxo commented Mar 8, 2024

Alternatively reuse the Framed instance from Network.

That sounds like a great option: we could just as well re-establish a connection and keep using the buffers from the previous connection, dropping corrupted/incomplete packet bytes.

Hm. The tx buffer should be fine to keep entirely because it's flushed. The rx buffer is likely to contain an incomplete packet. I don't know the MQTT standard by heart, but it would probably be fine to process the complete frames and drop a possibly incomplete last one. The implementation today discards an incomplete received frame upon network errors; nothing you can do here...
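For the buffer handover itself, a minimal sketch of what Framed::into_parts / Framed::from_parts allow, using an in-memory duplex stream and LengthDelimitedCodec purely as stand-ins for Network and the MQTT codec:

```rust
use tokio::io::duplex;
use tokio_util::codec::{Framed, LengthDelimitedCodec};

#[tokio::main]
async fn main() {
    // Old connection, simulated with an in-memory duplex stream.
    let (old_io, _old_peer) = duplex(1024);
    let framed = Framed::new(old_io, LengthDelimitedCodec::new());

    // On reconnect, take the Framed apart and keep its buffers ...
    let mut parts = framed.into_parts();
    println!("carrying over {} buffered rx bytes", parts.read_buf.len());

    // ... swap in the new transport and reassemble. A trailing incomplete
    // frame left in read_buf would still have to be detected and dropped.
    let (new_io, _new_peer) = duplex(1024);
    parts.io = new_io;
    let _framed = Framed::from_parts(parts);
}
```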

Moving the loop into a task would require even more: introduce a channel for the events (best use a Sink interface with explicit flushing to get fewer task switches). poll would just asynchronously receive on the channel (even better: implement Stream for EventLoop).

Forgive me if my understanding is naive, but I believe it is better not to spawn a separate task and instead use something like a timer that doesn't get cancelled to handle this. We could ignore it and set it to never fire when nothing is pending?

Also possible, and less invasive. This would be another select! branch; the request_rx branch is guarded with self.pending.is_empty().
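A hedged sketch of that less invasive shape, assuming the throttle timer is kept alive outside the select! loop (here a tokio::time::Interval; names are illustrative, not rumqttc's fields): cancelling tick() when the rx branch wins does not reset the interval's deadline, so pending requests keep making progress.

```rust
use std::{collections::VecDeque, time::Duration};
use tokio::{sync::mpsc, time};

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(8);

    // Simulated broker: a packet every 50 ms, as in the sketch further above.
    tokio::spawn(async move {
        loop {
            if tx.send(42).await.is_err() {
                break;
            }
            time::sleep(Duration::from_millis(50)).await;
        }
    });

    let mut pending: VecDeque<&str> = VecDeque::from(["req1", "req2"]);
    // Created once, outside the loop, so its deadline survives cancellation.
    let mut throttle = time::interval(Duration::from_millis(100));

    for _ in 0..20 {
        tokio::select! {
            // Guarded, so the timer is ignored when nothing is pending.
            _ = throttle.tick(), if !pending.is_empty() => {
                println!("sent pending request: {:?}", pending.pop_front());
            }
            Some(packet) = rx.recv() => {
                println!("incoming packet: {packet}");
            }
        }
    }
}
```

This is essentially what the commits referenced below describe: storing the pending throttle interval within the EventLoop.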

flxo pushed a commit to flxo/rumqtt that referenced this issue Mar 19, 2024
Store the pending throttle interval within the EventLoop.

Fixes: bytebeamio#814
flxo pushed a commit to flxo/rumqtt that referenced this issue Mar 26, 2024
Store the pending throttle interval within the EventLoop.

Fixes: bytebeamio#814
flxo pushed a commit to flxo/rumqtt that referenced this issue Mar 26, 2024
Store the pending throttle interval within the EventLoop.

Fixes: bytebeamio#814