Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal provide support for notifying the latest added waiter (LIFO) for the Notify interface #6506

Open
pfreixes opened this issue Apr 22, 2024 · 15 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-sync Module: tokio/sync

Comments

@pfreixes
Copy link

Is your feature request related to a problem? Please describe.

Current implementation of notify_one uses FIFO semantics for returning the oldest waiter added into the linked list. While it seems fair this does not play very well when the system is under a lot of pressure and for keeping high percentile statistics under the expected SLOs. By dequeuing the waiter that was waiting for a longer period of time there is higher chance of increasing the overall latency of the system, measured as for example with the percentile 99.

Describe the solution you'd like
This could be mitigated by changing - or providing an altenative solution - the strategy, by dequeueing the latest waiter added (LIFO), which will make sure that we process the element that will have more chances to not hurt the latencies of our system, since was the one in the queue for a shorter period of time

Additional context

If you believe that this could be a good addition for Tokio I would love to provide a PR with the implementation for supporting the new strategy LIFO but still keeping the current implementation FIFO as the default one.

If so, I would also appreciate a some guidance on how we could change current interfaces for limiting the breaking changes as much as possible .

@pfreixes pfreixes added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Apr 22, 2024
@Darksonn Darksonn added the M-sync Module: tokio/sync label Apr 22, 2024
@pfreixes
Copy link
Author

I've worked on a PR - still on WIP and without tests - which shows a possible implementation for addressing this issue pfreixes@1d20704.

@Darksonn
Copy link
Contributor

If we do this, it would most likely be a new notify_one_lifo method rather than a parameter to modify how notify_one works.

@pfreixes
Copy link
Author

If we do this, it would most likely be a new notify_one_lifo method rather than a parameter to modify how notify_one works

I was thinking on this too, but since this notify_one is also implicitly used here for waking up a new waiter when the initial waiter that we intended to woken up is being removed and will not take the notify. I got the feeling that would be much better to provide a cohesive API where the strategy (FIFO or LIFO) would be used behind the scenes consistently.

Unless we are able to wire in somehow the the strategy used between the call to notify_one or notify_one_lifo to the waiter. But having the feeling that in terms of implementation details this could end up in something less clear compared to the current proposal where we just configure the strategy used at Notify level.

Also not sure if there is a place for use cases where notify_one and notify_one_lifo calls could be mixed together.

WDYT?

@Darksonn
Copy link
Contributor

There are several reasons that I prefer notify_one_lifo to having a flag for setting the strategy.

  1. A notify_one_lifo method provides a simpler api for user.
  2. It doesn't increase the size of the Notify struct.

The point you make about forwarding notifications in drop is good. I'm not sure how I feel about this feature request given that. One could perhaps add a OneLifo case to the enum, but this seems to complicate the implementation more, so I don't love it.

@pfreixes
Copy link
Author

About your considerations:

A notify_one_lifo method provides a simpler api for user.

Might be simpler, but seems to me quite subjective and in the end having the feeling that would be better to have a way for providing a LIFO strategy than having nothing.

It doesn't increase the size of the Notify struct.

Its a 4 bytes increase for a struct that at least for use cases like this one are just shared references to the same struct, which for this use cases having a 4 bytes increase in size is kinda negligible. Not sure about other use cases, but considering that Notify is kinda used for syncing different tasks, I would expect a very similar usage pattern.

The point you make about forwarding notifications in drop is good. I'm not sure how I feel about this feature request given that. One could perhaps add a OneLifo case to the enum, but this seems to complicate the implementation more, so I don't love it.

I still prefer the option of configuring the strategy at the beginning, unless the points that you have raised are a deal breaker.

@Darksonn
Copy link
Contributor

I'm sorry, but if the API you are looking for is one with a waiter_notify_strategy parameter, then I think it would be a better fit for a new utility in a separate crate.

I've actually considered introducing a NotifyWaiters type that doesn't support notify_one. The motivation for this is that it can be done without the state field, reducing the size of the struct. I would use it in places like the watch channel.

@pfreixes
Copy link
Author

pfreixes commented Apr 24, 2024

I'm sorry, but if the API you are looking for is one with a waiter_notify_strategy parameter, then I think it would be a better fit for a new utility in a separate crate.

So either we reach an implementation where we can expose just notify_one_lifo or we should not move forward with this proposal? If its the case I can give it a try to the proposal of notify_one_lifo but having the feeling that the 4 bytes issue will be just transferred to the waiter state, but if it works for you I can give it a try.

I've actually considered introducing a NotifyWaiters type that doesn't support notify_one. The motivation for this is that it can be done without the state field, reducing the size of the struct. I would use it in places like the watch channel.

Oks if Im understanding this, if we move the extra required 4 bytes payload as part of the state for waking up a waiter - still to be proven how doable it is - for telling the strategy used, implementations like this one should not get impacted. right?

@Darksonn
Copy link
Contributor

Yes, I don't want to go for the solution with a waiter_notify_strategy parameter. I don't think you actually need to increase the size of the waiter since there are unused bits in the atomic.

That said, I would prefer to get some input from @carllerche before we continue here.

@pfreixes
Copy link
Author

I don't think you actually need to increase the size of the waiter since there are unused bits in the atomic.

Ah nice let me take a look to this.

@carllerche
Copy link
Member

If a waiter has been waiting for too long and you want to notify more recent waiters, wouldn't it be better to first notify the old waiters with an error or some signal telling them to error out?

@pfreixes
Copy link
Author

wouldn't it be better to first notify the old waiters with an error or some signal telling them to error out?

I guess that this is an expectation that should be managed by the caller. By for example limiting the time for processing a request, which if this request would be in waiting state for having its waiter associated waiting for getting a free DB connection, after a while and once reached the max time the waiter will be just removed without completing its task and not getting any resource. But this would need to happen regardless if the strategy for dequeuing waiters would be FIFO or LIFO.

Going back to the reason to why LIFO, when you have resources enough FIFO vs LIFO will not make a difference, but when you have contention and there is no enough resources for processing all waiters (i.e DB connections), by giving priority to the one that have been waiting longer by giving resources to them would mean that the overall latency of such operation would be worst compared to the latency that you would get giving this opportunity to the one that have been waiting less.

With a bit of luck the ones that would have been waiting longer, in case of a contention, will have the chance to fall into a higher latency percentile than the one that you could be using for measuring your SLO.

@pfreixes
Copy link
Author

I was trying to find some examples where LIFO is used for helping on the latency side, just found this use case from Netflix, where its claimed that:

To help keep success latencies low and minimize timeouts any blocked requests are processed in last in/first out order

@pfreixes
Copy link
Author

@Darksonn just for double checking, I guess that when you were telling me that there was plenty of space in the atomic, you were referring to this one, or you were referring to the Notify state?

@Darksonn
Copy link
Contributor

I was referring to AtomicNotification.

@pfreixes
Copy link
Author

FYI Ive created a PR with an implementation proposal #6520 based on the guidance from @Darksonn in case you believe that its worth it to consider this change, otherwise Im happy to keep the current discussion, 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-sync Module: tokio/sync
Projects
None yet
Development

No branches or pull requests

3 participants