New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reinstate decoupled timerfd logic in EpollEventLoop #9590
Conversation
Can one of the admins verify this patch? |
@normanmaurer I've attempted to stress this a bit and haven't managed to break it. Would be good if you could re-test with your own testsuite when you have time. |
@njhill can you please rebase ? |
@njhill also just re-run the test suite and have the same issue as before... I hope I will have some time this week to look into it but I wouldn't hold my breath for it |
8f779bf
to
810e18f
Compare
Motivation This reinstates the parts of 1fa7a5e and a22d4ba that were reverted in 7f39142 (applies to EpollEventLoop class only), with some minor changes. It sounds like this still exhibits some testsuite failures even with the lazySet fixes, but I thought it would be good to have a PR with the "latest" version to continue to test/debug. Modification Changes are the same as before apart a few added comments and the following in EpollEventLoop#checkScheduleTaskQueueForNewDelay: - Use set instead of lazySet to avoid race condition - Avoid arming timer and entering epoll wait if a task has already expired - Switch order of calling setTimerFd and updating the nextDeadlineNanos AtomicLong within the sync block The latter two are perf refinements and should not have contributed to the issues seen. Result Performance benefits from having timerfd updates decoupled from the event loop, as originally introduced by @Scottmitch
810e18f
to
125dc27
Compare
@normanmaurer thanks for re-running the test, now rebased on 4.1, understand re the time to debug. Good that we've narrowed it at least. I wonder if you could provide any other clues as to the nature of the failures/tests... do I understand correctly that it's essentially delayed tasks scheduled on the EL timing-out? Might it involve channel/EL shutdown by any chance? |
@njhill is this still a thing after we merged all the other stuff ? |
@normanmaurer yes there's some remaining stuff which was originally done in #7834:
Maybe it would be better to deal with these individually, and open separate PRs for those we think still worthwhile? I think we should at least do (1) ... the logic is actually already there (wasn't reverted) just not currently called. (2) would also be a nice simplification but I'm unsure whether there are any possible negative implications. We were going ahead with it before (until reverted for different reasons), so I guess it should be ok. There will still be a cap on how many times the task queue is processed before checking IO again. The relative value of (3) I think is lower now that #9605 is merged, but it might still be beneficial since some syscalls should be saved in some specific circumstances. I'm not sure if those would now be too narrow to be worth it though. The downside is some additional complexity and probably some more expensive atomic operations on the hot paths (to support the added coordination needed). Would be good to get thoughts from you and @Scottmitch and anyone else on these decisions... |
@njhill lets go for 1) and let us just "dismiss" the rest for now. |
Can one of the admins verify this patch? |
I will just close this for now ... If we want to pick this up again lets reopen a new pr |
Motivation
This reinstates the parts of 1fa7a5e and a22d4ba that were reverted in 7f39142 (applies to
EpollEventLoop
class only), with some minor changes.It sounds like this still exhibits some testsuite failures even with the lazySet fixes, but I thought it would be good to have a PR with the "latest" version to continue to test/debug.
Modification
Changes are the same as before apart a few added comments and the following in
EpollEventLoop#checkScheduleTaskQueueForNewDelay
:set
instead oflazySet
to avoid race conditionsetTimerFd
and updatingnextDeadlineNanos
within the sync blockThe latter two are perf refinements and should not have contributed to the issues seen.
Result
Performance benefits from having timerfd updates decoupled from the event loop, as originally introduced by @Scottmitch