Reinstate decoupled timerfd logic in EpollEventLoop #9590

njhill · 2019-09-21T18:44:16Z

Motivation

This reinstates the parts of 1fa7a5e and a22d4ba that were reverted in 7f39142 (applies to EpollEventLoop class only), with some minor changes.

It sounds like this still exhibits some testsuite failures even with the lazySet fixes, but I thought it would be good to have a PR with the "latest" version to continue to test/debug.

Modification

Changes are the same as before apart from a few added comments and the following in EpollEventLoop#checkScheduleTaskQueueForNewDelay:

- Use set instead of lazySet to avoid race condition
- Avoid arming timer and entering epoll wait if a task has already expired
- Switch order of calling setTimerFd and updating nextDeadlineNanos within the sync block

The latter two are perf refinements and should not have contributed to the issues seen.
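The three changes listed above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the class name `TimerDeadlineSketch`, the `Action` enum, the `onTaskScheduled` method, and the plain `long` field standing in for the timerfd are all hypothetical. The order shown inside the synchronized block (arm first, then publish the deadline) is an assumption, since the thread does not spell out which order the PR settled on.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of decoupled timer arming; not Netty's EpollEventLoop. */
final class TimerDeadlineSketch {
    enum Action { RUN_NOW, ARM_TIMER, NONE }

    // Earliest deadline the simulated timer is armed for; Long.MAX_VALUE means "not armed".
    private final AtomicLong nextDeadlineNanos = new AtomicLong(Long.MAX_VALUE);

    // Stand-in for the timerfd: just records the last deadline it was armed with.
    private long armedDeadlineNanos = Long.MAX_VALUE;

    Action onTaskScheduled(long deadlineNanos, long nowNanos) {
        // A task that has already expired needs neither the timer nor a blocking wait.
        if (deadlineNanos <= nowNanos) {
            return Action.RUN_NOW;
        }
        // Cheap check without the lock: the timer already fires early enough.
        if (deadlineNanos >= nextDeadlineNanos.get()) {
            return Action.NONE;
        }
        synchronized (this) {
            // Re-check under the lock, another thread may have armed an earlier deadline.
            if (deadlineNanos >= nextDeadlineNanos.get()) {
                return Action.NONE;
            }
            // Assumed order: arm the (simulated) timer, then publish the new deadline.
            armedDeadlineNanos = deadlineNanos;
            // set() is a volatile write and is immediately visible to other threads;
            // lazySet() gives no such guarantee, so a reader could act on a stale deadline.
            nextDeadlineNanos.set(deadlineNanos);
            return Action.ARM_TIMER;
        }
    }

    synchronized long armedDeadlineNanos() {
        return armedDeadlineNanos;
    }

    public static void main(String[] args) {
        TimerDeadlineSketch sketch = new TimerDeadlineSketch();
        System.out.println(sketch.onTaskScheduled(50, 100));  // deadline already in the past
        System.out.println(sketch.onTaskScheduled(500, 100)); // first future deadline
        System.out.println(sketch.onTaskScheduled(900, 100)); // later than the armed deadline
    }
}
```

The point of the sketch is only the coordination pattern: a volatile read on the fast path, and a lock around the timer update so the armed value and the published deadline cannot diverge.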

Result

Performance benefits from having timerfd updates decoupled from the event loop, as originally introduced by @Scottmitch.

netty-bot · 2019-09-21T18:44:20Z

Can one of the admins verify this patch?

njhill · 2019-09-22T18:12:27Z

@normanmaurer I've attempted to stress this a bit and haven't managed to break it. Would be good if you could re-test with your own testsuite when you have time.

normanmaurer · 2019-09-23T13:31:32Z

@njhill can you please rebase?

normanmaurer · 2019-09-23T13:39:24Z

@njhill I also just re-ran the test suite and hit the same issue as before... I hope I will have some time this week to look into it, but I wouldn't hold my breath for it

@Scottmitch

njhill · 2019-09-23T18:05:28Z

@normanmaurer thanks for re-running the tests. This is now rebased on 4.1, and understood regarding the time to debug. Good that we've at least narrowed it down.

I wonder if you could provide any other clues as to the nature of the failures/tests... do I understand correctly that it's essentially delayed tasks scheduled on the event loop (EL) timing out? Might it involve channel/EL shutdown by any chance?

normanmaurer · 2019-10-12T18:17:57Z

@njhill is this still a thing after we merged all the other stuff?

njhill · 2019-10-15T06:13:24Z

@normanmaurer yes there's some remaining stuff which was originally done in #7834:

1. Execute expired scheduled tasks directly rather than first dumping them into the main task queue
2. Get rid of IO ratio
3. Make required timerfd updates directly instead of waking event loop, for scheduled tasks submitted while the EL is waiting

Maybe it would be better to deal with these individually, and open separate PRs for those we think still worthwhile?

I think we should at least do (1) ... the logic is actually already there (wasn't reverted) just not currently called. (2) would also be a nice simplification but I'm unsure whether there are any possible negative implications. We were going ahead with it before (until reverted for different reasons), so I guess it should be ok. There will still be a cap on how many times the task queue is processed before checking IO again.
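As an illustration of item (1), here is a minimal sketch of running expired scheduled tasks straight from the scheduled queue instead of first moving them into a separate task queue. It is not Netty's implementation: `DirectExpiredTaskRunner`, `ScheduledTask`, `runExpired`, and the `maxTasks` parameter are hypothetical names, and capping the number of tasks per pass is only one assumed way to make sure I/O gets checked again.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: run expired scheduled tasks directly, with a per-pass cap. */
final class DirectExpiredTaskRunner {
    static final class ScheduledTask implements Comparable<ScheduledTask> {
        final long deadlineNanos;
        final Runnable body;

        ScheduledTask(long deadlineNanos, Runnable body) {
            this.deadlineNanos = deadlineNanos;
            this.body = body;
        }

        @Override
        public int compareTo(ScheduledTask other) {
            return Long.compare(deadlineNanos, other.deadlineNanos);
        }
    }

    // Ordered by deadline, so the head is always the next task to expire.
    private final PriorityQueue<ScheduledTask> scheduled = new PriorityQueue<>();

    void schedule(long deadlineNanos, Runnable body) {
        scheduled.add(new ScheduledTask(deadlineNanos, body));
    }

    /** Runs up to maxTasks expired tasks in deadline order and returns how many ran. */
    int runExpired(long nowNanos, int maxTasks) {
        int ran = 0;
        while (ran < maxTasks) {
            ScheduledTask head = scheduled.peek();
            if (head == null || head.deadlineNanos > nowNanos) {
                break; // nothing left that has expired
            }
            scheduled.poll();
            head.body.run(); // executed directly, no intermediate queue
            ran++;
        }
        return ran;
    }

    int pending() {
        return scheduled.size();
    }

    public static void main(String[] args) {
        DirectExpiredTaskRunner runner = new DirectExpiredTaskRunner();
        runner.schedule(20, () -> System.out.println("second"));
        runner.schedule(10, () -> System.out.println("first"));
        runner.schedule(300, () -> System.out.println("not yet due"));
        int ran = runner.runExpired(100, 16);
        System.out.println("ran=" + ran + " pending=" + runner.pending());
    }
}
```

A caller would invoke `runExpired` once per loop iteration with the current time, then go back to polling for I/O; any tasks left over because of the cap are picked up on the next pass.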

The relative value of (3) I think is lower now that #9605 is merged, but it might still be beneficial since some syscalls should be saved in some specific circumstances. I'm not sure if those would now be too narrow to be worth it though. The downside is some additional complexity and probably some more expensive atomic operations on the hot paths (to support the added coordination needed).

Would be good to get thoughts from you and @Scottmitch and anyone else on these decisions...

normanmaurer · 2019-10-16T22:27:03Z

@njhill let's go for (1) and just "dismiss" the rest for now.

netty-bot · 2020-06-23T07:29:10Z

Can one of the admins verify this patch?

normanmaurer · 2020-12-23T20:01:44Z

I will just close this for now ... If we want to pick this up again, let's open a new PR

njhill force-pushed the timerfd_decouple_reinstate branch from 8f779bf to 810e18f on September 23, 2019 17:45

njhill force-pushed the timerfd_decouple_reinstate branch from 810e18f to 125dc27 on September 23, 2019 17:51

njhill changed the base branch from epoll_eventfd_race to 4.1 on September 23, 2019 17:51

normanmaurer mentioned this pull request Sep 27, 2019

Avoid unnecessary event loop wakeups #9605

Merged

njhill mentioned this pull request Nov 25, 2019

Clean up NioEventLoop #9799

Merged

normanmaurer force-pushed the 4.1 branch from cf4203f to 9da336f on December 23, 2020 08:32

normanmaurer closed this Dec 23, 2020
