New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close eventfd shutdown/wakeup race by closely tracking epoll edges #9586
Conversation
Motivation This is another iteration of #9476. Modifications Instead of maintaining a count of all writes performed and then using reads during shutdown to ensure all are accounted for, just set a flag after each write and don't reset it until the corresponding event has been returned from epoll_wait. This requires that while a write is still pending we don't reset wakenUp, i.e. continue to block writes from the wakeup() method. Result Race condition eliminated. Fixes #9362 Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@johnou FYI... "re-reverted" another one... |
|
||
// Ordered store is sufficient here since the only access outside this | ||
// thread is a getAndSet in the wakeup() method | ||
wakenUp.lazySet(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was lazySet safe here based on what @njhill said the other day?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnou good question. To distil the problem, say we have fields
final AtomicInteger x = new AtomicInteger(1), y = new AtomicInteger(0);
Thread 1 program order:
x.lazySet(0);
if (y.get() == 0) { print("sleep"); }
Thread 2 program order:
y.compareAndSet(0, 1); // (always succeeds)
if (x.compareAndSet(0, 1)) { print("wake"); }
Is it possible for "sleep" to be printed without "wake" also being printed? I now think the answer is yes, even more so after reading this. Which means that lazySet
isn't safe, but would be good to get confirmation from others... @franz1981?? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've said right: it isn't safe and set would be a better choice for a total ordered operation (among the others) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@franz1981 please review the rest as well
I pushed a commit to change the |
Just pushed another commit 8523b5a:
@normanmaurer would be good to sanity test again with these updates when you have a chance! |
@netty-bot test this please |
@njhill all works... just merged |
@normanmaurer would you like me to port this and #9397 to master? |
@njhill yep please |
(netty#9586) Motivation This is another iteration of netty#9476. Modifications Instead of maintaining a count of all writes performed and then using reads during shutdown to ensure all are accounted for, just set a flag after each write and don't reset it until the corresponding event has been returned from epoll_wait. This requires that while a write is still pending we don't reset wakenUp, i.e. continue to block writes from the wakeup() method. Result Race condition eliminated. Fixes netty#9362 Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@normanmaurer here is master-based commit for this one: 8ed2222, I see you had actually already done #9397. |
@normanmaurer should I open a PR from the above or will you cherrypick? |
@njhill pr please... thanks |
Motivation This is a vestige that was removed in the original PR netty#9535 before it was reverted, but we missed it when re-applying in netty#9586. It means there is a possible race condition because a wakeup event could be missed while shutting down, but the consequences aren't serious since there's a 1 second safeguard timeout when waiting for it. Modification Remove call to epollWaitNow() in EpollEventLoop#closeAll() Result Cleanup redundant code, avoid shutdown delay race condition
…9586) (#9612) Motivation This is another iteration of #9476. Modifications Instead of maintaining a count of all writes performed and then using reads during shutdown to ensure all are accounted for, just set a flag after each write and don't reset it until the corresponding event has been returned from epoll_wait. This requires that while a write is still pending we don't reset wakenUp, i.e. continue to block writes from the wakeup() method. Result Race condition eliminated. Fixes #9362 Co-authored-by: Norman Maurer <norman_maurer@apple.com>
) Motivation This is a vestige that was removed in the original PR #9535 before it was reverted, but we missed it when re-applying in #9586. It means there is a possible race condition because a wakeup event could be missed while shutting down, but the consequences aren't serious since there's a 1 second safeguard timeout when waiting for it. Modification Remove call to epollWaitNow() in EpollEventLoop#closeAll() Result Cleanup redundant code, avoid shutdown delay race condition
Motivation
This is another iteration of #9476.
Modifications
Instead of maintaining a count of all writes performed and then using
reads during shutdown to ensure all are accounted for, just set a flag
after each write and don't reset it until the corresponding event has
been returned from epoll_wait.
This requires that while a write is still pending we don't reset
wakenUp, i.e. continue to block writes from the wakeup() method.
Result
Race condition eliminated. Fixes #9362
Co-authored-by: Norman Maurer norman_maurer@apple.com