HashedWheelTimer task scheduling behavior changed in 4.1.85 #13018

Closed
lhotari opened this issue Nov 25, 2022 · 4 comments · Fixed by #13021

Comments

@lhotari
Contributor

lhotari commented Nov 25, 2022

Expected behavior

The expectation is that task scheduling behavior of HashedWheelTimer doesn't change significantly between Netty 4.1.x releases.

Actual behavior

The HashedWheelTimer behavior changed in a way that causes multiple integration tests to fail in Apache Pulsar. The only change to HashedWheelTimer in 4.1.85.Final is PR #12888.

Steps to reproduce

The issue can be reproduced by running a specific test in the Apache Pulsar project. The instructions are in a PR in the Apache Pulsar repo:
apache/pulsar#18599 (comment)

Netty version

4.1.85.Final

JVM version (e.g. java -version)

openjdk version "17.0.5" 2022-10-18
OpenJDK Runtime Environment Temurin-17.0.5+8 (build 17.0.5+8)
OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (build 17.0.5+8, mixed mode, sharing)

OS version (e.g. uname -a)

Linux x86_64

@lhotari
Contributor Author

lhotari commented Nov 25, 2022

@chrisvest I investigated the issue and didn't find a behavior change in HashedWheelTimer. I used HashedWheelTimerTest and made several variations of testExecutionOnTime.

There might be some subtle change that causes the issue, and we'll just have to deal with that in Pulsar.

@needmorecode
Contributor

@lhotari After reading your comment, I reviewed my code and realised I made a mistake in #12888.
My optimisation was based on the assumption that the tasks in a bucket are stored in order of execution time, which in fact they are not.
So breaking out of the loop at execRound > currRound may cause later tasks in the bucket not to fire on time.
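
To make the failure mode concrete, here is a minimal, self-contained sketch of the difference between scanning a whole bucket and breaking at the first task that is not yet due. It is not Netty's code: the class `BucketScanSketch`, the `Task` type, and both `expire...` methods are hypothetical names used for illustration, and only `execRound` / `currRound` are taken from the comment above. It assumes a bucket holds tasks in insertion order rather than deadline order.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketScanSketch {
    // Hypothetical stand-in for a timeout entry: a name and the wheel round in which it is due.
    static final class Task {
        final String name;
        final long execRound;

        Task(String name, long execRound) {
            this.name = name;
            this.execRound = execRound;
        }
    }

    // Visits every task in the bucket and fires each one whose round has arrived.
    static List<String> expireFullScan(List<Task> bucket, long currRound) {
        List<String> fired = new ArrayList<>();
        for (Task t : bucket) {
            if (t.execRound <= currRound) {
                fired.add(t.name);
            }
        }
        return fired;
    }

    // Stops at the first task that is not yet due.
    // This is only correct if the bucket is sorted by execRound.
    static List<String> expireEarlyBreak(List<Task> bucket, long currRound) {
        List<String> fired = new ArrayList<>();
        for (Task t : bucket) {
            if (t.execRound > currRound) {
                break;
            }
            fired.add(t.name);
        }
        return fired;
    }

    public static void main(String[] args) {
        // Insertion order, not deadline order: "late" was scheduled before "soon".
        List<Task> bucket = List.of(new Task("late", 3), new Task("soon", 1));

        System.out.println(expireFullScan(bucket, 1));   // prints [soon]
        System.out.println(expireEarlyBreak(bucket, 1)); // prints []
    }
}
```

In this sketch the early-break variant skips "soon" in round 1 because "late" sits in front of it in the bucket, which matches the kind of delayed execution described in this issue.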

@needmorecode
Contributor

Sorry for the trouble. I have already opened PR #13021 to revert it. @lhotari @chrisvest

@lhotari
Contributor Author

lhotari commented Nov 27, 2022

> @lhotari After reading your comment, I reviewed my code and realised I made a mistake in #12888.
> My optimisation was based on the assumption that the tasks in a bucket are stored in order of execution time, which in fact they are not.
> So breaking out of the loop at execRound > currRound may cause later tasks in the bucket not to fire on time.

@needmorecode Thanks for the quick confirmation and investigation. Your explanation makes sense. I missed that case when I was trying to add a unit test that would prove an issue.

normanmaurer pushed a commit that referenced this issue Nov 28, 2022
Motivation:

The code I committed in #12888 may cause unexpected task scheduling problems.


Modification:

Revert this commit.


Result:

Fixes #13018 .