Increase initialDelaySeconds for MOFED POD liveness probe #166
Before version 1.21, Kubernetes used startupProbe both for "fresh" POD starts and for restarts (for example, a restart caused by a failed liveness check). Starting from v1.21, startupProbe is applied to "fresh" starts only. To prevent a crash loop after MOFED POD restarts, we should give the POD enough time to fully boot before liveness checking starts.
Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
# starting from v1.21, Kubernetes doesn't use startupProbe during POD restarts
# to prevent crash loop after POD restarts, we should grant enough time to POD to
# fully boot before we start liveness checking
initialDelaySeconds: 570
How did you come up with this number?
We discussed this with @AbdYsn. We use the same delay as for the startupProbe: 10 minutes. If I remember correctly, @AbdYsn mentioned that the startupProbe delays were chosen based on experiments in multiple different environments.
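For reference, a rough sketch of how the 570-second value can line up with a 10-minute window. It assumes the liveness probe keeps the Kubernetes defaults (periodSeconds: 10, failureThreshold: 3), since the diff only sets initialDelaySeconds, and it uses the rule of thumb from the Kubernetes probe documentation that a probe tolerates roughly failureThreshold × periodSeconds of failures. The `Probe` dataclass and `restart_window_seconds` helper are hypothetical, and the startupProbe values below are illustrative, not taken from the network-operator manifests.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical model of the Kubernetes probe timing fields.

    Defaults mirror the Kubernetes defaults for periodSeconds (10)
    and failureThreshold (3); initialDelaySeconds defaults to 0.
    """
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3


def restart_window_seconds(probe: Probe) -> int:
    """Approximate seconds after container start before consecutive
    probe failures can trigger a restart.

    Rough estimate only: ignores timeoutSeconds, probe jitter, and
    the exact timing of the first probe.
    """
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


# Liveness probe from this PR; periodSeconds/failureThreshold assumed default.
liveness = Probe(initial_delay_seconds=570)

# Illustrative startupProbe giving a 10-minute budget (values assumed).
startup = Probe(period_seconds=10, failure_threshold=60)

print(restart_window_seconds(liveness))  # 600
print(restart_window_seconds(startup))   # 600
```

Under these assumptions both windows come to 600 seconds (10 minutes). If the actual manifests use different periodSeconds or failureThreshold values, the numbers shift accordingly.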
@ykulazhenkov can you point me to the Kubernetes commit that changed the behaviour?
I think this PR changed the behavior of startupProbe: kubernetes/kubernetes#98376
I will ask the community. Maybe this is a regression in Kubernetes, because I can't find any mention that the behavior of startupProbe should change.
Need to check with the community that the current behavior of startupProbe is expected.
Issue in the Kubernetes repo: kubernetes/kubernetes#101064
Issue in Kubernetes confirmed. Fix in progress. No need to change timeouts in the network-operator repo. Closed.