Increase initialDelaySeconds for MOFED POD liveness probe #166

Closed
manifests/stage-ofed-driver/0010_ofed-driver-ds.yaml (5 changes: 4 additions, 1 deletion)
@@ -78,7 +78,10 @@ spec:
     command:
       [sh, -c, 'lsmod | grep mlx5_core']
   periodSeconds: 30
-  initialDelaySeconds: 30
+  # starting from v1.21, Kubernetes doesn't use startupProbe during POD restarts
+  # to prevent crash loop after POD restarts, we should grant enough time to POD to
+  # fully boot before we start liveness checking
+  initialDelaySeconds: 570
Collaborator

How did you come up with this number?

Collaborator Author

We discussed this with @AbdYsn. We use the same delay as for the startupProbe: 10 minutes. If I remember correctly, @AbdYsn mentioned that the startupProbe delays were chosen based on experiments in multiple different environments.

Collaborator

@ykulazhenkov, can you point me to the Kubernetes commit that changed the behaviour?

Collaborator Author

I think this PR changed the behavior of startupProbe: kubernetes/kubernetes#98376

Collaborator Author

I will ask the community. Maybe this is a regression in Kubernetes, because I can't find any mention that the behavior of startupProbe should change.

   failureThreshold: 1
   successThreshold: 1
 readinessProbe:
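For comparison, a minimal sketch of the startupProbe setup that the 10-minute figure refers to. It assumes a startupProbe running the same `lsmod | grep mlx5_core` check as the liveness probe; the startupProbe values (`periodSeconds: 10`, `failureThreshold: 60`) are hypothetical, picked only so that the startup budget (failureThreshold x periodSeconds) comes to 600 seconds, and are not taken from the real manifest. According to the Kubernetes documentation, when a startupProbe is defined, liveness and readiness checks are disabled until it succeeds, which is what normally shields a slow-booting container from the strict `failureThreshold: 1` liveness probe.

```yaml
# Hypothetical container-level probe fragment. The startupProbe values are
# illustrative assumptions; the livenessProbe values match this PR.
startupProbe:
  exec:
    command:
      [sh, -c, 'lsmod | grep mlx5_core']
  # Up to 60 failed checks x 10 s period = 600 s (10 minutes) for the driver to load
  periodSeconds: 10
  failureThreshold: 60
livenessProbe:
  exec:
    command:
      [sh, -c, 'lsmod | grep mlx5_core']
  periodSeconds: 30
  # Value from this PR: postpones the first liveness check so the Pod can
  # fully boot even if startupProbe is not applied after a restart
  initialDelaySeconds: 570
  failureThreshold: 1
  successThreshold: 1
```

The design difference: a startupProbe ends as soon as its check passes, so liveness checking can begin right after the driver loads, whereas `initialDelaySeconds` always postpones the first liveness check by the full 570 seconds on every container start.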