readiness prober timeout do not run as expected #123931
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
The timeout code you linked to is the timeout on the CRI call to the runtime, not the actual timeout on the probe exec call. That should be the timeout specified in the ExecSync request here: kubernetes/pkg/kubelet/cri/remote/remote_runtime.go Lines 487 to 493 in d1a2a13
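To make the distinction between the two timeouts concrete, here is a small Python sketch. It is not the kubelet's Go code. The 120-second grace period is an assumption taken from the "2mins + timeout" behaviour reported in this issue, and the helper and field names are hypothetical.

```python
from typing import NamedTuple

# Assumption: extra time added on top of the probe timeout for the CRI call,
# inferred from the "2mins + timeout" behaviour reported in this issue.
CRI_CALL_GRACE_SECONDS = 120


class ProbeTimeouts(NamedTuple):
    exec_timeout_seconds: int      # carried inside the ExecSync request
    cri_call_timeout_seconds: int  # deadline on the CRI call as a whole


def probe_timeouts(timeout_seconds: int) -> ProbeTimeouts:
    """Return both timeouts that apply to one exec probe (hypothetical helper)."""
    if timeout_seconds < 1:
        # The docs quoted in this issue give a minimum value of 1.
        raise ValueError("timeoutSeconds has a minimum value of 1")
    return ProbeTimeouts(
        exec_timeout_seconds=timeout_seconds,
        cri_call_timeout_seconds=timeout_seconds + CRI_CALL_GRACE_SECONDS,
    )


if __name__ == "__main__":
    # A probe with timeoutSeconds: 5 gets a 5 s exec timeout, while the
    # surrounding CRI call is only cut off after the longer deadline.
    print(probe_timeouts(5))
```

If the runtime honours the exec timeout, the longer CRI call deadline should never be reached; hitting it suggests the runtime did not enforce the shorter one.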
If the longer timeout is being hit, that likely indicates a problem with the runtime, e.g. it timed out creating the exec process, or making the request. What version of containerd are you running? I wasn't able to reproduce this with a quick check:
apiVersion: v1
kind: Pod
metadata:
  name: probe-timeout
spec:
  volumes:
  - name: probe-log
    emptyDir: {}
  containers:
  - name: alpine
    image: alpine:latest
    args: [sh, -c, 'while true; do sleep 100000; done']
    readinessProbe:
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
      exec:
        command: ['sh', '-c', 'date >> /probe/readiness-log.txt && sleep 300']
    volumeMounts:
    - mountPath: /probe
      name: probe-log
Output:
Note that the interval is sometimes 5s, and sometimes 10s. The 10s interval comes from the probe period, but the 5s is probably from retries on probe error. I'm not sure why it's inconsistent. In either case, this sounds like it's probably a containerd issue, not a Kubernetes issue.
@tallclair I think this is due to the image. If you use nginx instead of alpine, you can reproduce the issue. I also ran strace against the busybox and nginx images and found that the sleep 300 is not triggered in the busybox or alpine image, so the readiness probe is triggered every timeout period.
Ah, interesting. I swapped the alpine image for nginx, and sure enough it replicated:
apiVersion: v1
kind: Pod
metadata:
  name: probe-timeout
spec:
  volumes:
  - name: probe-log
    emptyDir: {}
  containers:
  - name: test
    image: nginx
    args: [sh, -c, 'while true; do sleep 100000; done']
    readinessProbe:
      timeoutSeconds: 5
      exec:
        command: ['sh', '-c', 'date >> /probe/readiness-log.txt && sleep 45']
    volumeMounts:
    - mountPath: /probe
      name: probe-log
Output:
The probe takes the full 45 seconds, rather than the desired 5. This looks like a containerd issue, not a kubelet issue. I found containerd/containerd#9568 reporting the same issue.
We are seeing the same problem and have been able to recreate #123931 (comment) on the following version combinations:
We also tested Kubernetes with CRI-O, and that combination works as expected.
Seems like a containerd issue, so I think we can close this and track it over in containerd/containerd#9568
/close
@haircommander: Closing this issue. In response to this:
What happened?
We hit an issue where the readiness probe timeout does not behave as expected. In practice, the probe runs for 2 minutes plus the configured timeout.
The official Kubernetes documentation linked below defines timeoutSeconds as: "Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1." From this, I understood that if the probe command runs longer than the timeout period, the kubelet stops that probe and starts the next one, but that is not what actually happens. The next probe starts only after the previous probe command has completed, or after it has lasted (2 minutes + the timeout setting).
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes
kubernetes/pkg/kubelet/cri/remote/remote_runtime.go
Line 469 in d1a2a13
What did you expect to happen?
Could you please modify the default timeout in the code or add a description to the official docs?
How can we reproduce it (as minimally and precisely as possible)?
After the pod is running, a new file is created in the /tmp path every (2 min + 8 s) = 128 s.
Running strace on the CRI runtime (containerd) shows that the probe action is triggered every (2 min + 8 s).
If the timeout setting < probe command running time < 2 minutes, the next readiness probe runs after the previous probe command completes, not after the timeout period.
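The timing described above can be sketched as a small Python calculation. This is only an illustration of the reported behaviour, not kubelet or containerd code. The 120-second grace value is taken from the "2mins + timeout" observation in this report, and the function name and the `runtime_enforces_timeout` flag are hypothetical.

```python
# Assumption: the CRI call is cut off at timeoutSeconds + 2 minutes,
# as observed in this report.
CRI_CALL_GRACE_SECONDS = 120


def observed_probe_duration(
    command_seconds: float,
    timeout_seconds: float,
    runtime_enforces_timeout: bool,
) -> float:
    """Estimate how long one exec probe occupies the prober (hypothetical helper).

    If the runtime enforces the exec timeout, the probe ends at
    timeout_seconds at the latest. Otherwise only the longer CRI call
    deadline (timeout + grace) stops it.
    """
    if runtime_enforces_timeout:
        limit = timeout_seconds
    else:
        limit = timeout_seconds + CRI_CALL_GRACE_SECONDS
    return min(command_seconds, limit)


if __name__ == "__main__":
    # Reported case: timeout of 8 s with a long-running command gives 128 s.
    print(observed_probe_duration(300, 8, runtime_enforces_timeout=False))
    # Expected case: the same probe is stopped after 8 s.
    print(observed_probe_duration(300, 8, runtime_enforces_timeout=True))
```

With a command that runs longer than the timeout but shorter than 2 minutes, the unenforced case returns the command's own running time, which matches the last observation above.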
Anything else we need to know?
No response
Kubernetes version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3-aliyun.1
Cloud provider
Alibaba Cloud
OS version