
1 minute slower since kubernetes version 1.20.3 - [kubelet-check] Initial timeout of 40s passed. #2395

Closed
medyagh opened this issue Feb 20, 2021 · 24 comments
Labels
kind/regression: Categorizes issue or PR as related to a regression from a prior release.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@medyagh
Member

medyagh commented Feb 20, 2021

Hello, on minikube we updated our default Kubernetes to v1.20.4, and we noticed that bumping from v1.20.2 to v1.20.3 causes a 1 minute slowdown on minikube; the logs show it is in kubeadm init:
kubernetes/minikube#10545


    ▪ Booting up control plane ...I0220 15:44:24.522119   82905 command_runner.go:123] > [control-plane] Creating static Pod manifest for "kube-apiserver"
I0220 15:44:24.522344   82905 command_runner.go:123] > [control-plane] Creating static Pod manifest for "kube-controller-manager"
I0220 15:44:24.522442   82905 command_runner.go:123] > [control-plane] Creating static Pod manifest for "kube-scheduler"
I0220 15:44:24.522659   82905 command_runner.go:123] > [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0220 15:44:24.522894   82905 command_runner.go:123] > [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
| I0220 15:45:04.513403   82905 command_runner.go:123] > [kubelet-check] Initial timeout of 40s passed.
/ I0220 15:45:28.017602   82905 command_runner.go:123] > [apiclient] All control plane components are healthy after 63.506128 seconds
I0220 15:45:28.017909   82905 command_runner.go:123] > [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0220 15:45:28.031480   82905 command_runner.go:123] > [kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
- I0220 15:45:28.557112   82905 command_runner.go:123] > [upload-certs] Skipping phase. Please see --upload-certs
I0220 15:45:28.557337   82905 command_runner.go:123] > [mark-control-plane] Marking the node minikube as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
| I0220 15:45:29.070246   82905 command_runner.go:123] > [bootstrap-token] Using token: ckmqq4.dov5m97q5ko44fpg
I0220 15:45:29.159638   82905 out.go:140]     ▪ Configuring RBAC rules ..

is there a new config or something that we are missing in v1.20.3 that is adding an additional check that causes a 1 minute tax?

screenshot of our performance dashboard

[image: Screen Shot 2021-02-21 at 11 10 34 AM]

@neolit123
Member

hi, @medyagh

| I0220 15:45:04.513403 82905 command_runner.go:123] > [kubelet-check] Initial timeout of 40s passed

this indicates the kubelet takes a while to become ready.

i don't think we have anything on the kubeadm side that can cause it, so it might be a core issue.
i will investigate.

@neolit123 neolit123 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. kind/regression Categorizes issue or PR as related to a regression from a prior release. labels Feb 21, 2021
@neolit123 neolit123 modified the milestones: v1.21, v1.20 Feb 21, 2021
@neolit123 neolit123 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Feb 21, 2021
@neolit123
Member

i can confirm the problem in our CI (kind(er)-based, running containerd-in-docker in a prow pod):

this is using a 1.20.4 kubeadm with a 1.20.4 kubelet.

I0220 17:01:12.877602 169 request.go:943] Got a Retry-After 1s response for attempt 8 to https://172.17.0.7:6443/healthz?timeout=10s
I0220 17:01:13.878699 169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s in 0 milliseconds
I0220 17:01:13.878760 169 request.go:943] Got a Retry-After 1s response for attempt 9 to https://172.17.0.7:6443/healthz?timeout=10s
[kubelet-check] Initial timeout of 40s passed.

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-1-20/1363170590768238592/build-log.txt

this is the whole diff of k/k between 1.20.2 and 1.20.3:
kubernetes/kubernetes@v1.20.2...v1.20.3

there are kubeadm commits that change the default image repository for CI images, which doesn't seem related.
i do see a few kubelet changes as well.

i will now try a 1.20.3 kubeadm with a 1.20.2 kubelet and bisect.

@neolit123
Member

neolit123 commented Feb 21, 2021

this is also a problem in the tip of 1.21 and 1.19:

[kubelet-check] Initial timeout of 40s passed.

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-latest/1363270239839588352/build-log.txt

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-1-19/1363254137684234240/build-log.txt

so we must have backported something that causes this.

looking at logs and diffs this started happening on the 13th of Jan. my current suspect commit is:
kubernetes/kubernetes@88ee6d7

@dims
Member

dims commented Feb 21, 2021

@SergeyKanzhelev @ehashman please see a possible regression

@neolit123
Member

before kubernetes/kubernetes@88ee6d7

[apiclient] All control plane components are healthy after 26.046012 seconds

after:

[apiclient] All control plane components are healthy after 69.034837 seconds

the PR that did this is:
kubernetes/kubernetes#94087

so i can see this wait making sense, but it should bail out if this is the first node and there is no API server yet.

@neolit123
Member

cc @BenTheElder for kind

@neolit123
Member

i have commented on the original PR.

praveenkumar added a commit to praveenkumar/crc that referenced this issue Feb 22, 2021
We bumped the apiserver timeout to 3 mins when we committed 4da07ce, but it looks like CI is still failing when testing 4.7.0-rc.3 bits.

Upstream issue: kubernetes/kubeadm#2395
@liggitt
Member

liggitt commented Feb 22, 2021

for a self-hosting kubelet, informer sync will fail until the API server is available. Does the delay mean that it takes 40 seconds to start the self-hosted API server? or does it mean that GetNode() is called 4 times and times out in the process of starting static pods?

@neolit123
Member

or does it mean that GetNode() is called 4 times and times out in the process of starting static pods?

i suspect that something like that is happening.

the base of the time comparison is a check in kubeadm where it asks both the kubelet and the apiserver to report 200 on their /healthz.
normally the kubelet readies up within seconds and then it starts the apiserver pod.
before the change, this completes in ~26 seconds.

after this change the kubelet /healthz takes more than 40 seconds to 200 as it waits for the apiserver for the informer sync.
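
for reference, the wait boils down to polling both health endpoints until they return 200. a minimal Go sketch of that kind of loop is below; the endpoint URLs, ports and timings here are illustrative assumptions, not kubeadm's exact code:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// waitFor200 polls a /healthz-style endpoint once per second until it returns
// 200 OK or the overall timeout expires, and reports how long that took.
func waitFor200(url string, timeout time.Duration) (time.Duration, error) {
	client := &http.Client{
		Timeout: 10 * time.Second,
		// The apiserver uses a self-signed cert during bootstrap, so skip
		// verification for this illustration only.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	start := time.Now()
	for time.Since(start) < timeout {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return time.Since(start), nil
			}
		}
		time.Sleep(1 * time.Second)
	}
	return time.Since(start), fmt.Errorf("%s did not return 200 within %v", url, timeout)
}

func main() {
	// Assumed defaults: kubelet healthz on localhost:10248 (plain HTTP),
	// apiserver healthz on :6443 (TLS); adjust for the environment under test.
	endpoints := []string{
		"http://127.0.0.1:10248/healthz",
		"https://127.0.0.1:6443/healthz",
	}
	for _, url := range endpoints {
		took, err := waitFor200(url, 4*time.Minute)
		fmt.Printf("%s -> took %v, err=%v\n", url, took.Round(time.Second), err)
	}
}
```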

@brandond

So just to be clear, the apiserver still starts up just as quickly, it's just that the kubelet now does not return 200 on healthz until the apiserver is also up?

@neolit123
Member

neolit123 commented Feb 22, 2021

So just to be clear, the apiserver still starts up just as quickly, it's just that the kubelet now does not return 200 on healthz until the apiserver is also up?

seems that both the kubelet and api-server healthz are delayed.
also yesterday i did not add sufficient logging in the kubelet to figure out what's going on.

logs from kubeadm:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0220 17:00:34.356861     169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s  in 1 milliseconds
...
[kubelet-check] Initial timeout of 40s passed.
...
I0220 17:01:43.390484     169 round_trippers.go:445] GET https://172.17.0.7:6443/healthz?timeout=10s 200 OK in 13 milliseconds
I0220 17:01:43.390615     169 uploadconfig.go:108] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
[apiclient] All control plane components are healthy after 69.035888 seconds
  • kubelet starts around 17:00:34.
  • [kubelet-check] Initial timeout of 40s passed. means that the kubelet healthz is not 200 after 40 seconds.
  • API server reports 200 at 17:01:43.
  • we don't log the kubelet 200, but it must have happened at some point.

@neolit123
Member

i have a WIP proposal fix for the kubelet.
will try to send later.

@dims
Member

dims commented Feb 23, 2021

cc @adisky

@ehashman
Member

I haven't noticed any substantial delays in cluster creation on Kubernetes since this patch went in. Is it possible this is minikube-specific and should be fixed in minikube bootstrap?

The patch in kubernetes/kubernetes#94087 is a correctness fix to avoid nodes admitting pods before they have fully synced pod state. We've seen many issues with pods getting stuck in NodeAffinity status after cold-starting clusters without this patch, because the kubelet starts admitting pods before it is ready.
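
The shape of that gating, as a rough illustrative sketch (names are invented; this is not the actual kubelet change in kubernetes/kubernetes#94087): pod admission waits until the kubelet's informer caches report synced.

```go
// Illustrative sketch of the gating discussed above, not the actual kubelet
// change in kubernetes/kubernetes#94087; names and structure are invented.
package kubeletgate

import "k8s.io/client-go/tools/cache"

type admissionGate struct {
	// HasSynced funcs of the informers the kubelet needs before it can
	// build a correct view of existing pod/node state.
	informersSynced []cache.InformerSynced
}

// waitUntilPodsCanBeAdmitted blocks until every informer cache has synced (or
// stopCh closes). On a node whose API server runs as one of its own static
// pods, the caches cannot sync until that server is up, which is the circular
// wait behind the extra ~40s reported in this issue.
func (g *admissionGate) waitUntilPodsCanBeAdmitted(stopCh <-chan struct{}) bool {
	return cache.WaitForCacheSync(stopCh, g.informersSynced...)
}
```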

so i can see this wait making sense, but it should bail out if this is the first node and there is no API server yet.

I don't think the kubelet has any way of knowing it's the first node? Are you suggesting a change in k8s or kubeadm?

Edit: oops I thought I hit send on this comment 3h ago but apparently not 😅

@neolit123
Member

I haven't noticed any substantial delays in cluster creation on Kubernetes since this patch went in

what cluster creation tooling / e2e test jobs are you checking this against?

is it possible this is minikube-specific and should be fixed on minikube bootstrap?

this patch definitely creates a problem if the api-server is managed as a static pod.
so it's not specific to kubeadm only (or to anything that uses kubeadm, such as minikube)

The patch in kubernetes/kubernetes#94087 is a correctness fix to avoid nodes admitting pods before they have fully synced pod state.

i do not disagree with the intent, i just think the implementation can be improved.

@neolit123
Member

fix attempt at:
kubernetes/kubernetes#99336

praveenkumar added a commit to praveenkumar/crc that referenced this issue Feb 23, 2021
We bumped the apiserver timeout to 3 mins when we committed 4da07ce, but it looks like CI is still failing when testing 4.7.0 bits.

Upstream issue: kubernetes/kubeadm#2395
@ehashman
Member

What I'm confused by is why this would affect bootstrap at all. For a standalone kubelet, the checks are always bypassed: https://github.com/kubernetes/kubernetes/pull/94087/files#diff-67dd9e8f3ee257072765326cb4f242852554a2c0753563fa51e292c0a63a7b94R455-R460

what cluster creation tooling / e2e test jobs are you checking this against?

OpenShift and upstream Kubernetes. I don't have specific jobs to point you to though, idk if there's an easy way to gather stats on cluster start time from testgrid.

@liggitt
Member

liggitt commented Feb 24, 2021

"self hosted" is not always standalone. kubeadm sets up a kubelet which runs the apiserver and then registers itself as a node to the API server it started

@BenTheElder
Member

In particular for development clusters like kind/minikube etc. using kubeadm it's common to have:

  • one node
  • running self-hosted api-server static pod
  • registered kubelet (and control-plane node taint dropped so workloads can schedule)

But as Jordan mentioned even kubeadm clusters with N nodes have static pod apiserver on certain nodes + registered kubelets for all nodes. This will slow down bootstrap for pretty much any kubeadm cluster since most work is blocked on healthy API server / first node.

idk if there's an easy way to gather stats on cluster start time from testgrid.

There is data of sorts, just not what you want probably. Jobs running under kubetest (GCE) record the Up step (cluster bringup) time, but that includes more than just the core cluster bringup. IIRC kind jobs log time for bootstrap and more in the logs, but not to junit currently (pending rewriting how we run those tests).

Local cluster jobs (local-up-cluster.sh, kind) are currently impacted by CI/prow.k8s.io I/O throttling issues impacting pod startup, build times, etc. badly in general and wouldn't make useful performance data until that's resolved.

@brandond

"self hosted" is not always standalone. kubeadm sets up a kubelet which runs the apiserver and then registers itself as a node to the API server it started

Our RKE2 project does the same thing. Runs etcd plus apiserver and sundry as static pods; the kubelet registers to the local apiserver once it's up.

@fabriziopandini
Member

FYI, got pinged about this issue by some Cluster API users...

guillaumerose pushed a commit to crc-org/crc that referenced this issue Feb 25, 2021
We bumped the apiserver timeout to 3 mins when we committed 4da07ce, but it looks like CI is still failing when testing 4.7.0 bits.

Upstream issue: kubernetes/kubeadm#2395
@neolit123 neolit123 modified the milestones: v1.20, v1.21 Mar 9, 2021
@neolit123 neolit123 modified the milestones: v1.21, v1.22 Apr 12, 2021
@neolit123
Member

update

PR in master (1.22) merged:
kubernetes/kubernetes#99336

the change is backported all the way back to 1.18 (inclusive); the backport PRs are pending.
kubernetes/kubernetes#99336 (comment)

@neolit123
Member

PRs merged.

@neolit123
Member

neolit123 commented Apr 28, 2021

(i guess the 1.21 PR has not merged yet, but is about to.)

you can expect new releases after the 12th of May.

dongjoon-hyun added a commit to apache/spark that referenced this issue Dec 13, 2022
### What changes were proposed in this pull request?

This PR aims to upgrade the minimum Minikube version from 1.18.0 to 1.28.0 for Apache Spark 3.4.0.

### Why are the changes needed?

Minikube v1.28.0 was released on `Nov 4th, 2022` and is the latest one as of today.
- https://github.com/kubernetes/minikube/releases/tag/v1.28.0

GitHub Action CI has been using `Minikube 1.28.0` to test `K8s v1.25.3` and Homebrew also provides `1.28.0` by default.
- https://github.com/apache/spark/actions/runs/3681318787/jobs/6227888255
```
* minikube v1.28.0 on Ubuntu 20.04
...
* Downloading Kubernetes v1.25.3 preload ...
```

In addition, we can choose different K8s versions on this latest Minikube like the following.
```
$ minikube start --kubernetes-version=1.21.0
😄  minikube v1.28.0 on Darwin 13.1 (arm64)
❗  Kubernetes 1.21.0 has a known performance issue on cluster startup. It might take 2 to 3 minutes for a cluster to start.
❗  For more information, see: kubernetes/kubeadm#2395
✨  Automatically selected the docker driver
📌  Using Docker Desktop driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
💾  Downloading Kubernetes v1.21.0 preload ...
```

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the CIs.

Closes #39043 from dongjoon-hyun/SPARK-41502.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this issue Dec 18, 2022