CoreDNS fails on installation with kubeadm #114547

Closed
claudiomerli opened this issue Dec 16, 2022 · 10 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • kind/support: Categorizes issue or PR as a support question.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
  • sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@claudiomerli

What happened?

After installing a single-node cluster with the Calico CNI and removing the taints on the control-plane node, CoreDNS fails to start, with this event from the kubelet:

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: open /sys/fs/cgroup/memory/system.slice/containerd.service/kubepods-burstable-pod089b861e_bf5c_4b0f_93c4_dc831baffb75.slice:cri-containerd:coredns/memory.memsw.limit_in_bytes: no such file or directory: unknown
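
The path in the error ends in memory.memsw.limit_in_bytes, the cgroup v1 file for the combined memory+swap limit. As a rough sketch, one quick thing to check is whether the host exposes that file at all; on cgroup v1 kernels it is typically missing when swap accounting is disabled. The `has_memsw` helper and the default mount point are assumptions made for illustration, not something taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cgroup v1 memory controller under the
# given mount point exposes the memory+swap limit file.
# Assumes the usual cgroup v1 mount point /sys/fs/cgroup when no argument is given.
has_memsw() {
    root="${1:-/sys/fs/cgroup}"
    [ -e "$root/memory/memory.memsw.limit_in_bytes" ]
}

if has_memsw; then
    echo "memory.memsw.* files are present (swap accounting enabled)"
else
    echo "memory.memsw.* files are missing (swap accounting likely disabled)"
fi
```

If the file is missing, any runtime that tries to write a swap limit for a container would be expected to fail with a "no such file or directory" error like the one above.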

What did you expect to happen?

CoreDNS starts correctly.

How can we reproduce it (as minimally and precisely as possible)?

  • Bare metal server on OVH cloud with OVH cloud image Ubuntu 20.04 Focal Fossa
  • Intel Xeon E5-1620v2 - 4c/8t - 3.7 GHz/3.9 GHz
  • 32 GB 1333 MHz
  • 2×2 TB HDD SATA

Command executed:

sudo swapoff -a  

#Install docker
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release -y
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
sudo usermod -aG docker $USER

#Install kubeadm
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl -y
sudo curl -fsSLo /etc/apt/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl -y
sudo apt-mark hold kubelet kubeadm kubectl
sudo rm -f /etc/containerd/config.toml
sudo systemctl restart containerd

#Install k8s
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.5/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.5/manifests/custom-resources.yaml
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Then check the status of the CoreDNS pods.
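
For that last step, here is a minimal sketch of one way to spot the failing pods. The filter reads `kubectl get pods --no-headers` output and prints every pod whose STATUS column is not `Running`. The `not_running` name is made up for this example, and the `k8s-app=kube-dns` selector is, as far as I know, the label kubeadm puts on the CoreDNS pods; adjust it if yours differ.

```shell
#!/bin/sh
# Hypothetical filter: read `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and print "name status" for every
# pod that is not in the Running state.
not_running() {
    awk '$3 != "Running" { print $1, $3 }'
}

# Example usage against a live cluster (needs kubectl and a kubeconfig):
#   kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | not_running
#   kubectl describe pod -n kube-system -l k8s-app=kube-dns   # shows the events
```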

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:58:30Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Dedicated server on OVH

OS version

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ uname -a
Linux control-plane 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

Client: Docker Engine - Community
 Version:           20.10.22
 API version:       1.41
 Go version:        go1.18.9
 Git commit:        3a2c30b
 Built:             Thu Dec 15 22:28:08 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.22
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       42c8b31
  Built:            Thu Dec 15 22:25:58 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.13
  GitCommit:        78f51771157abb6c9ed224c22013cdf09962315d
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Related plugins (CNI, CSI, ...) and versions (if applicable)

Calico: v3.24.5
@claudiomerli added the kind/bug label on Dec 16, 2022
@k8s-ci-robot added the needs-sig and needs-triage labels on Dec 16, 2022
@k8s-ci-robot
Contributor

@claudiomerli: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@claudiomerli
Author

/wg Architecture

@k8s-ci-robot
Contributor

@claudiomerli: The label(s) wg/architecture cannot be applied, because the repository doesn't have them.

In response to this:

/wg Architecture


@neolit123
Member

> Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: open /sys/fs/cgroup/memory/system.slice/containerd.service/kubepods-burstable-pod089b861e_bf5c_4b0f_93c4_dc831baffb75.slice:cri-containerd:coredns/memory.memsw.limit_in_bytes: no such file or directory: unknown

you could try a different runtime or containerd version.
also did you set the cgroup driver to systemd in the containerd config?

https://kubernetes.io/docs/setup/production-environment/container-runtimes/

/sig node cluster-lifecycle
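
On the cgroup driver question: the container runtimes page linked above describes setting `SystemdCgroup = true` under the runc options in containerd's config. A rough sketch of a check follows. `uses_systemd_cgroup` is a made-up helper name, and /etc/containerd/config.toml is the path used in the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given containerd config file has an
# uncommented "SystemdCgroup = true" line.
uses_systemd_cgroup() {
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1" 2>/dev/null
}

cfg="${1:-/etc/containerd/config.toml}"
if uses_systemd_cgroup "$cfg"; then
    echo "$cfg: systemd cgroup driver enabled"
else
    echo "$cfg: SystemdCgroup = true not found (file missing or driver not set)"
fi
```

Note that the reproduction steps delete config.toml before restarting containerd, so on that host the check would report the file as missing and containerd would be running on its built-in defaults (`containerd config default` prints them).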

@k8s-ci-robot added the sig/node and sig/cluster-lifecycle labels and removed the needs-sig label on Dec 16, 2022
@claudiomerli
Author

claudiomerli commented Dec 16, 2022

/sig node cluster-lifecycle

@k8s-ci-robot added the sig/architecture label on Dec 16, 2022
@claudiomerli
Author

> > Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: open /sys/fs/cgroup/memory/system.slice/containerd.service/kubepods-burstable-pod089b861e_bf5c_4b0f_93c4_dc831baffb75.slice:cri-containerd:coredns/memory.memsw.limit_in_bytes: no such file or directory: unknown
>
> you could try a different runtime or containerd version. also did you set the cgroup driver to systemd in the containerd config?
>
> https://kubernetes.io/docs/setup/production-environment/container-runtimes/
>
> /sig node cluster-lifecycle

I retried the entire installation with containerd and with config.toml configured properly. Nothing changed, same problem.

@claudiomerli
Author

After 20 hours of debugging and retries, I noticed that once the resources entry is removed from the coredns deployment:

kubectl patch deploy coredns -n kube-system --type json -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources"}]'

the problem is fixed... I don't know if it is a bug or is related to an environment
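
The workaround above is a JSON Patch `remove` operation on the first container's `resources` field. Here is a small sketch that builds the same patch for an arbitrary container index, so it is easy to inspect before applying. `resources_remove_patch` is a hypothetical helper, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON Patch that removes the "resources" field
# of the container at the given index (default 0) in a Deployment's pod template.
resources_remove_patch() {
    idx="${1:-0}"
    printf '[{"op": "remove", "path": "/spec/template/spec/containers/%s/resources"}]\n' "$idx"
}

# Inspect the patch first:
resources_remove_patch 0

# Then, against a live cluster (needs kubectl and a kubeconfig):
#   kubectl get deploy coredns -n kube-system -o jsonpath='{.spec.template.spec.containers[0].resources}'
#   kubectl patch deploy coredns -n kube-system --type json -p "$(resources_remove_patch 0)"
```

Keep in mind that removing `resources` drops the container's requests and limits entirely, so it reads more as a diagnostic step than as a fix.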

@neolit123
Member

neolit123 commented Dec 17, 2022

> the problem is fixed... I don't know if it is a bug or is related to an environment

since the resources requested by the kubeadm coredns spec are not so demanding and i haven't seen other reports by users, i'd say it sounds like a problem with your environment.

asking in #kubernetes-users on the k8s slack about this can provide you with some responses on why this might be happening.
if you are convinced it's a bug, open a new ticket, provide the before / after spec, and tag it with /sig node.

thanks

/close
/kind support

@k8s-ci-robot added the kind/support label on Dec 17, 2022
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

In response to this:

> > the problem is fixed... I don't know if it is a bug or is related to an environment
>
> since the resources requested by the kubeadm coredns spec are not so demanding and i haven't seen other reports by users, i'd say it sounds like a problem with your environment.
>
> asking in #kubernetes-users on the k8s slack about this can provide you with some responses on why this might be happening.
> if you are convinced it's a bug, open a new ticket, provide the before / after spec, and tag it with /sig node.
>
> thanks
>
> /close
> /kind support


@claudiomerli
Author

claudiomerli commented Dec 19, 2022

> > the problem is fixed... I don't know if it is a bug or is related to an environment
>
> since the resources requested by the kubeadm coredns spec are not so demanding and i haven't seen other reports by users, i'd say it sounds like a problem with your environment.
>
> asking in #kubernetes-users on the k8s slack about this can provide you with some responses on why this might be happening. if you are convinced it's a bug, open a new ticket, provide the before / after spec, and tag it with /sig node.
>
> thanks
>
> /close /kind support

Hi, after a day I found this very recent issue on GitHub:
containerd/containerd#7828
So it actually is a bug, but in containerd, and it will be fixed in the next version. I'm telling you just for info. My environment was OK. Downgrading containerd to 1.6.12 solved the problem completely. Be aware that not only coredns was affected by this problem, but all pods that declare resources!
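
To tell whether a node runs the containerd release reported in this thread, something like the sketch below could help. It relies only on what the thread states: containerd 1.6.13 (the version listed in the report above) shows the problem, and downgrading avoids it. `is_reported_bad_version` is a hypothetical helper, and the field position in the `containerd --version` output is an assumption about its usual format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version string is the release
# reported as broken in this thread (1.6.13). A leading "v" is ignored.
is_reported_bad_version() {
    [ "${1#v}" = "1.6.13" ]
}

# Example usage on a node (assumes the version is the third field of
# `containerd --version`, e.g. "containerd containerd.io 1.6.13 <commit>"):
#   ver=$(containerd --version | awk '{print $3}')
#   if is_reported_bad_version "$ver"; then echo "containerd $ver: affected"; fi
#
# To list the package versions available for a downgrade (Docker apt repository):
#   apt-cache madison containerd.io
```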
