containerd 1.6.4 breaks weave on Kubernetes #6921

Closed
jpetazzo opened this issue May 10, 2022 · 10 comments · Fixed by #7011
@jpetazzo

Description

Hi!

I noticed the following problem on my new Kubernetes clusters: all pods (except the ones using hostNetwork) remain in ContainerCreating status, and events (as shown by e.g. kubectl describe pod) indicate:

  Warning  FailedCreatePodSandBox  4m24s (x83 over 22m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9db599b21b2e13e75342e5707d1617f29c7a52ad09770bc9a2c38b1ecc35896a": failed to find network info for sandbox "9db599b21b2e13e75342e5707d1617f29c7a52ad09770bc9a2c38b1ecc35896a"

The kubelet logs (as shown by journalctl -u kubelet -f) indicate the same message (nothing more), and the containerd logs (as shown by journalctl -u containerd -f) indicate:

May 10 16:18:47 node4 containerd[10270]: time="2022-05-10T16:18:47.113889212Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:metrics-server-765bc4bc75-lqrls,Uid:71af0496-cd6b-430c-8e97-d7dc5b5159d9,Namespace:kube-system,Attempt:0,}"
May 10 16:18:47 node4 containerd[10270]: time="2022-05-10T16:18:47.516615215Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:metrics-server-765bc4bc75-lqrls,Uid:71af0496-cd6b-430c-8e97-d7dc5b5159d9,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"38c5bae03e8a1d51bff5e47aa57f2102c0e0db1dc015440dad832f7f2fee9b25\": failed to find network info for sandbox \"38c5bae03e8a1d51bff5e47aa57f2102c0e0db1dc015440dad832f7f2fee9b25\""

I looked at what had changed between my "new" Kubernetes clusters and the "old" ones (the "old" ones that were deployed just last week, using the same deployment mechanism, and that worked fine 🤔), and it seems to be containerd. To give a bit more context: unless instructed otherwise, my deployment scripts spawn Ubuntu 18.04 VMs, install the latest Kubernetes packages from the Kubernetes official deb repositories (http://apt.kubernetes.io/), the latest docker-ce package from the Docker repository (https://download.docker.com/linux/ubuntu), which itself seems to bring the latest containerd.io. According to https://download.docker.com/linux/ubuntu/dists/bionic/pool/stable/amd64/, it looks like containerd.io 1.6.4-1 was released May 5th, just 5 days ago.

To cross-check the issue, I've:

  • reproduced the issue with various versions of Kubernetes (1.24 down to 1.19)
  • fixed the issue by downgrading containerd.io to 1.5.11-1
  • fixed the issue by replacing Weave with Calico

So it seems that there is definitely something off with containerd 1.6; but it could also be something odd with Weave. However, according to https://github.com/weaveworks/weave/releases, Weave hasn't changed since early 2021.

Is there a way to get containerd to tell me more about what it's doing; or what it's expecting?

Steps to reproduce the issue

I think the following should work (but my deployment scripts are a little bit hairy, to account for various cloud provider discrepancies😬) →

  1. Obtain Ubuntu 18.04 machine
  2. Install docker-ce
  3. Wipe out /etc/containerd/config.toml to re-enable CRI API (which is disabled by default)
  4. Install Kubernetes packages
  5. kubeadm init with the config file below
  6. kubectl apply -f https://cloud.weave.works/k8s/net
  7. Observe with kubectl get pods --all-namespaces that pods (e.g. coredns pods) fail to start

kubeadm config file:

kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
nodeRegistration:
  criSocket: /run/containerd/containerd.sock

Describe the results you received and expected

Expected results: pods start

Observed results: pods fail to start, stay stuck in ContainerCreating, with errors mentioning failure to create pod sandbox

What version of containerd are you using?

containerd containerd.io 1.6.4 212e8b6

Any other relevant information

$ runc --version
runc version 1.1.1
commit: v1.1.1-0-g52de29d
spec: 1.0.2-dev
go: go1.17.9
libseccomp: 2.5.1
$ sudo crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "weave",
          "CNIVersion": "0.3.0",
          "Plugins": [
            {
              "Network": {
                "name": "weave",
                "type": "weave-net",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"hairpinMode\":true,\"name\":\"weave\",\"type\":\"weave-net\"}"
            },
            {
              "Network": {
                "type": "portmap",
                "capabilities": {
                  "portMappings": true
                },
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"capabilities\":{\"portMappings\":true},\"snat\":true,\"type\":\"portmap\"}"
            }
          ],
          "Source": "{\n    \"cniVersion\": \"0.3.0\",\n    \"name\": \"weave\",\n    \"plugins\": [\n        {\n            \"name\": \"weave\",\n            \"type\": \"weave-net\",\n            \"hairpinMode\": true\n        },\n        {\n            \"type\": \"portmap\",\n            \"capabilities\": {\"portMappings\": true},\n            \"snat\": true\n        }\n    ]\n}\n"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "",
        "runtimePath": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": "",
        "cniConfDir": "",
        "cniMaxConfNum": 0
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimePath": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": "",
        "cniConfDir": "",
        "cniMaxConfNum": 0
      },
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": {
            "BinaryName": "",
            "CriuImagePath": "",
            "CriuPath": "",
            "CriuWorkPath": "",
            "IoGid": 0,
            "IoUid": 0,
            "NoNewKeyring": false,
            "NoPivotRoot": false,
            "Root": "",
            "ShimCgroup": "",
            "SystemdCgroup": false
          },
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",
          "cniMaxConfNum": 0
        }
      },
      "noPivot": false,
      "disableSnapshotAnnotations": true,
      "discardUnpackedLayers": false,
      "ignoreRdtNotEnabledErrors": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "confTemplate": "",
      "ipPref": ""
    },
    "registry": {
      "configPath": "",
      "mirrors": null,
      "configs": null,
      "auths": null,
      "headers": null
    },
    "imageDecryption": {
      "keyModel": "node"
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "selinuxCategoryRange": 1024,
    "sandboxImage": "k8s.gcr.io/pause:3.6",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 16384,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 3,
    "disableProcMount": false,
    "unsetSeccompProfile": "",
    "tolerateMissingHugetlbController": true,
    "disableHugetlbController": true,
    "device_ownership_from_security_context": false,
    "ignoreImageDefinedVolumes": false,
    "netnsMountsUnderStateDir": false,
    "enableUnprivilegedPorts": false,
    "enableUnprivilegedICMP": false,
    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.17.9",
  "lastCNILoadStatus": "OK",
  "lastCNILoadStatus.default": "OK"
}
$ uname -a
Linux node1 4.15.0-176-generic #185-Ubuntu SMP Tue Mar 29 17:40:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Also reproduced on:

Linux node1 5.4.0-96-generic #109~18.04.1-Ubuntu SMP Thu Jan 13 15:06:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Show configuration if it is related to CRI plugin.

Configuration is empty.

@jpetazzo jpetazzo changed the title from "Regression in containerd 1.6.4 that appears to break weave on Kubernetes" to "containerd 1.6.4 breaks weave on Kubernetes" on May 10, 2022
@mikebrow
Member

mikebrow commented May 10, 2022

msg="RunPodSandbox for &PodSandboxMetadata{Name:metrics-server-765bc4bc75-lqrls,Uid:71af0496-cd6b-430c-8e97-d7dc5b5159d9,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox "38c5bae03e8a1d51bff5e47aa57f2102c0e0db1dc015440dad832f7f2fee9b25": failed to find network info for sandbox "38c5bae03e8a1d51bff5e47aa57f2102c0e0db1dc015440dad832f7f2fee9b25""

It means either the weave plugin did not give a result for your eth0 interface, or its result did not have an IPv4/IPv6 address.

Is there a way to get containerd to tell me more about what it's doing; or what it's expecting?

Run crictl inspectp on that pod to get the full result; it should have more detail (unless, of course, the pod has already been deleted). I would have to walk back through the code to see whether we clean up that pod right away.

And/or run containerd in debug mode (containerd -l debug); that will output the result detail when it attempts to run the pod.

@MikeZappa87 FYI...

@MikeZappa87

@mikebrow I was about to suggest containerd -l debug as well. I want to see what's in the results; that will help us uncover why

if configs, ok := result.Interfaces[defaultIfName]; ok && len(configs.IPConfigs) > 0 {
is failing
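
To illustrate what that condition checks, here is a minimal sketch. The `Result`, `InterfaceConfig`, and `IPConfig` types and the `selectPodIP` helper are simplified, hypothetical stand-ins and not containerd's actual types; only the `if` condition is taken from the line quoted above. The sketch assumes the plugin result is a map keyed by interface name, and shows that the check fails both when the default interface is missing and when it is present but carries no IP configuration.

```go
package main

import "fmt"

// IPConfig, InterfaceConfig, and Result are simplified, hypothetical
// stand-ins for the CNI result that containerd inspects.
type IPConfig struct {
	IP string
}

type InterfaceConfig struct {
	IPConfigs []IPConfig
}

type Result struct {
	Interfaces map[string]InterfaceConfig
}

// selectPodIP applies the same condition as the quoted line: the default
// interface must exist in the result AND have at least one IP config.
func selectPodIP(result Result, defaultIfName string) (string, error) {
	if configs, ok := result.Interfaces[defaultIfName]; ok && len(configs.IPConfigs) > 0 {
		return configs.IPConfigs[0].IP, nil
	}
	return "", fmt.Errorf("failed to find network info for interface %q", defaultIfName)
}

func main() {
	// A result that reports an address on eth0 passes the check.
	withIP := Result{Interfaces: map[string]InterfaceConfig{
		"eth0": {IPConfigs: []IPConfig{{IP: "10.32.0.2"}}},
	}}
	// A result that lists eth0 without any address does not.
	withoutIP := Result{Interfaces: map[string]InterfaceConfig{
		"eth0": {},
	}}

	fmt.Println(selectPodIP(withIP, "eth0"))
	fmt.Println(selectPodIP(withoutIP, "eth0"))
	fmt.Println(selectPodIP(Result{}, "eth0"))
}
```

Under these assumptions, a plugin result that is parsed without any addresses would fall through to the error path, which matches the "failed to find network info for sandbox" message in the report.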

@mikebrow
Member

mikebrow commented May 10, 2022

nod.. possible ipam defaults..? But yes, debug will help here. Perhaps we should modify that error message to include the result even when debug is off, since we need it here, or just add a bit more detail.

@MikeZappa87

MikeZappa87 commented May 10, 2022

I believe weave has its own ipam plugin? @jpetazzo can you run ls /opt/cni/bin? I think weave-ipam isn't being called; however, I am not familiar with weave. It might also be helpful to provide the actual contents of the network configuration in the /etc/cni/net.d dir.

@mikebrow
Member

noting possible prior report weaveworks/weave#3936

@fuweid
Member

fuweid commented May 10, 2022

duplicate of #6575 ?

@mikebrow
Member

mikebrow commented May 11, 2022

duplicate of #6575 ?

nod.. from your analysis it sounds like cni is still broken for backwards compatibility on setup when the plugin fails to provide the cni version. cni fixed a similar problem on teardown. But we still need a fix from weave and/or cni: weave needs to fix this for the range of cni releases where it is broken, and cni needs to fix backwards compatibility with very old plugin(s) that fail to provide the cni version.
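
As a rough illustration of the backwards-compatibility concern described here, the sketch below shows one way a caller could tolerate a plugin result that omits `cniVersion`. It is not the fix that landed in cni or containerd; the `resultVersion` helper, the `legacyVersion` fallback value, and the sample JSON payloads are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacyVersion is the fallback this sketch assumes for results that do not
// declare a cniVersion. The value is an assumption for illustration.
const legacyVersion = "0.1.0"

// resultVersion (hypothetical helper) reads cniVersion from a plugin's raw
// JSON result and falls back to legacyVersion when the field is missing or
// empty, instead of treating the result as unparseable.
func resultVersion(raw []byte) (string, error) {
	var probe struct {
		CNIVersion string `json:"cniVersion"`
	}
	if err := json.Unmarshal(raw, &probe); err != nil {
		return "", fmt.Errorf("decoding plugin result: %w", err)
	}
	if probe.CNIVersion == "" {
		return legacyVersion, nil
	}
	return probe.CNIVersion, nil
}

func main() {
	// Hypothetical payloads: one declares its version, one does not.
	versioned := []byte(`{"cniVersion":"0.3.0","ips":[]}`)
	unversioned := []byte(`{"ip4":{"ip":"10.32.0.2/12"}}`)

	fmt.Println(resultVersion(versioned))
	fmt.Println(resultVersion(unversioned))
}
```

The design choice in this sketch is to decide the result format before decoding the rest of the payload, so that an old plugin's output is interpreted with the legacy schema rather than silently decoded into an empty modern result.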

@fuweid
Member

fuweid commented May 27, 2022

After github.com/containernetworking/cni#985 was fixed, I tested it locally with that commit, and it fixes this issue.

REF: fuweid@cfb4e22

@fuweid
Member

fuweid commented Jun 4, 2022

@fuweid fuweid closed this as completed Jun 4, 2022
@ReillyBrogan

Not that it's not a good thing that this issue was fixed, but anyone still using Weave Net as their CNI plugin should seriously be planning their migration to a different CNI at this point. Weave Net is clearly unmaintained by its developers in favor of their other products and you should assume it to be fundamentally insecure at this point to keep using it. Just a quick glance shows that the current version (2.8.1) was built with Golang 1.15.6 which has some 30-odd vulnerabilities reported at the current moment. It's probably safe to say there are even more in the outdated dependencies it was built with.

I highly recommend Cilium, which in my experience contains a superset of the features of Weave Net and is thus likely to support whatever use case made you choose Weave Net in the first place.
