Containerd IP leakage #5768
Comments
@qiutongs volunteered to help fix this issue. :) Thanks!
Related: was going to need to refactor this a bit anyway to do some sort of pinning model for pid reuse #5630
Is this one related? #5438
Yeah, that can fix the context timeout issue, but can't fix the issue that network teardown just returned an error. To make containerd work reliably in error cases, we should keep the sandbox around until it is properly cleaned up. However, if the context timeout thing can solve most known problems, this becomes relatively lower priority. :)
There are other situations, like containerd/go-cni#60. Most setups use more than one CNI plugin, for example:

```json
{
    "name": "cni0",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            "type": "flannel",
            "delegate": {
                "forceAddress": true,
                "hairpinMode": true,
                "isDefaultGateway": true
            }
        },
        {
            "type": "portmap",
            "capabilities": {
                "portMappings": true
            }
        }
    ]
}
```

The order of the CNI remove calls is derived from this plugin list (see the sketch below).
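For reference, here is a minimal Go sketch of tearing down a chained conflist like the one above; it assumes github.com/containernetworking/cni/libcni rather than containerd's own wrapper, and the config path, container ID, and netns path are placeholders. libcni runs DEL for the plugins in reverse order, and a single failing plugin (for example a missing portmap binary) fails the whole teardown:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// Load a chained conflist such as the flannel + portmap example above.
	confBytes, err := os.ReadFile("/etc/cni/net.d/10-cni0.conflist") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	list, err := libcni.ConfListFromBytes(confBytes)
	if err != nil {
		log.Fatal(err)
	}

	// libcni looks up the plugin binaries ("flannel", "portmap") in these dirs.
	cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	rt := &libcni.RuntimeConf{
		ContainerID: "example-sandbox-id",                // placeholder
		NetNS:       "/var/run/netns/example-sandbox-ns", // placeholder
		IfName:      "eth0",
	}

	// DelNetworkList runs DEL for the plugins in reverse order (portmap,
	// then flannel). If any plugin binary is missing or returns an error,
	// the whole teardown fails and the IP allocated during ADD can leak.
	if err := cni.DelNetworkList(context.Background(), list, rt); err != nil {
		log.Printf("network teardown failed: %v", err)
	}
}
```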
/assign
We are also seeing network teardown errors with IP leaks:
There are a lot more of these errors than the number of leaked IPs. I was not sure if this error message leads to IP leaks. Is there a way to tell whether these errors are related to IP leaks?
I created a one-liner to solve this on GKE for anyone interested. It takes about 3 minutes or so to fully correct after running, depending on the number of pods. I set it to only check one namespace, but you could probably make it check more.
script.sh contents for cilium:
On calico the script changes slightly:
The fix is supposedly in containerd 1.4.7, but COS with that version won't debut for another month or so. This should help until then. Explanation: the script reads all the container hashes for deployed containers; if it finds a running container matching a hash, it doesn't touch that file. It then removes all hashes not mapped to running containers and restarts kubelet and containerd. You could also delete the files for running hashes, as they will be recreated, but it's much better to do it this way, since the IPs won't need to be reassigned and the running pods keep working as expected.
We use containerd 1.4.8 on GKE with anetd (cilium), but the problem is not fixed yet.
Any plan to fix this issue? We still see this problem in GKE.
Yeah, this likely leads to IP leaks. A previous issue about the timeout case of destroying the network was fixed in #5438, but other error cases were not covered.
I am prioritizing it now. Will update my PR asap.
The idea of fixing this bug is to have Kubelet see the failed-to-destroy-network sandbox, so that Kubelet will call StopSandbox, which should clean up the IP leakage. Here is how things work on the Kubelet side:
1. PLEG puts the pod status in the cache
2. The pod worker reads the pod status from the cache
3. Kubelet syncPod kills the changed sandbox

Therefore, we want to do two things here (see the sketch below).
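As a rough illustration of how Kubelet would "see" the failed sandbox, here is a simplified sketch (not containerd's actual CRI code; the sandbox type and its Running field are assumptions). The point is that the sandbox record survives a failed network teardown and keeps being reported as NOT_READY over CRI, which is what drives Kubelet's sync loop to call StopPodSandbox again:

```go
package main

import (
	"fmt"

	runtime "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// sandbox is a hypothetical, simplified sandbox record kept by the runtime.
// The point of the fix is that this record is NOT deleted when network
// teardown fails, so Kubelet keeps seeing the sandbox and retries StopPodSandbox.
type sandbox struct {
	ID      string
	Running bool
}

// sandboxStatus sketches how such a sandbox would be reported over CRI.
func sandboxStatus(sb *sandbox) *runtime.PodSandboxStatus {
	state := runtime.PodSandboxState_SANDBOX_NOTREADY
	if sb.Running {
		state = runtime.PodSandboxState_SANDBOX_READY
	}
	return &runtime.PodSandboxStatus{
		Id:    sb.ID,
		State: state,
		// Metadata, network status, etc. omitted for brevity.
	}
}

func main() {
	// A sandbox whose network teardown failed: still present, reported NOT_READY,
	// which prompts Kubelet's sync loop to call StopPodSandbox again.
	leaked := &sandbox{ID: "failed-teardown-sandbox", Running: false}
	fmt.Println(sandboxStatus(leaked).State) // SANDBOX_NOTREADY
}
```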
Any ETA for this?
Still working on it in #5904. Hope I can merge it in the next 2 weeks.
Are you saying the CNI plugin fails on a DEL command?
The delete should not return an error. @mikebrow, may this be related to the CNI bug we fixed recently?
@aojea An example to illustrate the problem here: the binary for the last plugin is missing. There will be a "binary not found" error in both network setup and network teardown.
@aojea Good question. This is not something the CNI spec covers: what to do with a configuration where ADD succeeded but now DEL will fail? As a general rule, the preference is for plugins to effect a delete if at all possible. But this presents us with an interesting quandary: what do we do if a particular plugin does not execute? It seems like there are three cases we need to consider:
The reality is, I'm not sure we can write any CNI spec language that is safe in all cases here. There are two basic approaches:
Unfortunately, if we choose to ignore failures, we're just as likely to suffer resource leaks. Thus, I'm not sure if there is a one-size-fits-all option here. A few choices, which I can bring up to the CNI maintainers:
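To make the "ignore failures" approach discussed above concrete, a best-effort DEL would still walk the whole chain in reverse and aggregate errors instead of stopping at the first failure. A minimal Go sketch of that idea follows; plugin and deletePlugin are hypothetical stand-ins for illustration, not the CNI library API:

```go
package main

import (
	"errors"
	"fmt"
)

// plugin is a hypothetical stand-in for one entry in a chained CNI config.
type plugin struct {
	Type string
}

// deletePlugin is a hypothetical stand-in for invoking one plugin's DEL
// (in reality this would exec the plugin binary with CNI_COMMAND=DEL).
func deletePlugin(p plugin) error {
	if p.Type == "portmap" {
		return errors.New("binary not found")
	}
	return nil
}

// bestEffortDel runs DEL on every plugin in reverse order and aggregates
// errors instead of stopping at the first failure. The trade-off noted
// above still applies: ignoring a failed plugin can itself leak resources.
func bestEffortDel(plugins []plugin) error {
	var errs []error
	for i := len(plugins) - 1; i >= 0; i-- {
		if err := deletePlugin(plugins[i]); err != nil {
			errs = append(errs, fmt.Errorf("plugin %q: %w", plugins[i].Type, err))
		}
	}
	return errors.Join(errs...) // nil when every DEL succeeded
}

func main() {
	chain := []plugin{{Type: "flannel"}, {Type: "portmap"}}
	// flannel's DEL still runs even though portmap fails first.
	fmt.Println(bestEffortDel(chain))
}
```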
Waiting for #7069 to be merged so that we can have more E2E tests.
#5904 is merged. @samuelkarp and I will work on backporting.
#5904, now merged, should fix this for 1.7.
We hit this issue in production on v1.5.5; thanks for fixing it @qiutongs. When will the fix be backported to v1.5?
#7464 is the backport for
Same issue here. Pod creation was stuck and kubelet reported errors. I checked the node, and there were only 45 running pods on it. However, I got 135 pods in ready state when listing them. Take one pod id as an example:
Log of containerd:
The containerd-shim-runc-v2 process is still running:
Closing since this has been fixed in
Description
We see a problem in production that containerd may leak IPs on the node.
Steps to reproduce the issue:
1. `RunPodSandbox` may timeout or fail;
2. When `RunPodSandbox` fails, it tries to tear down the pod network in a defer; if that teardown also fails, the allocated IP is never released and leaks.

Proposed solution
We should probably change how `RunPodSandbox` works. It should:
In this way, when there is any issue in `RunPodSandbox`, we can still try to clean up in the defer. However, if any cleanup step fails, the sandbox container on disk can still represent the sandbox, and kubelet will try to guarantee it is properly cleaned up eventually. A rough sketch of this ordering follows.
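Here is an illustrative Go sketch of that ordering (illustrative only; sandboxStore, setupPodNetwork, and teardownPodNetwork are assumed names, not containerd's real API). The sandbox record is persisted before network setup, and when a failure path's teardown also fails, the record is deliberately kept so Kubelet can finish the cleanup via StopPodSandbox later:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// All names below (sandboxStore, setupPodNetwork, teardownPodNetwork) are
// assumptions for illustration, not containerd's real API.

type sandboxStore struct{ records map[string]bool }

func (s *sandboxStore) Add(id string)    { s.records[id] = true }
func (s *sandboxStore) Delete(id string) { delete(s.records, id) }

// Stubs that always fail, to exercise the error path.
func setupPodNetwork(ctx context.Context, id string) error    { return errors.New("CNI ADD failed") }
func teardownPodNetwork(ctx context.Context, id string) error { return errors.New("CNI DEL failed") }

// runPodSandbox sketches the proposed ordering: persist the sandbox record
// first, then set up the network. On failure, try teardown in a defer, but
// keep the record whenever teardown itself fails, so Kubelet can retry
// StopPodSandbox later and the IP is eventually released instead of leaking.
func runPodSandbox(ctx context.Context, store *sandboxStore, id string) (retErr error) {
	store.Add(id) // the record now represents the sandbox on disk

	defer func() {
		if retErr == nil {
			return
		}
		if err := teardownPodNetwork(ctx, id); err != nil {
			// Teardown failed: do NOT remove the record; the sandbox still
			// exists as far as Kubelet is concerned, and a later
			// StopPodSandbox will retry the teardown.
			fmt.Printf("teardown failed, keeping sandbox %q: %v\n", id, err)
			return
		}
		store.Delete(id)
	}()

	if err := setupPodNetwork(ctx, id); err != nil {
		return fmt.Errorf("failed to set up pod network: %w", err)
	}
	// ... create and start the sandbox container, etc.
	return nil
}

func main() {
	store := &sandboxStore{records: map[string]bool{}}
	_ = runPodSandbox(context.Background(), store, "example-sandbox")
	fmt.Println("sandbox record kept:", store.records["example-sandbox"]) // true
}
```

The design choice is simply that a failed teardown leaves evidence behind instead of silently dropping the sandbox; that leftover record is what makes the eventual cleanup possible.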