Containerd IP leakage #5768
Comments
@qiutongs volunteered to help fix this issue. :) Thanks!
Related: was going to need to refactor this a bit anyway to do some sort of pinning model for pid reuse #5630
Is this one related? #5438
Yeah, that can fix the context timeout issue, but can't fix the issue that network teardown just returned an error. To make containerd work reliably in error cases, we should keep the sandbox around until it is properly cleaned up. However, if the context timeout thing can solve most known problems, this becomes relatively lower priority. :)
There are other situations, like containerd/go-cni#60. Most setups use more than one CNI plugin, for example:

```json
{
    "name": "cni0",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            "type": "flannel",
            "delegate": {
                "forceAddress": true,
                "hairpinMode": true,
                "isDefaultGateway": true
            }
        },
        {
            "type": "portmap",
            "capabilities": {
                "portMappings": true
            }
        }
    ]
}
```

The order of the CNI remove calls is derived from this plugin list (see the sketch below).
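For reference, here is a minimal Go sketch of tearing down a chained conflist like the one above; it assumes github.com/containernetworking/cni/libcni rather than containerd's own wrapper, and the config path, container ID, and netns path are placeholders. libcni runs DEL for the plugins in reverse order, and a single failing plugin (for example a missing portmap binary) fails the whole teardown:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// Load a chained conflist such as the flannel + portmap example above.
	confBytes, err := os.ReadFile("/etc/cni/net.d/10-cni0.conflist") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	list, err := libcni.ConfListFromBytes(confBytes)
	if err != nil {
		log.Fatal(err)
	}

	// libcni looks up the plugin binaries ("flannel", "portmap") in these dirs.
	cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	rt := &libcni.RuntimeConf{
		ContainerID: "example-sandbox-id",                // placeholder
		NetNS:       "/var/run/netns/example-sandbox-ns", // placeholder
		IfName:      "eth0",
	}

	// DelNetworkList runs DEL for the plugins in reverse order (portmap,
	// then flannel). If any plugin binary is missing or returns an error,
	// the whole teardown fails and the IP allocated during ADD can leak.
	if err := cni.DelNetworkList(context.Background(), list, rt); err != nil {
		log.Printf("network teardown failed: %v", err)
	}
}
```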
/assign
We are also seeing network teardown errors with IP leaks:
There are a lot more of these errors than the number of leaked IPs. I was not sure if this error message leads to IP leaks. Is there a way to tell whether these errors are related to IP leaks?
I created a one-liner to solve this on GKE for anyone interested. It takes about 3 minutes or so to fully correct after running, depending on the number of pods. I set it to only check one namespace, but you could probably make it check more.
script.sh contents for cilium:
On calico the script changes slightly:
The fix is supposedly in containerd 1.4.7, but COS with that version won't debut for another month or so. This should help until then. Explanation: the script reads all the container hashes for deployed containers; if it finds a running container matching a hash, it doesn't touch that file. It then removes all hashes not mapped to running containers and restarts kubelet and containerd. You could also delete the files for running hashes, as they will be recreated, but it's much better to do it this way, since the IPs won't need to be reassigned and the running pods keep working as expected.
We use containerd 1.4.8 on GKE with anetd (cilium), but the problem is not fixed yet.
Any plan to fix this issue? We still see this problem in GKE.
Yeah, this likely leads to IP leaks. A previous issue about the timeout case of destroying the network was fixed in #5438, but other error cases were not covered.
I am prioritizing it now. Will update my PR asap.
The idea of fixing this bug is to have Kubelet see the failed-to-destroy-network sandbox, so that Kubelet will call StopSandbox, which should clean up the IP leakage. Here is how things work on the Kubelet side:
1. PLEG puts the pod status in the cache
2. The pod worker reads the pod status from the cache
3. Kubelet syncPod kills the changed sandbox

Therefore, we want to do two things here (see the sketch below).
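As a rough illustration of how Kubelet would "see" the failed sandbox, here is a simplified sketch (not containerd's actual CRI code; the sandbox type and its Running field are assumptions). The point is that the sandbox record survives a failed network teardown and keeps being reported as NOT_READY over CRI, which is what drives Kubelet's sync loop to call StopPodSandbox again:

```go
package main

import (
	"fmt"

	runtime "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// sandbox is a hypothetical, simplified sandbox record kept by the runtime.
// The point of the fix is that this record is NOT deleted when network
// teardown fails, so Kubelet keeps seeing the sandbox and retries StopPodSandbox.
type sandbox struct {
	ID      string
	Running bool
}

// sandboxStatus sketches how such a sandbox would be reported over CRI.
func sandboxStatus(sb *sandbox) *runtime.PodSandboxStatus {
	state := runtime.PodSandboxState_SANDBOX_NOTREADY
	if sb.Running {
		state = runtime.PodSandboxState_SANDBOX_READY
	}
	return &runtime.PodSandboxStatus{
		Id:    sb.ID,
		State: state,
		// Metadata, network status, etc. omitted for brevity.
	}
}

func main() {
	// A sandbox whose network teardown failed: still present, reported NOT_READY,
	// which prompts Kubelet's sync loop to call StopPodSandbox again.
	leaked := &sandbox{ID: "failed-teardown-sandbox", Running: false}
	fmt.Println(sandboxStatus(leaked).State) // SANDBOX_NOTREADY
}
```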
Any ETA for this?
Still working on it in #5904. Hope I can merge it in the next 2 weeks.
Are you saying the CNI plugin fails on a DEL command?
The delete should not return an error. @mikebrow, may this be related to the CNI bug we fixed recently?
@aojea An example to illustrate the problem here: the binary for the last plugin is missing. There will be a "binary not found" error in both network setup and network teardown.
@aojea Good question. This is not something the CNI spec covers: what to do with a configuration where ADD succeeded but now DEL will fail? As a general rule, the preference is for plugins to effect a delete if at all possible. But this presents us with an interesting quandary: what do we do if a particular plugin does not execute? It seems like there are three cases we need to consider:
The reality is, I'm not sure we can write any CNI spec language that is safe in all cases here. There are two basic approaches:
Unfortunately, if we choose to ignore failures, we're just as likely to suffer resource leaks. Thus, I'm not sure if there is a one-size-fits-all option here. A few choices, which I can bring up to the CNI maintainers:
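To make the "ignore failures" approach discussed above concrete, a best-effort DEL would still walk the whole chain in reverse and aggregate errors instead of stopping at the first failure. A minimal Go sketch of that idea follows; plugin and deletePlugin are hypothetical stand-ins for illustration, not the CNI library API:

```go
package main

import (
	"errors"
	"fmt"
)

// plugin is a hypothetical stand-in for one entry in a chained CNI config.
type plugin struct {
	Type string
}

// deletePlugin is a hypothetical stand-in for invoking one plugin's DEL
// (in reality this would exec the plugin binary with CNI_COMMAND=DEL).
func deletePlugin(p plugin) error {
	if p.Type == "portmap" {
		return errors.New("binary not found")
	}
	return nil
}

// bestEffortDel runs DEL on every plugin in reverse order and aggregates
// errors instead of stopping at the first failure. The trade-off noted
// above still applies: ignoring a failed plugin can itself leak resources.
func bestEffortDel(plugins []plugin) error {
	var errs []error
	for i := len(plugins) - 1; i >= 0; i-- {
		if err := deletePlugin(plugins[i]); err != nil {
			errs = append(errs, fmt.Errorf("plugin %q: %w", plugins[i].Type, err))
		}
	}
	return errors.Join(errs...) // nil when every DEL succeeded
}

func main() {
	chain := []plugin{{Type: "flannel"}, {Type: "portmap"}}
	// flannel's DEL still runs even though portmap fails first.
	fmt.Println(bestEffortDel(chain))
}
```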
Waiting for #7069 to be merged so that we can have more E2E tests.
#5904 is merged. @samuelkarp and I will work on backporting.
#5904, now merged, should fix this for 1.7.
We hit this issue in production on v1.5.5; thanks for fixing it @qiutongs. When will the fix be backported to v1.5?
#7464 is the backport for
Same issue here. Pod creation was stuck and kubelet reported errors. I checked the node, and there were only 45 running pods on it. However, I got 135 pods in ready state when listing them. Take one pod id as an example:
Log of containerd:
The containerd-shim-runc-v2 process is still running:
Closing since this has been fixed in
Description
We see a problem in production that containerd may leak IPs on the node.
Steps to reproduce the issue:
1. `RunPodSandbox` may timeout or fail;
2. When `RunPodSandbox` fails, it tries to tear down the pod network in a defer; if that teardown also fails, the allocated IP is never released and leaks.

Proposed solution
We should probably change how `RunPodSandbox` works. It should:
In this way, when there is any issue in `RunPodSandbox`, we can still try to clean up in the defer. However, if any cleanup step fails, the sandbox container on disk can still represent the sandbox, and kubelet will try to guarantee it is properly cleaned up eventually. A rough sketch of this ordering follows.
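Here is an illustrative Go sketch of that ordering (illustrative only; sandboxStore, setupPodNetwork, and teardownPodNetwork are assumed names, not containerd's real API). The sandbox record is persisted before network setup, and when a failure path's teardown also fails, the record is deliberately kept so Kubelet can finish the cleanup via StopPodSandbox later:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// All names below (sandboxStore, setupPodNetwork, teardownPodNetwork) are
// assumptions for illustration, not containerd's real API.

type sandboxStore struct{ records map[string]bool }

func (s *sandboxStore) Add(id string)    { s.records[id] = true }
func (s *sandboxStore) Delete(id string) { delete(s.records, id) }

// Stubs that always fail, to exercise the error path.
func setupPodNetwork(ctx context.Context, id string) error    { return errors.New("CNI ADD failed") }
func teardownPodNetwork(ctx context.Context, id string) error { return errors.New("CNI DEL failed") }

// runPodSandbox sketches the proposed ordering: persist the sandbox record
// first, then set up the network. On failure, try teardown in a defer, but
// keep the record whenever teardown itself fails, so Kubelet can retry
// StopPodSandbox later and the IP is eventually released instead of leaking.
func runPodSandbox(ctx context.Context, store *sandboxStore, id string) (retErr error) {
	store.Add(id) // the record now represents the sandbox on disk

	defer func() {
		if retErr == nil {
			return
		}
		if err := teardownPodNetwork(ctx, id); err != nil {
			// Teardown failed: do NOT remove the record; the sandbox still
			// exists as far as Kubelet is concerned, and a later
			// StopPodSandbox will retry the teardown.
			fmt.Printf("teardown failed, keeping sandbox %q: %v\n", id, err)
			return
		}
		store.Delete(id)
	}()

	if err := setupPodNetwork(ctx, id); err != nil {
		return fmt.Errorf("failed to set up pod network: %w", err)
	}
	// ... create and start the sandbox container, etc.
	return nil
}

func main() {
	store := &sandboxStore{records: map[string]bool{}}
	_ = runPodSandbox(context.Background(), store, "example-sandbox")
	fmt.Println("sandbox record kept:", store.records["example-sandbox"]) // true
}
```

The design choice is simply that a failed teardown leaves evidence behind instead of silently dropping the sandbox; that leftover record is what makes the eventual cleanup possible.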