When using a remote snapshotter (or any other snapshotter that doesn't place snapshots under the containerd root directory), ephemeral storage limits are not enforced by the kubelet. The container can blow past its limits and keep running indefinitely.
The kubelet logs show errors like:
kubelet[3094]: E0419 15:57:23.046299 3094 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/var/lib/containerd/io.containerd.snapshotter.v1.soci\": stat failed on /var/lib/containerd/io.containerd.snapshotter.v1.soci with error: no such file or directory" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.soci"
and
kubelet[3094]: E0419 15:56:55.022396 3094 kubelet.go:1436] "Image garbage collection failed multiple times in a row" err="invalid capacity 0 on image filesystem"
It looks like the kubelet is unable to run ephemeral storage checks and image garbage collection because it's looking for image filesystem information in the wrong place.
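To make the suspected lookup concrete, here is a minimal sketch of how the path in the log above appears to be derived: the containerd root joined with the snapshot plugin type and the snapshotter name. The function names are hypothetical and the derivation is an assumption inferred from the log line and the config below, not a copy of the CRI plugin code.

```python
import os
import posixpath

# Directory prefix seen in the kubelet error; assumed to be the snapshot plugin type.
SNAPSHOT_PLUGIN = "io.containerd.snapshotter.v1"


def assumed_image_fs_path(root_dir: str, snapshotter: str) -> str:
    """Return the directory assumed to hold snapshots for `snapshotter`.

    Hypothetical helper: it mirrors the path shape in the log, i.e.
    <containerd root>/<plugin type>.<snapshotter name>.
    """
    return posixpath.join(root_dir, f"{SNAPSHOT_PLUGIN}.{snapshotter}")


def image_fs_exists(root_dir: str, snapshotter: str) -> bool:
    """Check whether the assumed directory exists on this host."""
    return os.path.isdir(assumed_image_fs_path(root_dir, snapshotter))


if __name__ == "__main__":
    path = assumed_image_fs_path("/var/lib/containerd", "soci")
    print(path, "exists" if image_fs_exists("/var/lib/containerd", "soci") else "missing")
```

With a proxy snapshotter that keeps its data elsewhere, the assumed directory is never created, which would explain the "no such file or directory" stat failure.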
Steps to reproduce the issue
Configure containerd to use a remote snapshotter in a k8s environment
Show configuration if it is related to CRI plugin.
$ cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[proxy_plugins.soci]
type = "snapshot"
address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
snapshotter = "soci"
# This line is required for containerd to send information about how to lazily load the image to the snapshotter
disable_snapshot_annotations = false
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
From my investigation, this is fixed in 2.0/main by "Split CRI image service from GRPC handler" (#9152), which refactored the CRI plugin to build a map from each snapshotter to its correct root directory, using an exported root key on the snapshotter when one is available and the default hard-coded path otherwise.
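As a rough illustration of that lookup (not the actual #9152 code), the sketch below maps each snapshotter name to a root directory, preferring a root the snapshotter exports and falling back to the default path under the containerd root. All function names are hypothetical, and the soci root used in the example is a made-up value.

```python
import posixpath

# Assumed plugin-type prefix for the default per-snapshotter directory.
SNAPSHOT_PLUGIN = "io.containerd.snapshotter.v1"


def default_snapshotter_root(containerd_root: str, name: str) -> str:
    """Default hard-coded location under the containerd root."""
    return posixpath.join(containerd_root, f"{SNAPSHOT_PLUGIN}.{name}")


def build_image_fs_paths(containerd_root, exported_roots, snapshotters):
    """Map each snapshotter to its root directory.

    `exported_roots` holds the roots that snapshotters advertise themselves
    (hypothetical input); anything missing falls back to the default path.
    """
    paths = {}
    for name in snapshotters:
        exported = exported_roots.get(name)
        paths[name] = exported or default_snapshotter_root(containerd_root, name)
    return paths


if __name__ == "__main__":
    # "/var/lib/example-soci-root" is an illustrative value, not a real default.
    roots = build_image_fs_paths(
        "/var/lib/containerd",
        {"soci": "/var/lib/example-soci-root"},
        ["overlayfs", "soci"],
    )
    for name, root in roots.items():
        print(f"{name} -> {root}")
```

The point of the fallback is that built-in snapshotters keep working unchanged, while a remote snapshotter only needs to advertise its own root for stats and garbage collection to look in the right place.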
Rebasing #9152 onto 1.6/1.7 would be tricky because there's a lot of structural change. #9216 was an attempt to fix this before the structural changes and would probably be a better starting point.
Describe the results you received and expected
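Expected: the pod is evicted once it exceeds its ephemeral storage limit, and the kubelet logs show no errors. Received: the pod keeps running, and the kubelet logs the errors shown above.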
What version of containerd are you using?
containerd github.com/containerd/containerd 1.7.11 64b8a81
Any other relevant information
Related downstream issue awslabs/soci-snapshotter#1093