[1.6/1.7] kubernetes ephemeral-storage limits not enforced with remote snapshotters #10095

Kern-- · 2024-04-19T16:05:02Z

Description

When using a remote snapshotter (or any other snapshotter that doesn't place snapshots under the containerd root directory), ephemeral storage limits are not enforced by the kubelet. The container can blow past its limits and keep running indefinitely.

The kubelet logs show errors like:

kubelet[3094]: E0419 15:57:23.046299    3094 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/var/lib/containerd/io.containerd.snapshotter.v1.soci\": stat failed on /var/lib/containerd/io.containerd.snapshotter.v1.soci with error: no such file or directory" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.soci"

and

kubelet[3094]: E0419 15:56:55.022396    3094 kubelet.go:1436]  "Image garbage collection failed multiple times in a row" err="invalid capacity 0 on image filesystem"

It looks like the kubelet is unable to run ephemeral storage checks and image garbage collection because it's looking for image filesystem information in the wrong place.
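
The mountpoint in the first error is the containerd root from the configuration below joined with the snapshotter's plugin ID. As a rough illustration only (this is not kubelet or containerd code), the Python sketch below builds that default path and checks whether it can be stat'ed. The helper names `default_snapshotter_root` and `can_stat` are hypothetical.

```python
import os
import posixpath


def default_snapshotter_root(containerd_root: str, snapshotter: str) -> str:
    """Build <root>/io.containerd.snapshotter.v1.<name>, the path shown in the kubelet error."""
    return posixpath.join(containerd_root, f"io.containerd.snapshotter.v1.{snapshotter}")


def can_stat(path: str) -> bool:
    """Return True if the path can be stat'ed, False if it does not exist."""
    try:
        os.stat(path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    # Values taken from the config.toml in this report: root and snapshotter name.
    path = default_snapshotter_root("/var/lib/containerd", "soci")
    print(path)
    print("stat ok:", can_stat(path))
```

On a node where the snapshotter keeps its data outside the containerd root, the second line would be expected to report that the stat fails, which matches the "no such file or directory" error above.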

Steps to reproduce the issue

Configure containerd to use a remote snapshotter in a k8s environment
Create a pod with an ephemeral storage limit:

resources:
  limits:
    ephemeral-storage: 20M
  requests:
    ephemeral-storage: 10M

Exec into the container and allocate more disk space than allowed

# fallocate -l 1G test1

Observe that the pod is not evicted and that the kubelet logs show the errors above
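
To make the size of the overrun in the steps above concrete, here is a small arithmetic sketch. It assumes the usual suffix conventions: Kubernetes treats `M` as 10^6 bytes, while `fallocate` treats `G` as GiB (2^30 bytes). The `parse_size` helper and the suffix tables are hypothetical and cover only the suffixes needed here.

```python
# Kubernetes decimal suffixes (subset) and fallocate binary suffixes (subset).
K8S_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9}
FALLOCATE_BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text: str, units: dict) -> int:
    """Convert a size such as '20M' to bytes using the given suffix table."""
    suffix = text[-1]
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)  # no suffix: plain byte count


limit_bytes = parse_size("20M", K8S_DECIMAL)        # ephemeral-storage limit
written_bytes = parse_size("1G", FALLOCATE_BINARY)  # fallocate -l 1G

print(limit_bytes)                  # 20000000
print(written_bytes)                # 1073741824
print(written_bytes > limit_bytes)  # True
```

The allocated file is more than 50 times the configured limit, so an eviction would be expected once the kubelet's usage check runs.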

Describe the results you received and expected

The pod should be evicted and the kubelet logs should not show errors. Instead, the pod keeps running and the kubelet logs show the errors above.

What version of containerd are you using?

containerd github.com/containerd/containerd 1.7.11 64b8a81

Any other relevant information

Related downstream issue awslabs/soci-snapshotter#1093

Show configuration if it is related to CRI plugin.

$ cat /etc/containerd/config.toml

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/containerd/containerd.sock"

[proxy_plugins.soci]
type = "snapshot"
address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
snapshotter = "soci"
# This line is required for containerd to send information about how to lazily load the image to the snapshotter
disable_snapshot_annotations = false

[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"


Kern-- · 2024-04-19T16:06:15Z

Related to #9216

Kern-- · 2024-04-19T16:12:03Z

From my investigation, this is fixed in 2.0/main by:

Split CRI image service from GRPC handler #9152 which refactored the CRI plugin to get a map of snapshotter -> correct snapshotter root dir based on an exported root key on the snapshotter or the default hard coded path
Add exports to proxy plugin config #9253 which allows proxy plugins to have exports
Snapshotters: Export the root path #10073 which exports snapshotter root for the remaining snapshotters that didn't before

Rebasing #9152 onto 1.6/1.7 would be tricky because there's a lot of structural change. #9216 was an attempt to fix this before the structural changes and would probably be a better starting point.
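
The lookup described for #9152 above (use the root exported by the snapshotter when there is one, otherwise fall back to the default hard-coded path) can be sketched roughly as follows. This is an illustrative Python model, not the Go code from those PRs. The function name `snapshotter_roots` and the exported path used in the example are assumptions for illustration.

```python
import posixpath


def snapshotter_roots(containerd_root: str, exported_roots: dict, snapshotters: list) -> dict:
    """Map each snapshotter name to its root dir.

    Prefer a root exported by the snapshotter; otherwise fall back to the
    default <containerd_root>/io.containerd.snapshotter.v1.<name>.
    """
    roots = {}
    for name in snapshotters:
        exported = exported_roots.get(name)
        if exported:
            roots[name] = exported
        else:
            roots[name] = posixpath.join(
                containerd_root, f"io.containerd.snapshotter.v1.{name}"
            )
    return roots


if __name__ == "__main__":
    # The exported path for "soci" is an example value, not taken from this report.
    roots = snapshotter_roots(
        "/var/lib/containerd",
        {"soci": "/var/lib/soci-snapshotter-grpc"},
        ["overlayfs", "soci"],
    )
    for name, root in roots.items():
        print(name, "->", root)
```

With a mapping like this, the image filesystem path reported for a remote snapshotter would point at a directory that actually exists, rather than at a path under the containerd root that was never created.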

Kern-- · 2024-05-06T17:37:52Z

This is fixed in containerd 1.7.16.

1.6 backport is still pending.

Kern-- added the kind/bug label Apr 19, 2024

This was referenced Apr 19, 2024

[release/1.7] Fix CRI snapshotter root path when not under containerd root #10096

Merged

[release/1.6] Fix CRI snapshotter root path when not under containerd root #10127

Merged

Kern-- closed this as completed May 10, 2024
