
kubelet and containerd in endless loop for CreateContainer with unexpected media type octet-stream #124515

Closed
thomasmey opened this issue Apr 24, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments


thomasmey commented Apr 24, 2024

What happened?

I think the private container registry had some intermittent errors and returned wrong data for some layers.

The problem, I think, is that kubelet or containerd is in a state where it assumes that a given OCI image is already downloaded locally, but then parsing of a layer fails and the loop starts again.

What did you expect to happen?

kubelet and/or containerd should detect the wrong layer/manifest state, delete the incomplete/erroneous download, and pull again from the container registry.

How can we reproduce it (as minimally and precisely as possible)?

Not sure; probably the container registry needs to return media type application/octet-stream for some layer/manifest instead of the correct media type.
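For illustration only (not part of the original report): one way to simulate a registry that mislabels manifests is a small reverse proxy in front of a real registry that rewrites the Content-Type of manifest responses to application/octet-stream. The upstream URL and listen port below are placeholders, and TLS/auth handling is omitted.

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Hypothetical upstream registry; point image pulls at this proxy instead.
	upstream, err := url.Parse("https://registry.example.com")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.ModifyResponse = func(resp *http.Response) error {
		// Rewrite only manifest responses, mimicking a registry that drops
		// the OCI/Docker media type and reports a generic byte stream.
		if strings.Contains(resp.Request.URL.Path, "/manifests/") {
			resp.Header.Set("Content-Type", "application/octet-stream")
		}
		return nil
	}
	log.Fatal(http.ListenAndServe(":5000", proxy))
}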

Anything else we need to know?

No response

Kubernetes version

Kubelet 1.28.7-gke.1026000
Containerd 1.7.10
Linux 6.1.58+

Cloud provider

GCP

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Will provide

Related plugins (CNI, CSI, ...) and versions (if applicable)

@thomasmey thomasmey added the kind/bug Categorizes issue or PR as related to a bug. label Apr 24, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 24, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thomasmey
Author

The error messages I see in the logs probably come from here:
https://github.com/containerd/containerd/blob/v1.7.15/images/image.go#L238
Not sure about the exact containerd version, but I think it's 1.7.x.

@thomasmey
Author

In the pod events we see thousands of events like:
"Container image "xyz" already present on machine" Pulled.
This seems to originate from here:
https://github.com/kubernetes/kubernetes/blob/v1.28.7/pkg/kubelet/images/image_manager.go#L138

So kubelet thinks the image is already present and successfully pulled, but then containerd fails with:
"failed to create containerd container: error unpacking image: unexpected media type application/octet-stream for sha:xyz: not found: CreateContainerError"

@thomasmey
Author

containerd probably fails here:
https://github.com/containerd/containerd/blob/v1.7.10/images/image.go#L238

Looks like something in the local manifest bookkeeping is not correct, which prevents container creation.

Maybe containerd can detect that the image layer/manifest was read from the local cache and remove erroneous cached local image layers.
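As a possible manual workaround (not discussed in this issue; assumes shell access to the affected node, and the image reference is a placeholder), removing the cached image, for example with crictl rmi <image> or ctr -n k8s.io images rm <image>, should force a fresh pull on the next attempt and break the loop.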

@neolit123
Member

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 25, 2024
@saschagrunert
Member

kubelet and/or containerd should detect the wrong layer/manifest state, delete the incomplete/erroneous download, and pull again from the container registry.

containerd should identify the corrupt layer and remove it; that is not something the kubelet should do. The kubelet only tells the runtime to pull images when they're not present on disk.

Do you mind moving that issue to the containerd repo?

@thomasmey
Author

opened in containerd as containerd/containerd#10136
