CC | Issues about pulling image with vanilla containerd and remote-snapshotter #8407

Closed · ChengyuZhu6 opened this issue Nov 9, 2023 · 13 comments · Fixed by #9636
Labels
area/confidential-containers: Issues related to confidential containers (see also CCv0 branch)
area/containerd: Interaction with containerd
bug: Incorrect behaviour
needs-review: Needs to be assessed by the team.

Comments

@ChengyuZhu6 (Member) commented Nov 9, 2023

This issue serves as a place to collect and discuss the current challenges of using vanilla containerd and remote-snapshotter in confidential-containers. If you have any related questions or are interested in this topic, please feel free to share them in this issue.

/cc @jiangliu @fidencio @stevenhorsman @fitzthum @BbolroC @huoqifeng

ChengyuZhu6 added the bug, needs-review, area/containerd, and area/confidential-containers labels Nov 9, 2023
@ChengyuZhu6 (Member, Author) commented Nov 9, 2023

Issue 1

 error unpacking image: failed to extract layer : ......: failed to get reader from content store: ......: not found

Description of problem

When we use the default snapshotter (overlayfs) to pull an image (such as a pause image) and create a container, and then switch to another snapshotter (nydus-snapshotter) to pull the same image, we encounter an error:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               1s    default-scheduler  Successfully assigned default/coco-kata-1 to zcyubuntu22
  Warning  FailedCreatePodSandBox  1s    kubelet            Failed to create pod sandbox: rpc error: code = NotFound desc = failed to start sandbox "10e7b3ad6400414b393eb8b7e7dc84cfbdd2875bad6733b0db8081a8b99d61ae": failed to create containerd container: error unpacking image: failed to extract layer sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68: failed to get reader from content store: content digest sha256:fbe1a72f5dcd08ba4ca3ce3468c742786c1f6578c1f6bb401be1c4620d6ff705: not found

and I checked the image (I replaced registry.k8s.io/pause:3.6 with registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 because of network issues):

$ ctr -n k8s.io image check | grep pause
registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6                                                                           application/vnd.docker.distribution.manifest.list.v2+json sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db incomplete (1/2) 901.0 B/290.4 KiB   true
registry.cn-hangzhou.aliyuncs.com/google_containers/pause@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db       application/vnd.docker.distribution.manifest.list.v2+json sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db incomplete (1/2) 901.0 B/290.4 KiB   true

Further information

This issue has occurred in the CI tests. This problem is not caused by Kata or CoCo, but by containerd. There is an existing issue in the containerd repo that describes the same problem we are facing: containerd/containerd#8674

This error occurs because the image puller has an optimization that skips downloading a layer (such as the pause image's layer) if it is already unpacked as a snapshot in the default snapshotter (overlayfs). This optimization assumes that the layer digests are the same across snapshotters, but they may not be.

This error does not happen with the overlayfs snapshotter, but it does with the nydus snapshotter, because the nydus snapshotter does not have the layer in its storage and cannot find it in the content store after the image was pulled by overlayfs.
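
For reference, a minimal way to reproduce the snapshotter switch with ctr (a sketch; it assumes a nydus remote snapshotter registered as a proxy plugin named nydus, and the image name is illustrative):

$ ctr -n k8s.io images pull registry.k8s.io/pause:3.6
$ ctr -n k8s.io images pull --snapshotter nydus registry.k8s.io/pause:3.6

If the layer blobs were discarded after the first unpack, the second pull can fail with the same "failed to get reader from content store ... not found" error.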

A possible way to solve the issue: containerd/containerd#8878.
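
Until a proper fix lands, one workaround is to remove the affected image from the k8s.io namespace so that the next pull fetches all layers again (a sketch; the image name is illustrative, and this is roughly what we later do in CI, see the commits referenced below):

$ ctr -n k8s.io images rm registry.k8s.io/pause:3.6
$ crictl rmi registry.k8s.io/pause:3.6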


@huoqifeng (Contributor)

Related issue: #8337

@huoqifeng (Contributor)

The bad news is that runtime-level snapshotter support is an experimental feature in containerd 1.7, as described in https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features; the good news is that it is targeted as a supported feature in 2.0.
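
For reference, the experimental per-runtime snapshotter setting in 1.7 looks roughly like this in the containerd config (a sketch; the runtime name, runtime_type, and snapshotter value are illustrative):

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata-qemu.v2"
  snapshotter = "nydus"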

@wainersm (Contributor) commented Nov 9, 2023

Issue 2 (NOT A BUG - PLEASE DISREGARD)

missing CRI reference annotation for snaposhot 3: unknown

Description of the problem

When we use the remote snapshotter with cloud-api-adaptor (a.k.a. peer pods) on AWS EKS, a simple pod does not reach the Running state, and the error missing CRI reference annotation for snaposhot shows up in the kubectl describe output. Apparently the same issue happens on Azure AKS, and Magnus Kulke has worked to properly reproduce it.

Here are sample messages seen from kubectl describe:

Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-101183212-RW23 sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 3: unknown
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-967947483-KBeo sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 4: unknown
  Warning  Failed     47m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-758725951-sRqW sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 5: unknown

I can reproduce that problem on Kubernetes 1.26 and 1.28. The AWS EKS worker nodes are Amazon Linux 2, which comes with containerd 1.6. I managed to install the CoCo operator with INSTALL_OFFICIAL_CONTAINERD set to true, so the CoCo-provided containerd 1.7 got installed on the workers. The pod configuration has the io.containerd.cri.runtime-handler: kata-remote annotation, which "activates" the remote snapshotter.

@ChengyuZhu6 (Member, Author) commented Nov 9, 2023


@wainersm, I reproduced the problem on my local machine with the kata-qemu runtime. Here is my test pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: coco-1
  namespace: default
  annotations:
    io.containerd.cri.runtime-handler: kata-qemu
spec:
  runtimeClassName: kata-qemu
  containers:
    - name: cc-1
      image: quay.io/kata-containers/confidential-containers:unsigned

It seems that the disable_snapshot_annotations option in the containerd config file affects the error you are encountering; snapshot annotations are what allow passing arbitrary metadata to the underlying snapshotter. When this option is set to true, the error occurs:

Warning  FailedCreatePodSandBox  0s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6": failed to pull image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6": failed to pull and unpack image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6": failed to prepare extraction snapshot "extract-640214641-sQ-B sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68": missing CRI reference annotation for snaposhot 2: unknown

But when it is set to false, the error does not happen. Therefore, I suggest you check the value of disable_snapshot_annotations in your config file and change it to false if it is true. Then, please try again and see if the error is resolved.
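
For reference, a quick way to check and flip the option (a sketch; it assumes the default config path, and the option lives under the CRI plugin's containerd section):

$ grep -n "disable_snapshot_annotations" /etc/containerd/config.toml
# desired setting under [plugins."io.containerd.grpc.v1.cri".containerd]:
#   disable_snapshot_annotations = false
$ sudo systemctl restart containerd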

@mkulke (Contributor) commented Nov 9, 2023

But when it is set to false, the error does not happen. Therefore, I suggest you check the value of disable_snapshot_annotations in your config file and change it to false if it is true. Then, please try again and see if the error is resolved.

Thank you! I'm seeing (some) success by adding disable_snapshot_annotations = false to the containerd config on AKS nodes. (needs more thorough testing to be sure)

Note: since containerd/containerd#4665 this has been implicitly set to true; I suspect the respective logic in the operator doesn't account for that.

@mkulke (Contributor) commented Nov 10, 2023

(needs more thorough testing to be sure)

After more testing, I think we can say with some confidence that Issue 2 has been addressed by this PR for AKS; not sure about EKS.

@wainersm (Contributor)

On EKS it fixes the issue as well, but then I hit another problem, which is the lack of FUSE on the node; that is not a bug at all. I will mark Issue 2 as not a bug.

Thank you so much @ChengyuZhu6 for the tip!

@ChengyuZhu6 (Member, Author)

Issue 3

The following error occurred on s390x when a step "Configure the devmapper snapshotter" was added to the workflow https://github.com/BbolroC/kata-containers/blob/5077c6cd224f88eaa3308861276dd9b9a7b03346/.github/workflows/run-k8s-tests-on-zvsi.yaml#L68-L69

Nov 14 02:25:16 hchoi-gha-test-01 k3s[61753]: E1114 02:25:16.639493   61753 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"handlers_kata-containers-k8s-tests(3b5a9f6e-5448-40a1-9535-f4b61b28ba72)\" with CreatePodSandboxError
: \"Failed to create sandbox for pod \\\"handlers_kata-containers-k8s-tests(3b5a9f6e-5448-40a1-9535-f4b61b28ba72)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: write /sys/fs/cgroup/kubepods-besteffort-pod3b5a9f6e_544
8_40a1_9535_f4b61b28ba72.slice:cri-containerd:f534123492d026b6958ee19187409a87778031b929570e5dea6794031f6a268c/cgroup.procs: invalid argument: unknown\"" pod="kata-containers-k8s-tests/handlers" podUID=3b5a9f6e-5448-40a1-9535-f4b61b28ba72

You can find the log for the run at https://github.com/BbolroC/kata-containers/actions/runs/6853385374/job/18648918588

The test succeeded without the step (https://github.com/BbolroC/kata-containers/actions/runs/6860820793/job/18655757525).

You can find the journal log at https://gist.github.com/BbolroC/2f6a983d3d374ad3fb8e97769fc933a8 and the containerd journal log at https://gist.github.com/BbolroC/2aa032b4b6f09047d6e3b7ef069b01ec.

@BbolroC I found a warning in the containerd log:

Nov 14 01:46:17 hchoi-gha-test-01 containerd[25870]: time="2023-11-14T01:46:17.467565161Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"

Could the error be attributed to a configuration issue with devmapper? BTW, could you try enabling the debug option in the config so that we can get more detailed logs?
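
For reference, a minimal devmapper section plus debug logging in /etc/containerd/config.toml looks roughly like this (a sketch; the pool name and size are illustrative, and the thin pool must be created beforehand):

[debug]
  level = "debug"

[plugins."io.containerd.snapshotter.v1.devmapper"]
  pool_name = "contd-thin-pool"
  root_path = "/var/lib/containerd/devmapper"
  base_image_size = "10GB"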

@stevenhorsman (Member)

Hi, is it okay to bring the issue 3 to the upcoming AC meeting for help? It is a blocker to merge #7931. Thanks!

@BbolroC - hey Choi, could you expand on that? The remote snapshotter approach is just on CCv0 at the moment, and I think your PR is for testing main?

@BbolroC (Member) commented Nov 16, 2023

I've removed the issue I posted because it is an issue around the containerd devmapper snapshotter rather than the remote snapshotter. Sorry for the confusion here. Thanks.

Update

I was using Ubuntu 22.04 for the runner, where cgroup v2 is the default. The issue went away when I switched to cgroup v1. Thanks!
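
For anyone hitting the same thing, one way to boot an Ubuntu 22.04 runner back into cgroup v1 (a sketch; it assumes a GRUB-based boot):

$ sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=0 /' /etc/default/grub
$ sudo update-grub && sudo reboot
# verify after reboot: "tmpfs" means cgroup v1, "cgroup2fs" means v2
$ stat -fc %T /sys/fs/cgroup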

liudalibj added a commit to liudalibj/kata-containers that referenced this issue Dec 13, 2023
- add test cases for guest pull images
- need revist after we use container2.0 with 'image pull per runtime class' feature

for kata-containers#8337 and kata-containers#8407

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
ChengyuZhu6 added a commit to ChengyuZhu6/kata-containers that referenced this issue Apr 1, 2024
… on CI

we are encountering the issue (kata-containers#8407)
with containerd on CI is likely due to the content digest being missing from the content store,
which can happen when switching between different snapshotters.
To help sort it out on CI, we now clean up related snapshots or images in k8s.io namespace.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
@ChengyuZhu6 (Member, Author)

Issue 4

failed to mount /run/kata-containers/shared/containers/47fd27a9d404246dfa462037a1b3c6bccab301f95246c6cd2d533c238c6cfefc/rootfs to /run/kata-containers/47fd27a9d404246dfa462037a1b3c6bccab301f95246c6cd2d533c238c6cfefc/rootfs, with error: ENOENT: No such file or directory: unknown

Description of problem

In guest-pull scenarios, we use the pause image that is pre-installed in the rootfs. However, in CI, the majority of cases involve running a normal pod that employs the default snapshotter (overlayfs) in containerd. This leads to the guest-pull tests using the pause image from the host, rather than the pause image pre-installed in the rootfs:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               2s    default-scheduler  Successfully assigned default/busybox to zcy-ubuntu22
  Warning  FailedCreatePodSandBox  1s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/shared/containers/47fd27a9d404246dfa462037a1b3c6bccab301f95246c6cd2d533c238c6cfefc/rootfs to /run/kata-containers/47fd27a9d404246dfa462037a1b3c6bccab301f95246c6cd2d533c238c6cfefc/rootfs, with error: ENOENT: No such file or directory: unknown

The issue arises from containerd recognizing that the contents of the pause image, already stored on the host, have been loaded into the content store. Consequently, containerd uses the existing snapshots on the host and proceeds directly to container creation:

$ ctr -n k8s.io content ls|grep registry.k8s.io
sha256:4873874c08efc72e9729683a83ffbb7502ee729e9a5ac097723806ea7fa13517 973B    10 minutes      containerd.io/distribution.source.registry.k8s.io=pause,containerd.io/gc.ref.snapshot.nydus=sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816
sha256:9001185023633d17a2f98ff69b6ff2615b8ea02a825adffa40422f51dfdcde9d 2.761kB 10 minutes      containerd.io/distribution.source.registry.k8s.io=pause,containerd.io/gc.ref.content.m.0=sha256:f5944f2d1daf66463768a1503d0c8c5e8dde7c1674d3f85abc70cef9c7e32e95,containerd.io/gc.ref.content.m.1=sha256:27295ffe5a75328e8230ff9bcabe2b54ebb9079ff70344d73a7b7c7e163ee1a6,containerd.io/gc.ref.content.m.2=sha256:566af08540f378a70a03588f3963b035f33c49ebab3e4e13a4f5edbcd78c6689,containerd.io/gc.ref.content.m.3=sha256:2f205253a51c641263b155d48460ee2056c5b5013f8239ae3811792ec63b3546,containerd.io/gc.ref.content.m.4=sha256:7eaeb31509d7f370599ef78d55956e170eafb7f4a75b8dc14b5c06071d13aae0,containerd.io/gc.ref.content.m.5=sha256:78bfb9d8999c190fca79871c4b2f8d69d94a0605266f0bbb2dbaa1b6dfd03720,containerd.io/gc.ref.content.m.6=sha256:9d05676469a08d6dba9889297333b7d1768e44e38075ab5350a4f8edd97f5be1,containerd.io/gc.ref.content.m.7=sha256:e8fb66bcfe1a85ec1299652d28e6f7f9cfbb01d33c6260582a42971d30dcb77d
sha256:9457426d68990df190301d2e20b8450c4f67d7559bdb7ded6c40d41ced6731f7 307kB   10 minutes      containerd.io/distribution.source.registry.k8s.io=pause,containerd.io/uncompressed=sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816
sha256:f5944f2d1daf66463768a1503d0c8c5e8dde7c1674d3f85abc70cef9c7e32e95 526B    10 minutes      containerd.io/distribution.source.registry.k8s.io=pause,containerd.io/gc.ref.content.config=sha256:4873874c08efc72e9729683a83ffbb7502ee729e9a5ac097723806ea7fa13517,containerd.io/gc.ref.content.l.0=sha256:9457426d68990df190301d2e20b8450c4f67d7559bdb7ded6c40d41ced6731f7
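
A way to make the tests exercise the guest-pull path again is to drop the host-cached pause image before the run, similar to what the CI cleanup commits referenced above do (a sketch; names are illustrative):

$ ctr -n k8s.io images ls -q | grep pause
$ ctr -n k8s.io images rm registry.k8s.io/pause:3.6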

ChengyuZhu6 added a commit to ChengyuZhu6/kata-containers that referenced this issue May 15, 2024
Bump nydus snapshotter to v0.13.13 to fix the gap when switching
different snapshotters in guest pull.

Fixes: kata-containers#8407

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
katacontainersbot moved this from To do to In progress in Issue backlog May 15, 2024