Pod panic due to invalid argument when using rafs version6 #594

Open
zewenying opened this issue May 10, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@zewenying

zewenying commented May 10, 2024

Problem Description

Hi, I ran into a problem when using a Nydus-format image to create a Pod. The Pod panics with the following logs.
Pod logs

panic: read /usr/share/mime/globs2: invalid argument

goroutine 1 [running]:
mime.loadMimeGlobsFile({0x1fbb3db?, 0x384a920?})
	GOROOT/src/mime/type_unix.go:74 +0x265
mime.initMimeUnix()
	GOROOT/src/mime/type_unix.go:107 +0x4e
mime.initMime()
	GOROOT/src/mime/type.go:88 +0x3d
sync.(*Once).doSlow(0x13?, 0x1cb68a0?)
	GOROOT/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
	GOROOT/src/sync/once.go:65
mime.AddExtensionType({0x1f966c5, 0x5}, {0x1faea9b, 0x10})
	GOROOT/src/mime/type.go:171 +0x65
k8s.io/kube-openapi/pkg/handler3.init.0()
	external/io_k8s_kube_openapi/pkg/handler3/handler.go:88 +0x2b
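
The panic is raised while Go's mime package reads /usr/share/mime/globs2, so it may help to check whether the same read fails outside the application. A minimal sketch, assuming a POSIX shell and cat are available in the container (or that the mounted rootfs is reachable from the host); try_read is a hypothetical helper name:

```shell
# try_read FILE: read FILE end to end and report whether the read succeeded.
# Hypothetical helper; a failure here would point at the filesystem layer
# rather than at the application.
try_read() {
    if cat -- "$1" > /dev/null; then
        echo "ok: $1"
    else
        echo "read failed: $1"
        return 1
    fi
}
```

For example, `try_read /usr/share/mime/globs2` run against the affected image's root filesystem.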

Nydus-snapshotter logs:
error: failed to get chunk information

time="2024-05-10T04:11:47.958422844Z" level=debug msg="[Prepare] snapshot with key k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f, parent k8s.io/2/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4"
time="2024-05-10T04:11:47.959171647Z" level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f parent="k8s.io/2/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4"
time="2024-05-10T04:11:47.959217839Z" level=debug msg="continue to check snapshot 450 parent"
time="2024-05-10T04:11:47.959244457Z" level=debug msg="continue to check snapshot 1 parent"
time="2024-05-10T04:11:47.959398559Z" level=debug msg="overlayfs mount options [workdir=/var/lib/containerd-nydus/snapshots/450/work upperdir=/var/lib/containerd-nydus/snapshots/450/fs lowerdir=/var/lib/containerd-nydus/snapshots/1/fs]"
time="2024-05-10T04:12:03.730236689Z" level=debug msg="[Mounts] snapshot k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f"
time="2024-05-10T04:12:03.730282359Z" level=info msg="[Mounts] snapshot k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f ID 450 Kind Active"
time="2024-05-10T04:12:03.730359974Z" level=debug msg="overlayfs mount options [workdir=/var/lib/containerd-nydus/snapshots/450/work upperdir=/var/lib/containerd-nydus/snapshots/450/fs lowerdir=/var/lib/containerd-nydus/snapshots/1/fs]"
time="2024-05-10T04:12:04.392817303Z" level=debug msg="[Prepare] snapshot with key k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541, parent k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393599760Z" level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393649764Z" level=debug msg="continue to check snapshot 451 parent"
time="2024-05-10T04:12:04.393696728Z" level=info msg="Prepares active snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541, nydusd should start afterwards" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393709642Z" level=debug msg="Found nydus meta layer id 419" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393722593Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.393728039Z" level=info msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.393741228Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.394829703Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.394853820Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.394872248Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.394929091Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.407118785Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.407196424Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.407235229Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.407334540Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.409115598Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.409154210Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.409175676Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.409232543Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.435037247Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.435096518Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.435153471Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.435290250Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
[2024-05-10 04:12:04.594260 +00:00] ERROR [/src/error.rs:22] Error:
	"failed to get chunk information"
	at rafs/src/metadata/direct_v6.rs:752
	note: enable `RUST_BACKTRACE=1` env to display a backtrace
time="2024-05-10T04:13:03.588204445Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:13:03.588756976Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:13:03.588824124Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:13:03.588967684Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:13:04.604445896Z" level=debug msg="[Remove] snapshot with key k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f snapshot id 450"
time="2024-05-10T04:13:04.605409593Z" level=debug msg="[Remove] snapshot with key k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 snapshot id 451"
time="2024-05-10T04:13:04.606052374Z" level=debug msg="[Cleanup] snapshots"
time="2024-05-10T04:13:04.606533640Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd-nydus/snapshots/450 /var/lib/containerd-nydus/snapshots/451]"
time="2024-05-10T04:13:04.606557097Z" level=debug msg="no RAFS filesystem instance associated with snapshot 450"
time="2024-05-10T04:13:04.607088316Z" level=debug msg="no RAFS filesystem instance associated with snapshot 451"
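
The nydusd error is easy to miss among the snapshotter's debug lines. A small sketch that pulls out entries shaped like the ERROR entry above, together with their indented continuation lines; nydusd_errors is a hypothetical helper, and the pattern is inferred only from this log excerpt:

```shell
# nydusd_errors: read a log on stdin and print each "[timestamp] ERROR [...]"
# entry plus the indented lines that follow it. The pattern is inferred from
# the excerpt in this issue, not from a documented nydusd log format.
nydusd_errors() {
    awk '
        /^\[.*\] ERROR \[/ { print; in_err = 1; next }
        in_err && /^[ \t]/ { print; next }
        { in_err = 0 }
    '
}
```

Usage, assuming the log was saved to a file: `nydusd_errors < snapshotter.log`.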

Version Information

nydus-snapshotter
v0.10.0
nydusify

Nydusify version
Version	: v2.2.2
Revision	: 19d5b12bb0ca58d0474861416e91961169235114
Go version	: go1.18.10
Build time	: 2023-07-17T04:03:45

nydus-image

Version: 	v2.2.2
Git Commit: 	19d5b12bb0ca58d0474861416e91961169235114
Build Time: 	2023-07-17T04:12:08.369576349Z
Profile: 	release
Rustc: 		rustc 1.66.1 (90743e729 2023-01-10)

nydusd

Version: 	v2.2.2
Git Commit: 	19d5b12bb0ca58d0474861416e91961169235114
Build Time: 	2023-07-17T04:12:08.369576349Z
Profile: 	release
Rustc: 		rustc 1.66.1 (90743e729 2023-01-10)

Workaround

Sol1: change rafs version from 6 to 5

nydusify (v2.2.2) convert uses version 6 as the Nydus image format. When it is set to 5, the panic does not occur.
nydusify convert --fs-version 5 --source $sourceImage --target $targetImage
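
To produce both formats from the same source image for comparison, the command above can be wrapped as follows. This is only a sketch: convert_rafs is a hypothetical name, and it uses no nydusify flags beyond the ones shown in this issue.

```shell
# convert_rafs VERSION SOURCE TARGET: convert SOURCE to a Nydus image at TARGET
# using the given RAFS version. Hypothetical wrapper; it only uses the
# nydusify flags already shown in this issue.
convert_rafs() {
    case "$1" in
        5|6) ;;
        *) echo "unsupported RAFS version: $1" >&2; return 2 ;;
    esac
    nydusify convert --fs-version "$1" --source "$2" --target "$3"
}
```

For example: `convert_rafs 5 "$sourceImage" "$targetImage"`.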

Sol2: remove nydus image on machine and recreate the Pod

Just remove the image, for example with crictl rmi {image}

Question

Apart from the two workarounds above, is there any other way to solve the problem? Ideally, every Pod that uses a fixed-version Nydus image should run normally on the machine. Is it possible to patch nydusify (v2.2.2) to achieve that?

Other Useful Information

Some other Nydus images converted with fs-version 6 run normally on the same machine. In addition, Pods using the affected image run normally on other machines.
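
Because the same image runs on other machines, and removing the local copy clears the problem, it may be worth ruling out a difference in locally stored data. A sketch, assuming sha256sum and find are available on the nodes; digest_manifest is a hypothetical helper, and the directory to compare is not specified anywhere in this issue:

```shell
# digest_manifest DIR: print "sha256  ./relative/path" for every regular file
# under DIR, sorted by path, so that manifests from two machines can be diffed.
# Hypothetical helper; which directory to inspect is left to the reader.
digest_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}
```

Running `digest_manifest "$dir" > manifest.txt` on the failing node and on a healthy one, then comparing the two files with diff, would show whether any file differs.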

@imeoer
Collaborator

imeoer commented May 11, 2024

@zewenying Thanks for the details. It seems to be related to the fs version v6 format. Could you describe the image content a little more?

imeoer added the bug label May 11, 2024
@zewenying
Author

@zewenying Thanks for the details. It seems to be related to the fs version v6 format. Could you describe the image content a little more?

Thanks for your reply. Could you please suggest some tools for describing the image content? I don't know what kind of information about the image would help you track down the problem.

@imeoer
Collaborator

imeoer commented May 11, 2024

@zewenying Try validating your RAFS v6 image with nydusify check --source $oci_image --target $nydus_image first (this requires nydusify, nydus-image, and nydusd to be installed on your node). :)
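
A minimal sketch of that check with a preflight for the three required binaries. The helper names missing_tools and run_check are hypothetical; the nydusify invocation is the one quoted above.

```shell
# missing_tools NAME...: print every NAME that is not found on PATH and
# return non-zero if any is missing. Hypothetical helper.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# run_check SOURCE TARGET: run nydusify check only when nydusify,
# nydus-image, and nydusd are all present on this node.
run_check() {
    missing_tools nydusify nydus-image nydusd || return 1
    nydusify check --source "$1" --target "$2"
}
```

For example: `run_check "$oci_image" "$nydus_image"`.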

@zewenying
Author

zewenying commented May 13, 2024

@zewenying Try validating your RAFS v6 image with nydusify check --source $oci_image --target $nydus_image first (this requires nydusify, nydus-image, and nydusd to be installed on your node). :)

Hi, here is the log. I also tried setting RUST_BACKTRACE=1 to get more logs, but nothing more was printed.
[image: screenshot of the check log]

Tips:

  1. I used another Nydus image that has the same problem as this one, but it passes the check. So it seems that the registry of the source image works well.
INFO[2024-05-13T12:00:05+08:00] Verifying filesystem for source and Nydus image
INFO[2024-05-13T12:01:03+08:00] Verified Nydus image $TARGETIMAGE

@imeoer
Collaborator

imeoer commented May 15, 2024

@zewenying It looks like RAFS v6 has some fs issues. Does RAFS v5 always work with nydusify check?

@zewenying
Author

zewenying commented May 15, 2024

@zewenying It looks like RAFS v6 has some fs issues. Does RAFS v5 always work with nydusify check?

No. It also fails the check, with the same error.

@imeoer
Collaborator

imeoer commented May 15, 2024

@zewenying Sorry for the misunderstanding. The TimeOut keyword appears in your log, so maybe it's a registry/network issue.

@zewenying
Author

@zewenying Sorry for the misunderstanding. The TimeOut keyword appears in your log, so maybe it's a registry/network issue.

I tried another affected image that passed the check three days ago, and it fails today. It seems that the registry is not working reliably.
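
One way to tell an intermittent registry or network failure from a consistent one is to repeat the same check several times and count the failures. A sketch, with count_failures as a hypothetical helper that discards the command's output (rerun the command by hand to see its logs):

```shell
# count_failures N CMD [ARG...]: run CMD N times and print how many runs failed.
# Hypothetical helper for telling an intermittent failure from a consistent one.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs failed"
}
```

For example: `count_failures 5 nydusify check --source "$oci_image" --target "$nydus_image"`.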

@zewenying
Author

Hi @imeoer, I will be out of the office from tomorrow. Another colleague will follow up on this issue.
