nerdctl fails when running concurrently due to CNI errors: CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2 #2908

Open
aojea opened this issue Apr 3, 2024 · 2 comments
Labels
bug Something isn't working

aojea commented Apr 3, 2024

Description

See kubernetes-sigs/kind#3533.

Command Output: time="2024-04-01T08:34:37Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time=\"2024-04-01T08:34:34Z\" level=fatal msg=\"failed to call cni.Setup: plugin type=\\\"firewall\\\" failed (add): running [/usr/sbin/iptables -t filter -N CNI-ISOLATION-STAGE-2 --wait]: exit status 4: iptables v1.8.7 (nf_tables):  CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2\\n\"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/default/18e88fcb538d49417539810b2567922886e120771be00f165b5d64cf41a381f5/oci-hook.createRuntime.log: file already closed: unknown"

Stack Trace: 
sigs.k8s.io/kind/pkg/errors.WithStack
	sigs.k8s.io/kind/pkg/errors/errors.go:59
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
	sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.createContainerWithWaitUntilSystemdReachesMultiUserSystem
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:383
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.planCreation.func3
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:123
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
	sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
	runtime/asm_amd64.s:1598
Error: Process completed with exit code 1.

Steps to reproduce the issue

This seems to be reproducible by starting multiple containers in parallel: at some point the CNI plugin invocations race and one of them fails.

Describe the results you received and expected

CNI is a nice and simple model for container networking, but because of that simplicity it always falls short for more complex operations.
When implementing more advanced features, the chaining model executes separate binaries that each perform their own operations, and those operations may need to be synchronized across different containers.
Docker or podman moved to a different model from CNI, libnetwork and netavark, because of this. I don't think that is strictly necessary, though: CNI can still handle these problems if nerdctl creates its own CNI plugin implementation instead of relying on the composition of multiple reference plugins.

I'm happy to collaborate on this if needed. I'll just need a bit of bootstrapping on the requirements, but it does not seem like a complicated problem.

What version of nerdctl are you using?

NERDCTL_VERSION: 1.7.4

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

@aojea aojea added the kind/unconfirmed-bug-claim Unconfirmed bug claim label Apr 3, 2024
@AkihiroSuda AkihiroSuda added bug Something isn't working and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Apr 3, 2024
@AkihiroSuda
Member

Maybe we should just have a flock on calling CNI?

> Docker or podman moved to a different model from CNI, libnetwork and netavark, because of this

nit: libnetwork predates CNI, and Docker had never implemented CNI

@AkihiroSuda AkihiroSuda changed the title nerdctl fails when running concurrently due to CNI errors nerdctl fails when running concurrently due to CNI errors: CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2 Apr 3, 2024
Author

aojea commented Apr 3, 2024

> Maybe we should just have a flock on calling CNI?

Yeah, that sounds simple enough. However, it seems to be a bug in the CNI plugin itself: it should handle iptables concurrency.

> nit: libnetwork predates CNI, and Docker had never implemented CNI

yeah, just tried to highlight the diversity of opinions :)
