Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker exec can't get detail messages if runc init exited abnormally #4083

Open
ningmingxiao opened this issue Oct 20, 2023 · 4 comments
Open

Comments

@ningmingxiao
Copy link

ningmingxiao commented Oct 20, 2023

Description

docker exec can't get detail messages
I try to print detail log at process_linux.go but it doesn't work.

func (p *setnsProcess) start() (retErr error) {
	defer p.messageSockPair.parent.Close()
	// get the "before" value of oom kill count
	oom, _ := p.manager.OOMKillCount()
	err := p.cmd.Start()
	defer func() {
		result,_:=p.cmd.Output()
		fmt.Println(string(result))
	}()

Steps to reproduce the issue

1.docker run --pids-limit=10 --name test -d centos sleep 100000
2 .docker exec -it test bash
run follow command

for i in  $(seq 1 50); do (sleep 100&) ; done

3.open another terminal
docker exec test

[root@host-226 ~]# docker exec test  date
OCI runtime exec failed: exec failed: unable to start container process: read init-p: connection reset by peer: unknown

4.use runc exec

[root@host-226 ~]# runc  --root /var/run/docker/runtime-runc/moby exec da4541ad588bc32a20f677d82b08cd504da0918204d3a01e6b285cf261b8c6e7 date
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f5b598d4791 m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: g 0: unknown pc 0x7f5b598d4791
stack: frame={sp:0x7ffc1903cb30, fp:0x0} stack=[0x7ffc1883e0f0,0x7ffc1903d100)
0x00007ffc1903ca30:  0x00007f5b594cbbd6 <runtime.mmap+0x0000000000000076>  0x00007ffc1903ca70
0x00007ffc1903ca40:  0x00007ffc1903ca48  0x0000000000114c10
0x00007ffc1903ca50:  0x0000002200000003  0x00007ffc1903ca98
0x00007ffc1903ca60:  0x00007f5b594cbcb1 <runtime.mmap.func1+0x0000000000000051>  0x00007f5b3426d000
0x00007ffc1903ca70:  0x00007f5b594cbc60 <runtime.mmap.func1+0x0000000000000000>  0x00007ffc1903ca68
0x00007ffc1903ca80:  0x00007f5b3426d000  0x0000000000407000
0x00007ffc1903ca90:  0x0000003200000003  0x00000000ffffffff
0x00007ffc1903caa0:  0x00007ffc1903cae8  0x00007f5b594dfe59 <runtime.sysMapOS+0x0000000000000039>
0x00007ffc1903cab0:  0x00007f5b32158000  0x00007f5b594cbc60 <runtime.mmap.func1+0x0000000000000000>
0x00007ffc1903cac0:  0x00007f5b594dfde5 <runtime.sysHugePageOS+0x0000000000000065>  0x00007f5b34400000
0x00007ffc1903cad0:  0x0000000000200000  0x000000000000000e
0x00007ffc1903cae0:  0x00007f5b00000000  0x00007ffc1903cb10
0x00007ffc1903caf0:  0x00007ffc1903cb20  0x00007f5b598b2cdc
0x00007ffc1903cb00:  0x0000000000407000  0x00007f5b59531ede <runtime.callCgoMmap+0x000000000000003e>
0x00007ffc1903cb10:  0x00007ffc1903cb10  0x00007ffc1903cb40
0x00007ffc1903cb20:  0x00007ffc1903cb50  0x00007f5b598b2cdc
0x00007ffc1903cb30: <0x0000000000000000  0x00007f5b59531ede <runtime.callCgoMmap+0x000000000000003e>
0x00007ffc1903cb40:  0x00007ffc1903cb40  0x0000000000000007
0x00007ffc1903cb50:  0x00007ffc1903cc10  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>
0x00007ffc1903cb60:  0x00007f5b592edc00  0x0000000000010000
0x00007ffc1903cb70:  0x0000000000000007  0x00007ffc1903cc38
0x00007ffc1903cb80:  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>  0x00007f5b592edc00
0x00007ffc1903cb90:  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>  0x00007f5b592edc00
0x00007ffc1903cba0:  0x00007f5b592edc00  0x0000000000000003
0x00007ffc1903cbb0:  0xfffffffe7fffffff  0xffffffffffffffff
0x00007ffc1903cbc0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbd0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbe0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbf0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc00:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc10:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc20:  0xffffffffffffffff  0xffffffffffffffff
runtime: g 0: unknown pc 0x7f5b598d4791
stack: frame={sp:0x7ffc1903cb30, fp:0x0} stack=[0x7ffc1883e0f0,0x7ffc1903d100)
0x00007ffc1903ca30:  0x00007f5b594cbbd6 <runtime.mmap+0x0000000000000076>  0x00007ffc1903ca70
0x00007ffc1903ca40:  0x00007ffc1903ca48  0x0000000000114c10
0x00007ffc1903ca50:  0x0000002200000003  0x00007ffc1903ca98
0x00007ffc1903ca60:  0x00007f5b594cbcb1 <runtime.mmap.func1+0x0000000000000051>  0x00007f5b3426d000
0x00007ffc1903ca70:  0x00007f5b594cbc60 <runtime.mmap.func1+0x0000000000000000>  0x00007ffc1903ca68
0x00007ffc1903ca80:  0x00007f5b3426d000  0x0000000000407000
0x00007ffc1903ca90:  0x0000003200000003  0x00000000ffffffff
0x00007ffc1903caa0:  0x00007ffc1903cae8  0x00007f5b594dfe59 <runtime.sysMapOS+0x0000000000000039>
0x00007ffc1903cab0:  0x00007f5b32158000  0x00007f5b594cbc60 <runtime.mmap.func1+0x0000000000000000>
0x00007ffc1903cac0:  0x00007f5b594dfde5 <runtime.sysHugePageOS+0x0000000000000065>  0x00007f5b34400000
0x00007ffc1903cad0:  0x0000000000200000  0x000000000000000e
0x00007ffc1903cae0:  0x00007f5b00000000  0x00007ffc1903cb10
0x00007ffc1903caf0:  0x00007ffc1903cb20  0x00007f5b598b2cdc
0x00007ffc1903cb00:  0x0000000000407000  0x00007f5b59531ede <runtime.callCgoMmap+0x000000000000003e>
0x00007ffc1903cb10:  0x00007ffc1903cb10  0x00007ffc1903cb40
0x00007ffc1903cb20:  0x00007ffc1903cb50  0x00007f5b598b2cdc
0x00007ffc1903cb30: <0x0000000000000000  0x00007f5b59531ede <runtime.callCgoMmap+0x000000000000003e>
0x00007ffc1903cb40:  0x00007ffc1903cb40  0x0000000000000007
0x00007ffc1903cb50:  0x00007ffc1903cc10  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>
0x00007ffc1903cb60:  0x00007f5b592edc00  0x0000000000010000
0x00007ffc1903cb70:  0x0000000000000007  0x00007ffc1903cc38
0x00007ffc1903cb80:  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>  0x00007f5b592edc00
0x00007ffc1903cb90:  0x00007f5b594f1f2d <runtime.(*pageAlloc).update+0x00000000000003ad>  0x00007f5b592edc00
0x00007ffc1903cba0:  0x00007f5b592edc00  0x0000000000000003
0x00007ffc1903cbb0:  0xfffffffe7fffffff  0xffffffffffffffff
0x00007ffc1903cbc0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbd0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbe0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cbf0:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc00:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc10:  0xffffffffffffffff  0xffffffffffffffff
0x00007ffc1903cc20:  0xffffffffffffffff  0xffffffffffffffff

goroutine 1 [running]:
runtime.systemstack_switch()
        runtime/asm_amd64.s:463 fp=0xc000060780 sp=0xc000060778 pc=0x7f5b5952de60
runtime.main()
        runtime/proc.go:170 +0x6d fp=0xc0000607e0 sp=0xc000060780 pc=0x7f5b594ffa0d
runtime.goexit()
        runtime/asm_amd64.s:1598 +0x1 fp=0xc0000607e8 sp=0xc0000607e0 pc=0x7f5b59530101

rax    0x0
rbx    0x5555564ad340
rcx    0x7f5b598d4791
rdx    0x0
rdi    0x2
rsi    0x7ffc1903cb30
rbp    0x7f5b59a19ab2
rsp    0x7ffc1903cb30
r8     0x0
r9     0x7ffc1903cb30
r10    0x8
r11    0x246
r12    0xb
r13    0x7ffc1903cf40
r14    0x7f5b59e31640
r15    0x1
rip    0x7f5b598d4791
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
ERRO[0000] exec failed: unable to start container process: read init-p: connection reset by peer

runc exec will show runtime/cgo: pthread_create failed: Resource temporarily unavailable,we can know
Insufficient pid resources. but dockerd logs doesn't contain these logs

Describe the results you received and expected

show detail message

What version of runc are you using?

[root@host-226 ~]# runc -v
runc version 1.1.9
commit: v1.1.9-0-gccaecfc
spec: 1.0.2-dev
go: go1.20.7
libseccomp: 2.5.1

Host OS information

centos7

Host kernel information

5.10

@ningmingxiao ningmingxiao changed the title docker exec can't get detail message docker exec can't get detail messages Oct 20, 2023
@ningmingxiao ningmingxiao changed the title docker exec can't get detail messages docker exec can't get detail messages if runc init exited Oct 20, 2023
@ningmingxiao ningmingxiao changed the title docker exec can't get detail messages if runc init exited docker exec can't get detail messages if runc init exited abnormally Oct 20, 2023
@cyphar
Copy link
Member

cyphar commented Oct 21, 2023

Seems like a Docker/containerd issue? runc outputs the error to stderr, it's up to Docker to format it as necessary. I suspect the issue is that our logs go to stdout but panics go to stderr, so containerd/docker only care about stdout.

@ningmingxiao
Copy link
Author

ningmingxiao commented Oct 21, 2023

Seems like a Docker/containerd issue? runc outputs the error to stderr, it's up to Docker to format it as necessary. I suspect the issue is that our logs go to stdout but panics go to stderr, so containerd/docker only care about stdout.

containerd-shim will read error log from run log.json(now just read "unable to start container process: read init-p: connection reset by peer" ), so I think it is necessary to save it into runc log.json, but I can't find any way to save it into log.json.
see
https://github.com/containerd/containerd/blob/main/pkg/process/utils.go#L61

@kolyshkin
Copy link
Contributor

One of the issues here is runc exec gets SIGABRT after pthread_create returns an error:

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort

In my tests, this happens frequent but not every time. I thought we worked around that by using runtime.GOMAXPROCS(1) in runc init code but this is not enough.

Meaning, go runtime wants a new OS thread when we're already in the container cgroup (and are thus limited by the container's pids limit). From the cursory look I can't see where it may happen.

... or maybe I do. golang/go#24484 mentions that go runtime may need a new OS thread when doing a call to C function via cgo. This might be it:

runtime.callCgoMmap

@ningmingxiao
Copy link
Author

if use the runc main branch ,docker exec will show error detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants