Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileLog Pipeline Not Closed, Causing runc exec to Hang #4141

Open
ruijzhan opened this issue Dec 12, 2023 · 2 comments
Open

FileLog Pipeline Not Closed, Causing runc exec to Hang #4141

ruijzhan opened this issue Dec 12, 2023 · 2 comments

Comments

@ruijzhan
Copy link

Description

FileLog Pipeline Not Closed, Causing runc exec to Hang

1. Problem Phenomenon and Impact

In the node host system, there are long-standing runc exec processes that have not exited. The parent process is containerd-shim, and the child process is etcdctl. The child process is already defunct, becoming a zombie process.

# ps auxf
......

root       50699  0.0  0.0 713096 18576 ?        Sl   Dec07   6:37 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 1b386c6e9d8329e2bf858c51d73063ff47c9dec756d62f3e07d8ab5a52bcb9c3 -address /run/containerd/containerd.sock
root       50717  1.1  0.2 10313084 154032 ?     Ssl  Dec07  81:53  \_ etcd
root      685930  0.0  0.0 1229792 18740 ?       tl   03:27   0:00  \_ runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/1b386c6e9d8329e2bf858c51d73063ff47c9dec756d62f3e07d8ab5a52bcb9c3/log.json --log-format json --sys
temd-cgroup exec --process /tmp/runc-process062846044 --detach --pid-file /run/containerd/io.containerd.runtime.v2.task/moby/1b386c6e9d8329e2bf858c51d73063ff47c9dec756d62f3e07d8ab5a52bcb9c3/bb623815ad911cef808b0eafcf0c5333b13658bb7284a9603a1a7b84b3b875b1.pid 1b386c6e9d83
29e2bf858c51d73063ff47c9dec756d62f3e07d8ab5a52bcb9c3
root      685940  0.0  0.0      0     0 ?        Zs   03:27   0:00      \_ [etcdctl] <defunct>

......

By using commands like docker ps -a, we identified the stuck container ID and matched it with the Pod in the cluster. We found that the issue occurs when kubelet executes binary files in the container namespace through OCI calling runc exec during exec type liveness or readiness probes. Runc does not exit and does not reclaim the child process's PID.

    readinessProbe:
      exec:
        command:
        - etcdctl
        - --endpoints=https://localhost:2379 --cert=/opt/bitnami/etcd/certs/server/tls.crt
          --key=/opt/bitnami/etcd/certs/server/tls.key --cacert=/opt/bitnami/etcd/certs/server/ca.crt
          endpoint status
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 5

This problem has been present from the release of runc 1.0.2 two years ago to the latest version 1.1.10. It occasionally causes the cluster's Docker to trigger PLEG, leading to Node NotReady status.

2. Debugging Information

Debugging of the runc process was conducted using dlv attach.

Two non-system goroutines were identified. One was blocked on channel reading, and the other was blocked on file reading.

(dlv) goroutines
  Goroutine 1 - User: github.com/opencontainers/runc/libcontainer/container_linux.go:372 github.com/opencontainers/runc/libcontainer.(*linuxContainer).start.func1 (0x7ec225) [chan receive]
  ...
  Goroutine 36 - User: os/file_posix.go:31 os.(*File).Read (0x4e2bb7) [IO wait]
(dlv) bt
 0  0x000000000043ac25 in runtime.gopark
    at runtime/proc.go:337
 1  0x000000000040868f in runtime.chanrecv
    at runtime/chan.go:576
 2  0x00000000004082cb in runtime.chanrecv1
    at runtime/chan.go:439
 3  0x00000000007ec225 in github.com/opencontainers/runc/libcontainer.(*linuxContainer).start.func1
    at github.com/opencontainers/runc/libcontainer/container_linux.go:372
 4  0x00000000007c2e7b in github.com/opencontainers/runc/libcontainer.(*linuxContainer).start
    at github.com/opencontainers/runc/libcontainer/container_linux.go:399
 5  0x00000000007c2254 in github.com/opencontainers/runc/libcontainer.(*linuxContainer).Start
    at github.com/opencontainers/runc/libcontainer/container_linux.go:264
 6  0x00000000007c2459 in github.com/opencontainers/runc/libcontainer.(*linuxContainer).Run
    at github.com/opencontainers/runc/libcontainer/container_linux.go:274
 7  0x000000000085a0c9 in main.(*runner).run
    at github.com/opencontainers/runc/utils_linux.go:316
 8  0x000000000084ea65 in main.execProcess
    at github.com/opencontainers/runc/exec.go:157
 9  0x000000000085c64d in main.glob..func5
    at github.com/opencontainers/runc/exec.go:104
10  0x0000000000815ee5 in github.com/opencontainers/runc/vendor/github.com/urfave/cli.HandleAction
    at github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:523
11  0x0000000000816c59 in github.com/opencontainers/runc/vendor/github.com/urfave/cli.Command.Run
    at github.com/opencontainers/runc/vendor/github.com/urfave/cli/command.go:174
12  0x0000000000813e88 in github.com/opencontainers/runc/vendor/github.com/urfave/cli.(*App).Run
    at github.com/opencontainers/runc/vendor/github.com/urfave/cli/app.go:276
13  0x0000000000851edf in main.main
    at github.com/opencontainers/runc/main.go:163
14  0x000000000043a7f6 in runtime.main
    at runtime/proc.go:225
15  0x000000000046e761 in runtime.goexit
    at runtime/asm_amd64.s:1371
(dlv) bt
 0  0x000000000043ac25 in runtime.gopark
    at runtime/proc.go:337
 1  0x0000000000433351 in runtime.netpollblock
    at runtime/netpoll.go:447
 2  0x0000000000469035 in internal/poll.runtime_pollWait
    at runtime/netpoll.go:227
 3  0x00000000004d8d25 in internal/poll.(*pollDesc).wait
    at internal/poll/fd_poll_runtime.go:87
 4  0x00000000004d9e15 in internal/poll.(*pollDesc).waitRead
    at internal/poll/fd_poll_runtime.go:92
 5  0x00000000004d9e15 in internal/poll.(*FD).Read
    at internal/poll/fd_unix.go:166
 6  0x00000000004e2bb7 in os.(*File).read
    at os/file_posix.go:31
 7  0x00000000004e2bb7 in os.(*File).Read
    at os/file.go:117
 8  0x00000000005dbca9 in bufio.(*Scanner).Scan
    at bufio/scan.go:214
 9  0x00000000007aa7f5 in github.com/opencontainers/runc/libcontainer/logs.ForwardLogs.func1
    at github.com/opencontainers/runc/libcontainer/logs/logs.go:35
10  0x000000000046e761 in runtime.goexit
    at runtime/asm_amd64.s:1371

Detailed information about the file descriptor was obtained, indicating that the file was not closed.

(dlv) frame 1
> runtime.gopark() runtime/proc.go:337 (PC: 0x43ac25)
Warning: debugging optimized function
Frame 1: runtime/netpoll.go:447 (PC: 433351)
(dlv) args
pd = (*runtime.pollDesc)(0x7fb830708e88)
mode = 114
waitio = false
~r3 = (unreadable empty OP stack)
(dlv) print pd
(*runtime.pollDesc)(0x7fb830708e88)
*runtime.pollDesc {
        link: *runtime.pollDesc {
                link: *(*runtime.pollDesc)(0x7fb830708cb8),
                lock: (*runtime.mutex)(0x7fb830708da8),
                fd: 7,
                closing: true,
                everr: false,
                user: 0,
                rseq: 28,
                rg: 0,
                rt: (*runtime.timer)(0x7fb830708dd0),
                rd: 0,
                wseq: 28,
                wg: 0,
                wt: (*runtime.timer)(0x7fb830708e30),
                wd: 0,
                self: *(*runtime.pollDesc)(0x7fb830708da0),},
        lock: runtime.mutex {
                lockRankStruct: runtime.lockRankStruct {},
                key: 0,},
        fd: 9,
        closing: false,
        everr: false,
        user: 0,
        rseq: 33,
        rg: 824636884096,
        rt: runtime.timer {
                pp: 0,
                when: 0,
                period: 0,
                f: nil,
                arg: interface {} nil,
                seq: 0,
                nextwhen: 0,
                status: 0,},
        rd: 0,
        wseq: 33,
        wg: 0,
        wt: runtime.timer {
                pp: 0,
                when: 0,
                period: 0,
                f: nil,
                arg: interface {} nil,
                seq: 0,
                nextwhen: 0,
                status: 0,},
        wd: 0,
        self: *runtime.pollDesc {
                link: *(*runtime.pollDesc)(0x7fb830708da0),
                lock: (*runtime.mutex)(0x7fb830708e90),
                fd: 9,
                closing: false,
                everr: false,
                user: 0,
                rseq: 33,
                rg: 824636884096,
                rt: (*runtime.timer)(0x7fb830708eb8),
                rd: 0,
                wseq: 33,
                wg: 0,
                wt: (*runtime.timer)(0x7fb830708f18),
                wd: 0,
                self: *(*runtime.pollDesc)(0x7fb830708e88),},}

3. Code Analysis

The channel read is blocked on the reading of logsDone, which is causing the entire runc exec to be blocked and unable to exit.

func (c *linuxContainer) start(process *Process) (retErr error) {
	parent, err := c.newParentProcess(process)
   
   ...

	logsDone := parent.forwardChildLogs()
	if logsDone != nil {
		defer func() {
			// Wait for log forwarder to finish. This depends on
			// runc init closing the _LIBCONTAINER_LOGPIPE log fd.
			err := <-logsDone
			if err != nil && retErr == nil {
				retErr = newSystemErrorWithCause(err, "forwarding init logs")
			}
		}()
	}

	if err := parent.start(); err != nil {
		return newSystemErrorWithCause(err, "starting container process")
	}

	...
}

The function parent.forwardChildLogs() is implemented to return logs.ForwardLogs(p.logFilePair.parent).

func (p *setnsProcess) forwardChildLogs() chan error {
	return logs.ForwardLogs(p.logFilePair.parent)
}

In the implementation of newParentProcess(), p.logFilePair.parent is declared.

func (c *linuxContainer) newParentProcess(p *Process) (parentProcess, error) {

   ...

	parentLogPipe, childLogPipe, err := os.Pipe()
	if err != nil {
		return nil, fmt.Errorf("Unable to create the log pipe:  %s", err)
	}
	logFilePair := filePair{parentLogPipe, childLogPipe}

	cmd := c.commandTemplate(p, childInitPipe, childLogPipe)
	if !p.Init {
		return c.newSetnsProcess(p, cmd, messageSockPair, logFilePair)
	}

   ...
}

The implementation of ForwardLogs() shows a goroutine blocked in a loop at for s.Scan(). The command done <- s.Err() is not reached, causing the initial channel read to be blocked.

func ForwardLogs(logPipe io.ReadCloser) chan error {
	done := make(chan error, 1)
	s := bufio.NewScanner(logPipe)

	go func() {
		for s.Scan() {
			processEntry(s.Bytes())
		}
		if err := logPipe.Close(); err != nil {
			logrus.Errorf("error closing log source: %v", err)
		}
		// The only error we want to return is when reading from
		// logPipe has failed.
		done <- s.Err()
		close(done)
	}()

	return done
}

The purpose of these codes is to set the namespace for the process that runc exec is about to execute. The logs forwarded by logPipe are also the output of the setns command, but for some unknown reason, the pipeline was not closed after the setns execution was completed. There is also no explicit closure of p.logFilePair.parent in the entire code.

4. Temporary Solution

Since p.logFilePair.parent was not closed, resulting in runc being blocked, we close it after a minute of blocking. A minute is significantly longer than the 5-second timeoutSeconds of prob, so it will not affect the probe results.

// process_linux.go
func (p *setnsProcess) start() (retErr error) {

   // close the parent log pipe after 60 seconds   
	go func() {
		<-time.After(60 * time.Second)
		p.logFilePair.parent.Close()
	}()

	defer p.messageSockPair.parent.Close()
	// get the "before" value of oom kill count
	oom, _ := p.manager.OOMKillCount()
	err := p.cmd.Start()
   
   ...
}

Steps to reproduce the issue

Describe the results you received and expected

I want the logFile pipe is always closed and not block runc exec

What version of runc are you using?

from 1.0.2 to 1.1.10

Host OS information

No response

Host kernel information

Linux yCCS-vmMXiz6CnH 5.4.224-1.el7.elrepo.x86_64 #1 SMP Tue Nov 8 17:24:56 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

@kolyshkin
Copy link
Contributor

To add to your analysis

  1. logfd is a pair of file descriptors obtained via pipe(2) system call. The parent runc process using the reading end of the pipe, and is getting EOF from read on that one once the child closes the writing end of the pipe. This explains why runc never closes parent fd explicitly. This is also why the workaround suggested is just that, a workaround -- hiding but not solving the real problem.

  2. There's also the part where runc passes logfd via _LIBCONTAINER_LOGPIPE environment variable to the runc init (which gets executed during exec) . This fd is later closed in (*linuxSetnsInit) Init() here:

// Close the log pipe fd so the parent's ForwardLogs can exit.
logrus.Debugf("setns_init: about to exec")
if err := unix.Close(l.logFd); err != nil {
return &os.PathError{Op: "close log pipe", Path: "fd " + strconv.Itoa(l.logFd), Err: err}
}

which happens right before exec'ing the binary supplied to runc exec.

So, what you see is probably caused by the fact that this code was not reached.

Now, I think that you assume that PID685940 ([etcdctl] <defunct>) is the child process of PID 685930 (runc exec). I am not sure if this is the case, since ps output lacks the PPID field.

If you can repro this, can you look into:

  • whether runc exec reached the state of actually executing the binary (say, check the PPid field of the defunct etcdctl process to see whether it points to runc exec -- my guess not, or run runc exec with --debug so it would log setns_init: about to exec" right before closing the logfd pipe.
  • what is the cause of a defunct process.

@kolyshkin
Copy link
Contributor

@ruijzhan ideally, if you can find a way to repro this with bare runc, we can take a deeper look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants