exec.Cmd.Stop() causes a panic when it tries to SIGKILL; fixing it is insufficient due to a data race #291
Labels
lifecycle/stale
What happened:
If the command run by "utils/exec".New().Command() ignores SIGTERM and does not exit, or takes longer than 10 seconds to exit, using cmd.Stop() will result in a panic after 10 seconds.
What you expected to happen:
After SIGTERM fails to stop the process, Stop should send SIGKILL.
How to reproduce it:
I wrote a test that demonstrates the issue. It can be added to exec_test.go.
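This is not the test from the issue, only a rough stand-alone illustration of the failure mode. It assumes a Unix-like system with sleep on PATH and uses os/exec directly rather than utils/exec; the helper names exitedPanics and demo are hypothetical. The sketch checks whether calling ProcessState.Exited() panics while the process is still running, and again after Wait() has returned.

```go
package main

import (
	"fmt"
	"os/exec"
)

// exitedPanics reports whether calling cmd.ProcessState.Exited() panics.
// (Hypothetical helper, for illustration only.)
func exitedPanics(cmd *exec.Cmd) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = cmd.ProcessState.Exited()
	return false
}

// demo returns whether Exited() panicked before Wait and after Wait.
func demo() (before, after bool, err error) {
	cmd := exec.Command("sleep", "1")
	if err = cmd.Start(); err != nil {
		return
	}
	before = exitedPanics(cmd) // ProcessState has not been populated yet
	_ = cmd.Wait()             // Wait populates cmd.ProcessState
	after = exitedPanics(cmd)
	return
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("panics before Wait:", before)
	fmt.Println("panics after Wait:", after)
}
```

A timer callback that calls Exited() on a process that has not been waited on hits the same nil dereference.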
Anything else we need to know?:
The bug seems to come from a misconception about "os/exec".Cmd.ProcessState.Exited()
However:
That is, Cmd.ProcessState is nil until Cmd.Wait() successfully returns. Exited() is really only useful for distinguishing whether the process called exit() / returned normally, or was killed by a signal.
I also have a patch to address the panic:
HOWEVER, I'm concerned about a possible data race on the osexec.Cmd object between the time.AfterFunc goroutine and whichever goroutine calls cmd.Wait(). Running under the Go race detector after the patch above has been applied does not show a race. But if patched with the wrong condition (if c.ProcessState != nil {), then the race detector is triggered (it's not clear to me why the detection is sensitive to inverting the condition):
If sending SIGKILL is to depend on the termination state of cmd (and it seems like it should), then there needs to be some kind of synchronization point between the time.AfterFunc goroutine and the goroutine calling Wait().
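One possible shape for such a synchronization point is sketched below. It is not the patch from this issue and not the utils/exec API: the stoppable type and the start, stop, wait, and demo names are hypothetical, and the demo assumes a Unix-like system with sh and sleep on PATH. A single goroutine owns the call to Wait() and closes a channel when it returns; the time.AfterFunc callback consults that channel instead of reading cmd.ProcessState.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// stoppable is a hypothetical wrapper used only for this sketch.
type stoppable struct {
	cmd  *exec.Cmd
	done chan struct{} // closed after Wait returns
	err  error         // Wait's result; read only after done is closed
}

// start launches cmd and makes one goroutine the sole caller of Wait.
func start(cmd *exec.Cmd) (*stoppable, error) {
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	s := &stoppable{cmd: cmd, done: make(chan struct{})}
	go func() {
		s.err = cmd.Wait()
		close(s.done) // publishes s.err and cmd.ProcessState to receivers
	}()
	return s, nil
}

// stop sends SIGTERM now and SIGKILL after grace, unless Wait has returned.
// It never reads cmd.ProcessState.
func (s *stoppable) stop(grace time.Duration) {
	_ = s.cmd.Process.Signal(syscall.SIGTERM)
	time.AfterFunc(grace, func() {
		select {
		case <-s.done:
			// Already reaped; nothing to kill.
		default:
			_ = s.cmd.Process.Kill()
		}
	})
}

// wait blocks until the process has been reaped and returns Wait's error.
func (s *stoppable) wait() error {
	<-s.done
	return s.err
}

// demo runs a child that ignores SIGTERM and returns its final state.
func demo() (*os.ProcessState, error) {
	cmd := exec.Command("sh", "-c", `trap "" TERM; exec sleep 30`)
	s, err := start(cmd)
	if err != nil {
		return nil, err
	}
	time.Sleep(300 * time.Millisecond) // give the shell time to install the trap
	s.stop(500 * time.Millisecond)
	_ = s.wait() // a non-nil error is expected when the child dies from a signal
	return cmd.ProcessState, nil
}

func main() {
	ps, err := demo()
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("exited normally:", ps.Exited(), "state:", ps)
}
```

In this sketch cmd.ProcessState is read only after a receive from done, so the read is ordered after the write made inside Wait(). Whether utils/exec should adopt a design like this, or drop Stop() entirely, is a separate question.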
It's possible that Stop() should be removed. I could only find one user of "utils/exec".Cmd.Stop() in public repositories by searching with Sourcegraph, and that is in kubernetes/kubernetes/pkg/volume/flexvolume/driver-call.go. That code itself is lacking sufficient synchronization, and is in a deprecated component.
https://github.com/kubernetes/kubernetes/blob/99190634ab252604a4496882912ac328542d649d/pkg/volume/flexvolume/driver-call.go#L131
Environment:
kubectl version: n/a
uname -a: Darwin