-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ssh: fix error on commandconn close, add ping and default timeout #4226
ssh: fix error on commandconn close, add ping and default timeout #4226
Conversation
4782087
to
4bcf2a7
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #4226 +/- ##
==========================================
+ Coverage 59.07% 59.29% +0.22%
==========================================
Files 287 288 +1
Lines 24752 24769 +17
==========================================
+ Hits 14623 14688 +65
+ Misses 9249 9197 -52
- Partials 880 884 +4 |
4bcf2a7
to
8841e13
Compare
8841e13
to
8dd8535
Compare
e36a5d5
to
02c34ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Left some nits, suggestions, and questions 😅
4f18381
to
592fa93
Compare
/cc @AkihiroSuda perhaps you want to have a look as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good (thanks!); could use some additional set of eyes on this in case I missed things.
I left some comments/suggestions inline; let me open a PR with those suggestions (feel free to ignore/update/modify them if you want)
opened laurazard#1 (please squash with your commits, and (as mentioned) feel free to update/ignore/whatever). |
--- commandconn: fix race on `Close()` During normal operation, if a `Read()` or `Write()` call results in an EOF, we call `onEOF()` to handle the terminating command, and store it's exit value. However, if a Read/Write call was blocked while `Close()` is called the in/out pipes are immediately closed which causes an EOF to be returned. Here, we shouldn't call `onEOF()`, since the reason why we got an EOF is because we're already terminating the connection. This also prevents a race between two calls to the commands `Wait()`, in the `Close()` call and `onEOF()` --- Add CLI init timeout to SSH connections --- connhelper: add 30s ssh default dialer timeout (same as non-ssh dialer) Signed-off-by: Laura Brehm <laurabrehm@hey.com>
50d81c8
to
a5ebe22
Compare
@thaJeztah I merged your changes (thank you!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@rumpl @AkihiroSuda ptal 🤗 |
assert.Check(t, !process.Alive(cmdConn.cmd.Process.Pid)) | ||
|
||
readErr := <-readErrC | ||
assert.ErrorContains(t, readErr, "EOF") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use errors.Is(readErr, io.EOF)
?
cmdMutex sync.Mutex // for cmd, cmdWaitErr | ||
cmd *exec.Cmd | ||
cmdWaitErr error | ||
cmdExited atomic.Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to moby/moby#45966 (comment), this means we should update our go 1.18
line in vendor.mod
to at least 1.19 😅
# github.com/docker/cli/cli/connhelper/commandconn
cli/connhelper/commandconn/commandconn.go:71:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:76:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:77:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:78:22: undefined: atomic.Bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @tianon, are you interested in contributing to open source and opening a PR? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naw, open source isn't really my thing 😂
(totally fair callout, I'll see what I can do)
- Related
- TLDR
During CLI initialization we ping the configured host. Initially, we set no context timeout for these connections, and after b397391 we set timeouts but only for TCP connections (which then got changed in #3722 to only specifically exclude SSH connections and not sockets).
This looks to be due to errory looking logs from
cli/connhelper/commandconn/commandconn.go
onClose()
– which gets called byhttp.Transport
when the request context times out.However, these logs are being thrown erroneously –
commandconn.kill()
states that it will return nil if the command terminates regardless of it's exit status:cli/cli/connhelper/commandconn/commandconn.go
Line 113 in 26a7357
But the implementation does not hold up to this contract:
cli/cli/connhelper/commandconn/commandconn.go
Line 130 in 26a7357
since
ProcessState.Exited()
returns true only if the program exited due to callingexit
and not if the program was terminated by a signal (this probably does works on Windows since hereProcessState.Exited()
returns true if the program has exited either way, but not on POSIXy OSs).If
commandconn.kill()
is fixed to always returnnil
when the command has terminated, setting timeouts on contexts passed to SSH connections works fine and we no longer need special handling.Other than that, I also added
-o ConnectTimeout=30
to the created SSH command, which serves to align with the timeout from the client returned for non-SSH connections incli/cli/context/docker/load.go
Line 134 in 26a7357
docker stats
or anattach
won't time out after 30s.During this I also found a racy call to
cmd.Wait()
betweenClose()
andonEOF()
, since callingClose
will close the command's stdin/out pipes, which will throw an EOF to pending reads, triggering a call toonEOF()
, but afterwardsClose
will also callkill()
which will also callcmd.Wait()
- What I did
cli/connhelper/commandconn/commandconn.go
:kill()
: Fix implementation so we don't return an error when process is successfully closedWrite()
/Read()
: Prevent race by checking if connection is being closedcli/command/cli.go
:initializeFromClient()
: remove SSH special handling so we also add timeout for SSH connectionscli/context/docker/load.go
:ClientOpts()
: add 30s timeout to align with non-SSH client- How to verify it
DOCKER_HOST=ssh://example.com docker
: no longer hangs forever, now it returns after the default 2s CLI init timeoutDOCKER_HOST=ssh://example.com docker version
: no longer hangs forever, now it will timeout after 30s (can be more if the host resolves to more than 1 IP address)- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)