Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssh: fix error on commandconn close, add ping and default timeout #4226

Merged
merged 1 commit into from
Jun 9, 2023

Conversation

laurazard
Copy link
Member

@laurazard laurazard commented Apr 24, 2023

- Related

- TLDR

During CLI initialization we ping the configured host. Initially, we set no context timeout for these connections, and after b397391 we set timeouts but only for TCP connections (which then got changed in #3722 to only specifically exclude SSH connections and not sockets).

This looks to be due to errory looking logs from cli/connhelper/commandconn/commandconn.go on Close() – which gets called by http.Transport when the request context times out.
However, these logs are being thrown erroneously – commandconn.kill() states that it will return nil if the command terminates regardless of it's exit status:

// kill returns nil if the command terminated, regardless to the exit status.

But the implementation does not hold up to this contract:
if wExitErr.ProcessState.Exited() {

since ProcessState.Exited() returns true only if the program exited due to calling exit and not if the program was terminated by a signal (this probably does works on Windows since here ProcessState.Exited() returns true if the program has exited either way, but not on POSIXy OSs).

If commandconn.kill() is fixed to always return nil when the command has terminated, setting timeouts on contexts passed to SSH connections works fine and we no longer need special handling.

Other than that, I also added -o ConnectTimeout=30 to the created SSH command, which serves to align with the timeout from the client returned for non-SSH connections in

Timeout: 30 * time.Second,
since it will be used only for "dialing" i.e., ssh command is unable to establish a connection within 30s it will error out, but if it can establish a connection it will keep running indefinitely, so commands like docker stats or an attach won't time out after 30s.

During this I also found a racy call to cmd.Wait() between Close() and onEOF(), since calling Close will close the command's stdin/out pipes, which will throw an EOF to pending reads, triggering a call to onEOF(), but afterwards Close will also call kill() which will also call cmd.Wait()

- What I did

  • cli/connhelper/commandconn/commandconn.go:
    • kill(): Fix implementation so we don't return an error when process is successfully closed
    • Write()/Read(): Prevent race by checking if connection is being closed
  • cli/command/cli.go:
    • initializeFromClient(): remove SSH special handling so we also add timeout for SSH connections
  • cli/context/docker/load.go:
    • ClientOpts(): add 30s timeout to align with non-SSH client

- How to verify it

  • DOCKER_HOST=ssh://example.com docker: no longer hangs forever, now it returns after the default 2s CLI init timeout
  • DOCKER_HOST=ssh://example.com docker version: no longer hangs forever, now it will timeout after 30s (can be more if the host resolves to more than 1 IP address)

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

image

@laurazard laurazard force-pushed the fix-connhelper-docker-example branch from 4782087 to 4bcf2a7 Compare April 24, 2023 09:34
@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2023

Codecov Report

Merging #4226 (50d81c8) into master (20923df) will increase coverage by 0.22%.
The diff coverage is 84.50%.

❗ Current head 50d81c8 differs from pull request most recent head a5ebe22. Consider uploading reports for the commit a5ebe22 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4226      +/-   ##
==========================================
+ Coverage   59.07%   59.29%   +0.22%     
==========================================
  Files         287      288       +1     
  Lines       24752    24769      +17     
==========================================
+ Hits        14623    14688      +65     
+ Misses       9249     9197      -52     
- Partials      880      884       +4     

@laurazard laurazard force-pushed the fix-connhelper-docker-example branch from 4bcf2a7 to 8841e13 Compare April 26, 2023 21:33
@laurazard laurazard changed the title Fix all the SSH things 🥳 ssh: fix error on commandconn close, add ping and default timeout Apr 26, 2023
@laurazard laurazard marked this pull request as ready for review April 26, 2023 21:42
@laurazard laurazard force-pushed the fix-connhelper-docker-example branch from 8841e13 to 8dd8535 Compare April 26, 2023 21:46
@laurazard laurazard force-pushed the fix-connhelper-docker-example branch 4 times, most recently from e36a5d5 to 02c34ad Compare May 1, 2023 15:20
@laurazard laurazard requested a review from thaJeztah May 1, 2023 15:36
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left some nits, suggestions, and questions 😅

cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn_unix_test.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/connhelper.go Outdated Show resolved Hide resolved
@laurazard laurazard force-pushed the fix-connhelper-docker-example branch 4 times, most recently from 4f18381 to 592fa93 Compare June 1, 2023 09:56
@laurazard laurazard requested a review from thaJeztah June 1, 2023 10:07
@thaJeztah
Copy link
Member

/cc @AkihiroSuda perhaps you want to have a look as well

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good (thanks!); could use some additional set of eyes on this in case I missed things.

I left some comments/suggestions inline; let me open a PR with those suggestions (feel free to ignore/update/modify them if you want)

cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
cli/connhelper/commandconn/commandconn.go Outdated Show resolved Hide resolved
@thaJeztah
Copy link
Member

opened laurazard#1 (please squash with your commits, and (as mentioned) feel free to update/ignore/whatever).

---
commandconn: fix race on `Close()`

During normal operation, if a `Read()` or `Write()` call results
in an EOF, we call `onEOF()` to handle the terminating command,
and store it's exit value.

However, if a Read/Write call was blocked while `Close()` is called
the in/out pipes are immediately closed which causes an EOF to be
returned. Here, we shouldn't call `onEOF()`, since the reason why
we got an EOF is because we're already terminating the connection.
This also prevents a race between two calls to the commands `Wait()`,
in the `Close()` call and `onEOF()`

---
Add CLI init timeout to SSH connections

---
connhelper: add 30s ssh default dialer timeout

(same as non-ssh dialer)

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
@laurazard laurazard force-pushed the fix-connhelper-docker-example branch from 50d81c8 to a5ebe22 Compare June 9, 2023 09:24
@laurazard
Copy link
Member Author

@thaJeztah I merged your changes (thank you!)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

@rumpl @AkihiroSuda ptal 🤗

@thaJeztah thaJeztah added this to the 25.0.0 milestone Jun 9, 2023
@thaJeztah thaJeztah merged commit 9e481e0 into docker:master Jun 9, 2023
74 checks passed
assert.Check(t, !process.Alive(cmdConn.cmd.Process.Pid))

readErr := <-readErrC
assert.ErrorContains(t, readErr, "EOF")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use errors.Is(readErr, io.EOF)?

cmdMutex sync.Mutex // for cmd, cmdWaitErr
cmd *exec.Cmd
cmdWaitErr error
cmdExited atomic.Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to moby/moby#45966 (comment), this means we should update our go 1.18 line in vendor.mod to at least 1.19 😅

# github.com/docker/cli/cli/connhelper/commandconn
cli/connhelper/commandconn/commandconn.go:71:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:76:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:77:22: undefined: atomic.Bool
cli/connhelper/commandconn/commandconn.go:78:22: undefined: atomic.Bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @tianon, are you interested in contributing to open source and opening a PR? 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naw, open source isn't really my thing 😂

(totally fair callout, I'll see what I can do)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants