
wip: fix Windows tests #295

Closed
wants to merge 2 commits into from


lemeurherve
Member

@lemeurherve lemeurherve commented Aug 9, 2023

Fixes #292

Explanations:

  • Some environment variables (e.g. PUBLIC_SSH_KEY, PRIVATE_SSH_KEY) were empty when read inside the tests; referencing them with the $global: scope prefix fixed it.
  • docker run was called with --publish but without specifying any port; calling it with --publish-all, which publishes all exposed ports to random host ports, fixed it.
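
With --publish-all the host port is random, so a test has to look it up before connecting (the test logs in this thread show `docker port <container> 22` being called for that purpose). The repository's helpers are written in PowerShell; the following is only a minimal Python sketch of the parsing step, and the function name `parse_published_port` is hypothetical. It assumes each output line has the form `<host-ip>:<host-port>`.

```python
def parse_published_port(docker_port_output: str) -> int:
    """Return the host port from `docker port <container> <port>` output.

    Assumption: each non-empty line looks like `<host-ip>:<host-port>`,
    for example `0.0.0.0:50103` or `[::]:50103`. The first usable line wins.
    """
    for line in docker_port_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so IPv6 host addresses do not confuse us.
        _, _, port = line.rpartition(":")
        if port.isdigit():
            return int(port)
    raise ValueError(f"no published port found in: {docker_port_output!r}")
```

Raising on empty output makes a missing port mapping fail loudly instead of letting the SSH client try an empty port.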

Testing done

Submitter checklist

  1. Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  2. Ensure that the pull request title represents the desired changelog entry
  3. Please describe what you did
  4. Link to relevant issues in GitHub or Jira
  5. Link to relevant pull requests, esp. upstream and downstream changes
  6. Ensure you have provided tests - that demonstrates feature works or fixes the issue

@lemeurherve
Member Author

lemeurherve commented Aug 12, 2023

Note: some tests are fixed, but I'm struggling with the remaining ones involving SSH, which fail on both the nanoserver and windowsservercore images:

Running tests from 'sshAgent.Tests.ps1'
 Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-nanoserver-1809-jdk11
 Describing [nanoserver-1809-jdk11] create agent container with pubkey as argument
 Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-nanoserver-1809-jdk11 22
 Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -i "C:\Windows\TEMP\tmpE779.tmp" -o LogLevel=quiet -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50103 pwsh.exe -NoLogo -C "Write-Host 'f00'"
 
 stdout:
  
 stderr:
  
   [-] runs commands via ssh 2.27s (2.04s|231ms)
    Expected 0, but got 255.
    at $exitCode | Should -Be 0, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1:128
    at <ScriptBlock>, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1:128

Additionally, there is an error from the nanoserver Docker image build that is unrelated to this PR and also occurs on the master branch: #302

@lemeurherve
Member Author

Error log with `ssh -v`

OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50136.
debug1: connect to address ::1 port 50136: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50136.
debug1: Connection established.
debug1: identity file C:\Windows\TEMP\tmpEF2A.tmp type -1
debug1: identity file C:\Windows\TEMP\tmpEF2A.tmp-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_for_Windows_8.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_for_Windows_9.2
debug1: match: OpenSSH_for_Windows_9.2 pat OpenSSH* compat 0x04000000
debug1: Authenticating to localhost:50136 as 'jenkins'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:4ScISJjuOHGDkz1DbQ0AtkvLCpml0NABvLxfXso7i/8
debug1: checking without port identifier
Warning: Permanently added '[localhost]:50136' (ECDSA) to the list of known hosts.
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: pubkey_prepare: ssh_get_authentication_socket: No such file or directory
debug1: Will attempt key: C:\Windows\TEMP\tmpEF2A.tmp explicit
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_input_ext_info: publickey-hostbound@openssh.com (unrecognised)
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Trying private key: C:\Windows\TEMP\tmpEF2A.tmp
Load key "C:\Windows\TEMP\tmpEF2A.tmp": invalid format
debug1: No more authentication methods to try.
jenkins@localhost: Permission denied (publickey).
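
The decisive line in this log is `Load key ...: invalid format`: the client could not parse the temporary private key file, so public key authentication never had a usable key. The thread does not say what was wrong with the file. As a minimal sketch, the hypothetical helper below checks a few properties of the key text that are commonly involved in this error; the checks themselves are assumptions about likely culprits, not a diagnosis of this build.

```python
from __future__ import annotations


def key_format_problems(key_text: str) -> list[str]:
    """List suspicious properties of a PEM-style private key string.

    Assumption: the key is a text block delimited by
    `-----BEGIN ... PRIVATE KEY-----` and `-----END ... PRIVATE KEY-----`.
    An empty list only means none of these specific checks fired.
    """
    problems = []
    if "\r" in key_text:
        problems.append("contains carriage returns (CRLF line endings)")
    if not key_text.endswith("\n"):
        problems.append("missing trailing newline")
    lines = key_text.strip().splitlines()
    if not lines or not (
        lines[0].startswith("-----BEGIN ") and lines[0].endswith("PRIVATE KEY-----")
    ):
        problems.append("first line is not a BEGIN ... PRIVATE KEY header")
    if not lines or not (
        lines[-1].startswith("-----END ") and lines[-1].endswith("PRIVATE KEY-----")
    ):
        problems.append("last line is not an END ... PRIVATE KEY footer")
    return problems
```

Running such a check on the key content just before it is written to the temporary file would show whether the value was already damaged (for example empty or truncated) at that point.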

@lemeurherve
Member Author

https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/view/change-requests/job/PR-295/34/console:

Finished: SUCCESS

🎉

Now trying with the previous version of OpenSSH, then cleaning up to keep only fixes in this PR.

lemeurherve added a commit to lemeurherve/docker-ssh-agent that referenced this pull request Sep 15, 2023
@lemeurherve
Member Author

This is really frustrating: running the build and tests locally on a Windows 10 machine with .\make.ps1 test works flawlessly, but SSH tests fail for Windows Server Core (and not Nanoserver) in ci.jenkins.io 🤔

https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/job/PR-295/58/console

Describing [jdk11-windowsservercore-ltsc2019] create agent container with pubkey as argument
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50167
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmpE026.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50167 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50167.
debug1: connect to address ::1 port 50167: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50167.
debug1: connect to address 127.0.0.1 port 50167: Connection timed out
ssh: connect to host localhost port 50167: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.19s
Expected 0, but got 255.
123: $exitCode | Should -Be 0
at , C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 123

Describing [jdk11-windowsservercore-ltsc2019] create agent container with pubkey as envvar
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50172
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmp50F2.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50172 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50172.
debug1: connect to address ::1 port 50172: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50172.
debug1: connect to address 127.0.0.1 port 50172: Connection timed out
ssh: connect to host localhost port 50172: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.11s
Expected 0, but got 255.
140: $exitCode | Should -Be 0
at , C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 140

Describing [jdk11-windowsservercore-ltsc2019] create agent container like docker-plugin with '/usr/sbin/sshd -D -p 22' as argument
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50178
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmpC1AE.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50178 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50178.
debug1: connect to address ::1 port 50178: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50178.
debug1: connect to address 127.0.0.1 port 50178: Connection timed out
ssh: connect to host localhost port 50178: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.1s
Expected 0, but got 255.
160: $exitCode | Should -Be 0
at , C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 160

Any idea @jenkinsci/team-docker-packaging?
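
In the logs above, the IPv6 attempt is refused immediately and the IPv4 attempt times out after about 20 seconds. One way to narrow this down is to probe the published port with a plain TCP connection before starting ssh, which separates "sshd is not listening yet" from "the port is never reachable from the host". The following is a minimal Python sketch under that assumption; `wait_for_port` is a hypothetical helper, not part of the repository, and it does not explain why the CI agents behave differently.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Retries until `timeout` seconds have elapsed, then returns False.
    Passing "127.0.0.1" instead of "localhost" skips the IPv6 attempt.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

If the probe succeeds but ssh still fails, the problem is more likely in authentication than in port publishing.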

@timja
Member

timja commented Sep 28, 2023

> This is really frustrating: running the build and tests locally on a Windows 10 machine with .\make.ps1 test works flawlessly, but SSH tests fail for Windows Server Core (and not Nanoserver) in ci.jenkins.io

Create a VM based on the image in the image gallery, that's how I have always debugged failures in jenkinsci/docker.

@dduportal
Contributor

> > This is really frustrating: running the build and tests locally on a Windows 10 machine with .\make.ps1 test works flawlessly, but SSH tests fail for Windows Server Core (and not Nanoserver) in ci.jenkins.io
>
> Create a VM based on the image in the image gallery, that's how I have always debugged failures in jenkinsci/docker.

+1 with Tim: your Windows 10/11 machine with Docker Desktop uses a different container isolation than a full-fledged Windows Server 2019/2022 host running Docker CE with Windows containers (not the same kernel, hypervisor, or system APIs).

@lemeurherve
Member Author

@timja @dduportal I've spawned a VM using this image: prod-packer-images/providers/Microsoft.Compute/images/jenkins-agent-windows-2019

And... All tests passed, including those with Windows Server Core 🎉 😅 🤔

What could I try now?

Note that I also tried a replay with windows-2019 as the agent label; it still fails on ci.jenkins.io with Windows Server Core.

@timja
Member

timja commented Sep 28, 2023

The last build seemed to pass?

@lemeurherve
Member Author

> The last build seemed to pass?

Many of them are green, but see #291: builds can be green even when some tests fail.

@lemeurherve
Member Author

lemeurherve commented Sep 28, 2023

About the green builds even with some tests failing:

The good news is that it fixes #302, the error message is gone 🎉 (cf https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/job/PR-319/2/console)

The (less?) good news is that it also fixes #291, the tests are now failing the build as expected 😅

From #319 (comment)

I can mark #319 as "ready for review" so it can be merged already, but I don't really know why it restored the ability to fail the build.
I've already tried integrating this OpenSSH update into this PR (commit fa20ba1), but the corresponding build was green while the SSH tests were failing on Windows Server Core.

@@ -92,7 +92,7 @@ Describe "[$global:AGENT_IMAGE] checking image metadata" {

 Describe "[$global:AGENT_IMAGE] image has correct version of java installed and in the PATH" {
     BeforeAll {
-        docker run --detach --tty --name="$global:CONTAINERNAME" --publish "$global:AGENT_IMAGE" $global:CONTAINERSHELL
+        docker run --detach --tty --name="$global:CONTAINERNAME" --publish-all "$global:AGENT_IMAGE" $global:CONTAINERSHELL

Suggested change
-        docker run --detach --tty --name="$global:CONTAINERNAME" --publish-all "$global:AGENT_IMAGE" $global:CONTAINERSHELL
+        docker run --detach --name="$global:CONTAINERNAME" --publish-all "$global:AGENT_IMAGE" $global:CONTAINERSHELL

I'm not convinced that this change will solve all problems, but removing --tty when the container runs in the background is good practice (barring an exotic case such as running the cat command in a background Linux container). We don't use a TTY for background containers.
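
To make the suggested command line concrete, here is a minimal Python sketch that assembles the `docker run` arguments with `--publish-all` and with `--tty` off by default. The function name `docker_run_args` and its parameters are hypothetical; the repository's tests build this command in PowerShell.

```python
def docker_run_args(image, name, command=(), tty=False):
    """Build the argument list for a detached container with all ports published.

    `tty` defaults to False, following the review suggestion not to allocate
    a TTY for containers that run in the background.
    """
    args = ["docker", "run", "--detach"]
    if tty:
        args.append("--tty")
    args += [f"--name={name}", "--publish-all", image, *command]
    return args
```

Passing the result as a list to a process launcher avoids quoting issues with container names or shell commands that contain spaces.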

tests/test_helpers.psm1 (outdated, resolved)
@lemeurherve
Member Author

This PR should be started over, now that all tests pass with the nanoserver images and only the SSH tests are failing for the Windows Server Core images.

lemeurherve added a commit to lemeurherve/docker-ssh-agent that referenced this pull request Apr 28, 2024
lemeurherve added a commit to lemeurherve/docker-ssh-agent that referenced this pull request Apr 29, 2024
@lemeurherve lemeurherve deleted the fix-windows-tests branch May 15, 2024 07:44
Development

Successfully merging this pull request may close these issues.

Fix Windows tests
3 participants