New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows CI: Add support for testing with containerd #41479
Conversation
Integration API tests starts running but some of the are failing to timeout:
But integration CLI most probably crashes containerd.exe or dockerd.exe lost connection to it so tests are just hanging until they timeout:
|
4c7204c
to
2ca57b3
Compare
2476bf0
to
77806fb
Compare
Looks that I found those most problematic test cases which made CI timeout. These tests are still failing which need investigation:
|
This appears to just be a change in error message from locally-generated "cannot pause Windows Server Containers" to "Cannot pause container ...: not implemented" which might bubble up from hcsshim (although I can't track down the exact source in either hcsshim or containerd). So checking for It'd be easier and more consistent if both old and new errors were wrapping a |
I had a quick look since I was looking at the other failure, and I believe this error is coming from func (he *hcsExec) ResizePty(ctx context.Context, width, height uint32) error {
he.sl.Lock()
defer he.sl.Unlock()
if !he.io.Terminal() {
return errors.Wrapf(errdefs.ErrFailedPrecondition, "exec: '%s' in task: '%s' is not a tty", he.id, he.tid)
}
if he.state == shimExecStateRunning {
return he.p.Process.ResizeConsole(ctx, uint16(width), uint16(height))
}
return nil
} Hmm, this might just be a fault in cID := container.Run(ctx, t, client) should be cID := container.Run(ctx, t, client, container.WithTty(true)) and we've never noticed that non-WCOW containers will accept a It might be interesting to have an integration test to see what happens with The other two tests would have the same problem, except that |
04943af
to
4c99dc7
Compare
@thaJeztah @StefanScherer I think that is it time to discuss that how we want setup CI for this one? Should we have two RS5 builds running on parallel? One like it is currently and another one with environment variable: Also note that I did split those modified tests to #42164 and will rebase this one after it is merged. |
@olljanat A parallel step with the environment variable sounds good to me. |
4c99dc7
to
84028ca
Compare
38b9708
to
07120bf
Compare
@thaJeztah rebased, updated to match with #42528 and skipped default runtime change. PTAL |
@@ -252,6 +255,16 @@ RUN ` | |||
Remove-Item C:\binutils.zip; ` | |||
Remove-Item C:\gitsetup.zip; ` | |||
` | |||
Write-Host INFO: Downloading containerd; ` | |||
Install-Package -Force 7Zip4PowerShell; ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. we should ask them to upload .zip
for Windows.
(But should no longer be an issue if we start uploading containerd binaries to download.docker.com)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, just noticed this. Windows has included tar
for a long time, can we not just use that directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL! (It's clearly been a while since I worked on Windows). Yes if we don't need to install, that'd be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows Server 2016 does not include tar
so win-RS1 (which is still enabled on master branch builds) would stop working if we add requirement for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated but looks that I still missed couple of ContainerD
texts which still need to be updated and Win 2022 without containerd did hit #42612 but what we want to do with this one?
Options are:
- stay on 7zip
- switch to
tar
and add logic to Dockerfile that containerd will be only installed if OS version is at least RS5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would an equivalent of a command -v tar
work? (check if the command exists, and if not, download it)? A quick Google search brought me to this page; https://www.shellhacks.com/windows-which-equivalent-cmd-powershell/, and a stackoverflow thread; https://superuser.com/questions/34492/powershell-equivalent-to-the-unix-which-command
I'm ok with doing it in a follow-up if it's too complicated though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The win-RS1 parallel in the build is disabled or inactive, last time I looked, and I recall someone trying it briefly in one of the related PRs, and discovering it to be non-working, or at least needs a bunch of tests skipped.
Not that I'm advocating breaking it further, but there was a discussion about dropping support for it in the 22.x release while moving to containerd anyway, since Server LTSC2016 falls out of mainstream support in January 2022.
I would not be shocked if containerd doesn't support Windows Server LTSC 2016 and no one noticed. There's stuff in containerd master (in the snapshotter) that doesn't seem to work on LTSC 2019, but I've never proven this in isolation as the tests that trigger it are part of my WIP, and trigger other issues as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RS1 still run from master branch after PR is merged to master. You can see build results by browsing commit history https://github.com/moby/moby/commits/master and clicking that green dot / red x from there. Also afaik those builds are still used as part of Docker EE packaging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But CI green 🟢 so maybe we go with this one now?
76c65e2
to
90930f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@cpuguy83 PTAL |
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
90930f7
to
1285c6d
Compare
Rebased as was required by #42720 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -616,6 +638,15 @@ Try { | |||
Write-Host -ForegroundColor Green "INFO: Args: $dutArgs" | |||
New-Item -ItemType Directory $env:TEMP\daemon -ErrorAction SilentlyContinue | Out-Null | |||
|
|||
# Start containerd first | |||
if (-not ("$env:DOCKER_WINDOWS_CONTAINERD_RUNTIME" -eq "")) { | |||
Start-Process "$env:TEMP\binary\containerd.exe" ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make sure containerd-shim-runhcs-v1.exe
is in PATH
, or is containerd.exe
smart enough to look next to itself for that? (So we don't accidentally get one from the host.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/containerd/containerd/blob/v1.5.5/runtime/v2/shim/util.go#L66-L100 uses os/exec.LookPath
first, and then if not found, it checks next to the containerd binary. So yes, I guess it would find a system-$PATH
-installed containerd shim before the one next to containerd.exe.
Per https://github.com/containerd/containerd/blob/v1.5.5/docs/managed-opt.md it might make sense to put the shim binary in $env:ProgramData\containerd\root\opt
(or rather, the test containerd's isolated root directory), although I didn't notice code in containerd to ensure that's searched before $PATH
, and I haven't actually experimented with this myself.
Late edit: I checked, "managed opt" works by prepending itself to the PATH
. So ignore that idea here, it doesn't bring anything more to the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. $env:PATH
gets already overwritten on
Line 570 in ba2adee
$env:PATH="$env:TEMP\go\bin;$env:PATH" |
and
Line 843 in ba2adee
$env:PATH="$env:TEMP\binary;$env:PATH;" # Force to use the test binaries, not the host ones. |
so combining those together and move it to be done before containerd.exe is started should be enough.
Will verify that and update PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed on latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tianon is that good now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/containerd/containerd/blob/v1.5.5/runtime/v2/shim/util.go#L66-L100 uses os/exec.LookPath first, and then if not found, it checks next to the containerd binary. So yes, I guess it would find a system-$PATH-installed containerd shim before the one next to containerd.exe.
Hm.. reminds me of a security fix in Go 1.15. Opened containerd/containerd#5906 to fix that 😅
…v1.exe is used Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
😞 looks like it timed out after 2 hours; let me kick CI again to see if that was a one-off |
Should be as it already passed on runs 37 and 38 https://ci-next.docker.com/public/blue/organizations/jenkins/moby/activity?branch=PR-41479 (assuming that breaking changes have not been merged to master after that). |
Hmm. Timed out second time. @StefanScherer was there changes on build servers between those runs? EDIT: Ah, it was not timeout but cancelled for other reasons (some Jenkins logic I guess). So need just one more try I guess. |
CI 💚 |
Let's get this merged 👍 Thanks everyone! |
- What I did
Set Windows Server Preview Build 20295 and later to use ContainerD as default runtime (like agreed on #41455 (comment)) and provided CI for it (by modifying Win 2022 CI added by #39846).Updated to working with #42528
- How I did it
TestExecWithCloseStdin
andTestPsListContainersFilterHealth
which got stuck forever.- Set Windows build greater or equal of 20295 defaulting to ContainerD and enabling CI for it.TestAPIStatsNoStreamGetCpu
,TestAPIStatsNetworkStats
,TestCommitAfterContainerIsDone
andTestRunSetMacAddress
which looks to be broken after updating to latest ContainerD version.- How to verify it
Pass CI on Win 2022 with and without ContainerD
- What is left to later PRs
- A picture of a cute animal (not mandatory but encouraged)
Relates to #41455