Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux GCS tests and benchmarks #1352

Merged
merged 2 commits into from Sep 16, 2022
Merged

Linux GCS tests and benchmarks #1352

merged 2 commits into from Sep 16, 2022

Conversation

helsaawy
Copy link
Contributor

@helsaawy helsaawy commented Apr 13, 2022

Added testing suite that can built and run directly on the Linux uVM by sharing or adding it to the rootfs.
It primarily focuses on container (standalone and CRI) management.

This PR adds the ability to build a standalone testing executable that is shared into an LCOW uvm (or added to its rootfs beforehand).
Once on the uVM, the executable will rely on code from internal\guest\test to test and benchmark container creation, and directly interface with runc.
The executable does not replace the Linux GCS (since the GCS is needed to setup the uVM and to interact with the host), but does not interact with it at all.
Additionally, a container image VHD file is needed to mount into the uVM and use as the container base image. By default, the same rootfs.vhd that is used to boot the uVM is used, but any LCOW snapshot VHD can be used.

Functional tests that interact with the Linux GCS are in this PR: #1351, along with supporting code, ie test\internal\*

Script scripts/Test-LCOW-UVM.ps1 is used to launch an LCOW uVM and run the tests locally.
Module scripts/Testing.psm1 wraps the benchmarking portion, and is shared with a helper script in the functional benchmarks in #1351.

Relies on PRs:

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Apr 19, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

The tests themselves are broken out here:
microsoft#1352
microsoft#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Apr 19, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

The tests themselves are broken out here:
microsoft#1352
microsoft#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Apr 19, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

Added doc.go files to guest packages to prevent linter/compiler errors
under windows.

The tests themselves are broken out here:
microsoft#1352
microsoft#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Apr 25, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

Added doc.go files to guest packages to prevent linter/compiler errors
under windows.

The tests themselves are broken out here:
microsoft#1352
microsoft#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request May 3, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

Added doc.go files to guest packages to prevent linter/compiler errors
under windows.

The tests themselves are broken out here:
microsoft#1352
microsoft#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit that referenced this pull request May 3, 2022
Exposed data and functionality for testing GCS:
* `internal\guest\runtime\hcsv2.Container.InitProcess()`
* `internal\guest\runtime\hcsv2.GetOrAddNetworkNamespace()`
* `internal\guest\runtime\hcsv2.RemoveNetworkNamespace()`
* `internal\guest\runtime\hcsv2.Host.SecurityPolicyEnforcer()`
* `internal\guest\runtime\hcsv2.Host.Transport()`

Fixed bug where `host.RemoveContainer` did not remove the network
namespace for standalone and pod containers.

Updated go-winio version to include bugfixes for closing hvsockets,
specifically to close a socket for writing (needed by internal\cmd
to signal that the stdin stream has finished).

Added doc.go files to guest packages to prevent linter/compiler errors
under windows.

The tests themselves are broken out here:
#1352
#1351

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
@helsaawy helsaawy force-pushed the he/gcstest branch 4 times, most recently from 9863dc7 to 39313fa Compare May 4, 2022 20:29
@helsaawy helsaawy marked this pull request as ready for review May 4, 2022 20:37
@helsaawy helsaawy requested a review from a team as a code owner May 4, 2022 20:37
Copy link
Contributor

@anmaxvl anmaxvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! A few comments regarding some potential cleanup.

Cleanup functions seemed to be called in wrong order in a couple of places. we could potentially defer call them, so it's easier to track? but that would involve refactoring the for loop bodies into functions.

scripts/Test-LCOW-UVM.ps1 Show resolved Hide resolved
test/gcs/main_test.go Outdated Show resolved Hide resolved
test/gcs/helper_container_test.go Show resolved Hide resolved
test/gcs/helper_conn_test.go Outdated Show resolved Hide resolved
test/gcs/helper_conn_test.go Outdated Show resolved Hide resolved
test/gcs/cri_bench_test.go Show resolved Hide resolved
test/gcs/container_test.go Show resolved Hide resolved
@helsaawy
Copy link
Contributor Author

Thanks for this! A few comments regarding some potential cleanup.

Cleanup functions seemed to be called in wrong order in a couple of places. we could potentially defer call them, so it's easier to track? but that would involve refactoring the for loop bodies into functions.

I like the function idea a lot. Ill try and knock that out in another PR after this gets merged, if thats allright?

Copy link
Contributor

@anmaxvl anmaxvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

$RootfsMount = '/run/rootfs',

[string]
$RootfsPath = (Join-Path $BootFilesPath 'rootfs.vhd'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also plan to later add support for using an initrd.img instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about that; yes, we should add it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a flag to switch the uVM boot from vhd to initrd, but also, this flag is actually for mounting the rootfs vhd into the uVM to use as a container image for runc. Ideally it would be something like "C:\ContainerPlatData\root\io.containerd.snapshotter.v1.windows-lcow\snapshots\2\layer.vhd", but I didnt want to add a dependency on using crictl or ctr to get layer paths...
I updated the variable names to clarify

@katiewasnothere
Copy link
Contributor

If I understand the PR correctly, there are two use cases here: normal testing of the guest code and benchmarking new kernel images.

My biggest concern here is that since we don't have an explicit contract for things like what fields need to be filled out or what needs to be setup and when to run a container, these tests could break easily or become outdated pretty quickly. Even though we would likely want to run these nightly, given that we can't run these as part of CI I feel like that concern is a little higher.

So I guess the question is, how do we avoid that? Do we have a contract somewhere that defines what's needed?

@helsaawy
Copy link
Contributor Author

helsaawy commented Jun 8, 2022

If I understand the PR correctly, there are two use cases here: normal testing of the guest code and benchmarking new kernel images.

I imagine the benchmarking won't just be for new kernel images, but for code changes in general.
But yes.

So I guess the question is, how do we avoid that? Do we have a contract somewhere that defines what's needed?

Ideally the contract would be the tests themselves. Since the benchmarks rely on the same code as the tests, if the tests fail, the benchmarks will also fail, and updating the tests should (hopefully) keep the benchmarks up to date.
However, I anticipate adding both to the build pipeline, so that require us to keep them updated.

If we do add interfaces on the GCS side that provide a more rigid contract that testing can consume, that would also help allay the issue, I think.

@katiewasnothere
Copy link
Contributor

Ideally the contract would be the tests themselves. Since the benchmarks rely on the same code as the tests, if the tests fail, the benchmarks will also fail, and updating the tests should (hopefully) keep the benchmarks up to date.
However, I anticipate adding both to the build pipeline, so that require us to keep them updated.

If we do add interfaces on the GCS side that provide a more rigid contract that testing can consume, that would also help allay the issue, I think.

Yeah I think that would be good. That and making sure there's code sharing between the tests and the "normal" code flow as much as possible so we can guarantee that if the contract is broken, the tests do actually fail like you mentioned.

I added a few additional comments, but overall I think this LGTM though I'd like to explore what I mentioned above.

@helsaawy
Copy link
Contributor Author

The GCS test code does eventually call normal-flow code, but I agree that we should integrate the two closer.
Ideally we'd do that as we refactor the GCS code, since its kinda spaghetti-ish rn, and having a test framework up and running should make rewriting that code easier.

-TestCmd $boot `
-TestCmdPreamble $testcmd `
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't $boot include $cmd which has $testcmd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but -TestCmdPreamble just writes out the string argument before execing the tests.
its a bit convoluted, but this way, before the test output runs, we get headers with the test info, including the command that was run, so we can just save the output on its own, if need be, and still have all the data we need:

date: 20220412T1701329409
note: 46-117
test.command: .\bin\test\functional.exe -feature="LCOW"  "-test.shuffle=on" "-test.run=^#" "-test.bench=."  "-test.benchmem" "-test.benchtime=5s" "-test.count=5" 
repo.commit: 5631c613afeabbac116133f06a4b2ca17fe83d8e
-test.shuffle 1649797293380880100
goos: windows
goarch: amd64
pkg: github.com/Microsoft/hcsshim/test/functional
cpu: Intel(R) Xeon(R) W-2235 CPU @ 3.80GHz
BenchmarkLCOWStart-12                            	       7	 806345986 ns/op	   16400 B/op	     157 allocs/op
# ...

If we printed $boot, it would have all the boot and mount and a bunch of other redundant info

Added testing suite that can built and run directly on the Linux uVM
by sharing or adding it to the rootfs.
It primarily focuses on container (standalone and CRI) management.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Fixed bug with calling `*hcsv2.Host.GetContainer` instead of
`*hcsv2.Host.GetCreatedContainer`.

Removed left over comments, added clarifying comments to
`assertNumberContainers` and `listContaienrStates` interactions.

Reordered namespace and rootfs cleanup.

Removed underscore from consts
Removed unneeded constants
Added flag to test-lcow-uvm script to change boot type from vhd to
initrd.

Update security policy code to use enforcer.
Updated script for changes to `uvmboot` and to use default executable
name (`gcs.test`), as produced by `go test -c`.

Linting issues:
 - `switch` to `if`
 - unused `getContainer()`
 - unused receivers

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Copy link
Contributor

@katiewasnothere katiewasnothere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@helsaawy helsaawy merged commit ed32773 into microsoft:main Sep 16, 2022
@helsaawy helsaawy deleted the he/gcstest branch September 16, 2022 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants