AgentBaker E2E Testing

This directory contains all files pertaining to our own implementation of an E2E testing framework for AgentBaker.

E2E testing for Linux is currently implemented using a Golang framework built from the ground up. Note that we soon plan to move Windows over to this testing framework as well.

The goal of E2E testing with AgentBaker is to ensure that the node bootstrapping artifacts generated and returned by the primary AgentBaker API not only contain expected content, but also contain correct content that can be used as-is to bootstrap real Azure VMs so they can join real AKS clusters.

From a high level, each E2E scenario calls the primary node-bootstrapping API GetLatestNodeBootstrapping with a set of parameters (represented by a NodeBootstrappingConfiguration) which define the given scenario, in order to generate CSE and custom data. A new VMSS containing a single VM is then created and associated with an AKS cluster that is already running in Azure. The CSE and custom data generated by AgentBaker are applied to the new VM so it can bootstrap and register itself with the apiserver of the running cluster. Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully be run on it. Lastly, a set of validation commands is remotely executed on the VM to ensure its live state (file existence, sysctl settings, etc.) is as expected.

Running Locally

Note: if you have changed code or artifacts used to generate custom data or custom script extension payloads, you should first run make generate from the root of the AgentBaker repository.

To run the Go implementation of the E2E test suite locally, simply use e2e-local.sh. This script will set up the go test command for you while also providing defaulting logic for a set of required environment variables used to interact with Azure. These environment variables include:

  • SUBSCRIPTION_ID - default: 8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8 (ACS Test Subscription)
  • LOCATION - default: eastus
  • AZURE_TENANT_ID - default: 72f988bf-86f1-41af-91ab-2d7cd011db47

SCENARIOS_TO_RUN may also optionally be set to specify a subset of the E2E scenarios to run during the testing session as a comma-separated list, for example:

SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh

Furthermore, SCENARIOS_TO_EXCLUDE may also optionally be set to specify the set of scenarios which will be excluded from the testing session as a comma-separated list. If both SCENARIOS_TO_RUN and SCENARIOS_TO_EXCLUDE are specified, SCENARIOS_TO_RUN will take precedence.
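The precedence between the two variables can be sketched as follows. This is an illustrative filter, not the suite's actual implementation; the function and variable names here are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// filterScenarios sketches how SCENARIOS_TO_RUN and SCENARIOS_TO_EXCLUDE
// might be applied: when an include list is present it wins outright and
// the exclude list is ignored.
func filterScenarios(all []string, toRun, toExclude string) []string {
	if toRun != "" {
		include := map[string]bool{}
		for _, s := range strings.Split(toRun, ",") {
			include[strings.TrimSpace(s)] = true
		}
		var out []string
		for _, s := range all {
			if include[s] {
				out = append(out, s)
			}
		}
		return out
	}
	exclude := map[string]bool{}
	if toExclude != "" {
		for _, s := range strings.Split(toExclude, ",") {
			exclude[strings.TrimSpace(s)] = true
		}
	}
	var out []string
	for _, s := range all {
		if !exclude[s] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []string{"ubuntu2204", "ubuntu2204-arm64", "ubuntu2204-gpu-ncv3"}
	// Include list wins even though the same scenario is also excluded.
	fmt.Println(filterScenarios(all, "ubuntu2204", "ubuntu2204"))
}
```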

KEEP_VMSS can also optionally be specified to have the test suite retain the bootstrapped VM(s) for further debugging. When this option is specified, the private SSH key used to connect to each VM will be included in each scenario's log bundle.

Note that when using e2e-local.sh, a timeout value of 90 minutes is applied to the go test command.

You may also run the test command with custom arguments yourself (assuming you've properly set up the required environment variables) from within the e2e/ directory like so:

go test -timeout 90m -v -run Test_All ./

Running Locally in VS Code Debug Mode

You can also run the tests locally in VS Code debug mode by adding the required test environment variables to settings.json.

Steps:

  1. Go to Settings (click the gear button at the bottom-left corner, or press Ctrl + , on Windows)
  2. Type testenv in the search bar; one of the results should be Go: Test Env Vars.
  3. Click the Edit in settings.json link; the global settings.json will open.
  4. Add the following default settings at the top level of the file.
"go.testEnvVars": {
    "SUBSCRIPTION_ID": "8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8",
    "LOCATION": "eastus",
    "AZURE_TENANT_ID": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "SCENARIOS_TO_RUN": "ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3",
},

Note: SCENARIOS_TO_RUN specifies what scenarios you want to test. You can change it as desired.

  5. Now you can go to VS Code > Test tab > github.com/Azure/agentbakere2e > suite_test.go > Test_All. On the right you will see a Debug test button. You can also set breakpoints.


Note: a project-local /.vscode/settings.json may also work instead of the global one, though this has not been verified.

Package Structure

The top-level package of the Golang E2E implementation is named e2e_test and is entirely separate from all AgentBaker packages.

The e2e_test package depends on a subpackage located in the scenario directory. Package scenario is where all E2E scenarios are defined, each in its own separate file. This package also defines common types related to scenarios and scenario configuration, as well as the hard-coded list of SIG version IDs located in images.go used for testing different OS distros. Package scenario also contains the implementation of common cluster selectors and mutators within clusterconfiguration.go, though each scenario can define its own implementations if needed.

The primary testing function is located in suite_test.go, which is run by go test ....

E2E VHDs

When configuring E2E scenarios, a VHDSelector must be specified in order to tell the suite which particular VHD it should use to bootstrap the VM.

VHDSelectors select from a "base" VHD catalog, initialized from scenario/base_vhd_catalog.json, which is embedded at build time. Each entry in the catalog is represented as a VHD, which contains a resource ID that gets injected into the VMSS model when the given scenario is run. The aforementioned JSON file contains configurations for the current set of default catalog entries. At any given time, those default entries will point to VHDs stored within our testing subscription, guarded by resource deletion locks.

For example, scenario_ubuntu2204.go defines the Ubuntu 2204 scenario, which specifies the Ubuntu2204Gen2Containerd VHD selector. This selector will always select the Ubuntu2204/gen2 VHD catalog entry from the base catalog. If running the suite using some arbitrary VHD build for testing, then the selector will take the corresponding Ubuntu2204/gen2 VHD from the given build instead of the default entry.

Updating Default Catalog Entries

To update the set of default VHD catalog entries to point towards new VHDs, simply update the resourceId field of the respective VHD within scenario/base_vhd_catalog.json. If you're making this change as a part of a PR, you need to make sure to lock the new VHDs with resource deletion locks to ensure they're always available going forward. Note that if you run the suite in a region other than eastus, you'll need to make sure the VHDs you point the suite towards are appropriately replicated in the given region as well.

Using Arbitrary VHD Builds

If you'd like to run the E2E suite using a set of VHDs built from some arbitrary run of the VHD build pipeline in the MSFT tenant, you can do so by specifying the ID of the build. This is an alternative to manually updating the set of default VHD catalog entries. If a scenario is run which selects a VHD that was not built as part of the specified VHD build, the selector will select the corresponding default catalog entry instead.
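The fallback behavior described above amounts to preferring an entry from the specified build and otherwise using the default. A minimal sketch, with illustrative names (the real selection logic lives in the scenario package and differs in detail):

```go
package main

import "fmt"

// VHD is a simplified stand-in for a catalog entry; the real type is
// defined in scenario/vhd.go.
type VHD struct {
	ResourceID string
}

// selectVHD prefers an entry produced by the specified VHD build and
// falls back to the default base-catalog entry otherwise.
func selectVHD(fromBuild, fromBase map[string]VHD, name string) VHD {
	if v, ok := fromBuild[name]; ok {
		return v
	}
	return fromBase[name]
}

func main() {
	base := map[string]VHD{"Ubuntu2204/gen2": {ResourceID: "/default/resource/id"}}
	build := map[string]VHD{} // the specified build did not produce this SKU
	// Falls back to the default catalog entry:
	fmt.Println(selectVHD(build, base, "Ubuntu2204/gen2").ResourceID)
}
```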

NOTE: This feature can only be used with test VHD builds; builds from the official build pipeline are not supported.

VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh

NOTE: To utilize this feature, you'll also need to provide the suite with an ADO PAT (personal access token) with which it can access the ADO resources to download the appropriate build artifacts.

To specify your PAT, simply set the ADO_PAT environment variable accordingly:

ADO_PAT=<secret> VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh

or:

export ADO_PAT=<secret>
VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh
VHD_BUILD_ID=234567891 SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh
...
VHD_BUILD_ID=345678912 SCENARIOS_TO_RUN=ubuntu2204,ubuntu2204-arm64,ubuntu2204-gpu-ncv3 ./e2e-local.sh

Registering New VHD SKUs for E2E Testing

When adding a new scenario which uses a VHD that doesn't currently have an associated entry in the base catalog, please make sure to follow these steps to register it with the suite:

  1. Build and delete-lock the underlying image version to be referenced in the base catalog
  2. Update base_vhd_catalog.json with a new entry, referencing the resource ID of the new VHD built in the previous step, as well as the VHD's artifact name. The artifact name is used when downloading publishing info artifacts from VHD builds in ADO. To determine this value:
    1. Navigate to the latest run of the [TEST All VHDs] AKS Linux VHD Build - Msft Tenant build which has built the SKU you'd like to register (or queue a new build which includes the particular SKU).
    2. Navigate to the particular run's published artifacts and identify the publishing-info-<artifactName> artifact for your SKU. The suffix of this string after publishing-info- is the name of the artifact.
    3. Alternatively, you can get this value from navigating to .vsts-vhd-builder-release.yaml, identifying the corresponding build stage for your SKU, and looking at the value of artifactName specified when calling the .builder-release-template.yaml template.
  3. Within scenario/vhd.go, update the corresponding subcatalog struct (e.g. Ubuntu2204, AzureLinuxV2) with the new entry, and correctly add its corresponding JSON tag used to unmarshal from base_vhd_catalog.json
  4. Also within scenario/vhd.go, add a corresponding case block to the switch statement within addEntryFromPublishingInfo() to make sure the VHD's name (parsed from the publishing info file) is associated with the new subcatalog entry added in the previous step - this is to ensure that catalog entries are properly overwritten when using VHDs from arbitrary testing builds
  5. Add a new VHDSelector within scenario/vhd.go in the form of a method on the *VHDCatalog type, which returns the new entry of the given subcatalog added in step 3
  6. Reference the new VHDSelector added in the previous step when defining the new E2E scenario(s).

Example PR: TODO(cameissner)

Scenarios

Minimally, each E2E scenario is parameterized with a set of "mutators" that change/set various properties of a base NodeBootstrappingConfiguration struct. This struct is then fed into GetLatestNodeBootstrapping to generate CSE and custom data. The most commonly mutated property of this struct across all scenarios is the OS distro. This is primarily because each scenario currently uses a separate VHD corresponding to the respective distro.
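The mutator pattern described above can be sketched as below. The trimmed-down NodeBootstrappingConfiguration stand-in and the withDistro helper are illustrative; the real struct comes from pkg/agent/datamodel:

```go
package main

import "fmt"

// NodeBootstrappingConfiguration stand-in carrying only the field mutated here.
type NodeBootstrappingConfiguration struct {
	Distro string
}

// A mutator is simply a function that changes/sets fields on a base
// configuration before it is fed to GetLatestNodeBootstrapping.
type Mutator func(*NodeBootstrappingConfiguration)

// withDistro is a hypothetical mutator setting the most commonly
// mutated property: the OS distro.
func withDistro(distro string) Mutator {
	return func(nbc *NodeBootstrappingConfiguration) { nbc.Distro = distro }
}

func main() {
	base := &NodeBootstrappingConfiguration{Distro: "default-distro"}
	for _, m := range []Mutator{withDistro("aks-ubuntu-containerd-22.04-gen2")} {
		m(base)
	}
	fmt.Println(base.Distro)
}
```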

E2E scenarios can also be configured with VMSS configuration mutators that change/set properties on the VMSS model used to deploy the new VM to be bootstrapped. This is primarily useful when testing different VM SKUs, especially for GPU-enabled scenarios, which affect which code paths AgentBaker uses to generate CSE and custom data.

Further, in order to support E2E scenarios which test different underlying AKS cluster configurations, such as the cluster's network plugin, each E2E scenario has its own "cluster selector" and "cluster mutator". Cluster selectors determine whether or not the given live AKS cluster is viable for running the given scenario, while cluster mutators will mutate a base AKS cluster model such that the model represents a cluster which is viable for running the given scenario. For example, a scenario meant to run on an AKS cluster configured with the kubenet network plugin would have a cluster selector which selects on the NetworkProfile.NetworkPlugin property specifically for kubenet, while its cluster mutator would set this property to kubenet so a new cluster can be created for it to run on.
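The kubenet example above can be sketched as a selector/mutator pair. The cluster model here is a minimal stand-in; the real suite works against the Azure SDK's managed cluster types:

```go
package main

import "fmt"

// Minimal stand-ins for the AKS cluster model.
type NetworkProfile struct{ NetworkPlugin string }
type Cluster struct{ NetworkProfile NetworkProfile }

// Selector: determines whether a given live cluster is viable for the scenario.
func kubenetSelector(c *Cluster) bool {
	return c.NetworkProfile.NetworkPlugin == "kubenet"
}

// Mutator: shapes a base cluster model so that a viable cluster can be created.
func kubenetMutator(c *Cluster) {
	c.NetworkProfile.NetworkPlugin = "kubenet"
}

func main() {
	base := &Cluster{NetworkProfile: NetworkProfile{NetworkPlugin: "azure"}}
	// No viable cluster found: mutate the base model and create a new one.
	if !kubenetSelector(base) {
		kubenetMutator(base)
	}
	fmt.Println(base.NetworkProfile.NetworkPlugin)
}
```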

Lastly, E2E scenarios also consist of a list of live VM validators. Each live VM validator consists of a description, a bash command which will actually be run on the newly bootstrapped VM, and an "asserter" function that will perform assertions on the contents of both the stdout and stderr streams that result from the execution of the command. The validators can be used to assert on numerous types of properties of the live VM, such as the live file system and kernel state.
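A live VM validator of the shape described above might look like the following sketch. The struct and helper here are illustrative; the real type is defined in the scenario package:

```go
package main

import (
	"fmt"
	"strings"
)

// LiveVMValidator stand-in: a description, a bash command run on the
// bootstrapped VM, and an asserter over the resulting output streams.
type LiveVMValidator struct {
	Description string
	Command     string
	Asserter    func(stdout, stderr string) error
}

// sysctlValidator is a hypothetical helper asserting on live kernel state.
func sysctlValidator(key, want string) LiveVMValidator {
	return LiveVMValidator{
		Description: fmt.Sprintf("assert sysctl %s=%s", key, want),
		Command:     fmt.Sprintf("sysctl -n %s", key),
		Asserter: func(stdout, _ string) error {
			if strings.TrimSpace(stdout) != want {
				return fmt.Errorf("expected %s=%s, got %q", key, want, stdout)
			}
			return nil
		},
	}
}

func main() {
	v := sysctlValidator("net.ipv4.tcp_retries2", "8")
	// Simulate the remote command's stdout:
	fmt.Println(v.Asserter("8\n", ""))
}
```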

You can find all implemented scenarios in the scenario package within files prefixed with scenario_. The Scenario struct definition can be found in scenario/types.go.

Implementation

To implement a new scenario, you need to do the following:

  1. Create a new file in the scenario package directory named scenario_<scenario-name>.go
  2. Within this new file, implement a private function with a representative name which returns a *Scenario representing the scenario's configuration
  3. Add a call to the newly implemented function within the return value of the scenarios() function defined in scenario/init.go
  4. Implement any additional logic in the testing framework required by the new scenario
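The layout described above might look like the following sketch. The trimmed-down Scenario struct and the scenario name are hypothetical; the real definition lives in scenario/types.go:

```go
package main

import "fmt"

// Hypothetical, trimmed-down Scenario; the real definition has many more fields.
type Scenario struct {
	Name        string
	Description string
}

// Step 2: a private constructor, placed in scenario_<scenario-name>.go,
// returning the scenario's configuration.
func myNewScenario() *Scenario {
	return &Scenario{
		Name:        "my-new-scenario",
		Description: "exercises the new configuration",
	}
}

// Step 3: the scenarios() function returns the full list; the new
// constructor's result is added to its return value.
func scenarios() []*Scenario {
	return []*Scenario{
		myNewScenario(),
	}
}

func main() {
	for _, s := range scenarios() {
		fmt.Println(s.Name)
	}
}
```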

Log Collection

Each E2E scenario will generate its own logs after execution. Currently, these logs consist of:

  • cluster-provision.log - CSE execution log, retrieved from /var/log/azure/aks/cluster-provision.log (collected in success and CSE failure cases)
  • kubelet.log - the kubelet systemd unit's logs, retrieved by running journalctl -u kubelet on the VM after bootstrapping has finished (collected in success and CSE failure cases)
  • vmssId.txt - a single-line text file containing the unique resource ID of the VMSS created by the respective scenario, mainly collected for post hoc resource deletion (collected in all cases where the VMSS is able to be created)

These logs will be uploaded in a bundle of the format:

└── scenario-logs
    └── <scenario>
        ├── cluster-provision.log
        ├── kubelet.log
        └── vmssId.txt

Coverage report

After a PR is created in AgentBaker's repo on GitHub, a pipeline calculating code coverage changes will automatically run.

We use Coveralls to display the coverage report. The coverage report will be available in the PR's description. You can also view previous runs for the AgentBaker repo here.

We calculate code coverage for both unit tests and E2E tests.

E2E coverage report

To generate E2E coverage reports, we use code coverage changes introduced in Go 1.20.

The coverage report is generated by running AgentBaker's API server locally as a binary built with the -cover flag. E2E tests are then run against that binary.

The following packages are used during calculation of coverage for E2E tests:

- github.com/Azure/agentbaker/apiserver  
- github.com/Azure/agentbaker/cmd 
- github.com/Azure/agentbaker/cmd/starter
- github.com/Azure/agentbaker/pkg/agent
- github.com/Azure/agentbaker/pkg/agent/datamodel
- github.com/Azure/agentbaker/pkg/templates

Generating E2E coverage report locally

You can generate an E2E coverage report while running the E2E tests locally. To do so, follow the steps below:

  1. Build the AgentBaker server binary with the -cover flag:
  cd cmd
  go build -cover -o baker -covermode count
  2. Create a directory for the coverage report files:
  mkdir -p covdatafiles
  3. Run the binary:
  GOCOVERDIR=covdatafiles ./baker start &
  4. Run the E2E tests locally:
  /bin/bash e2e/e2e-local.sh
  5. Stop the binary - once the tests finish executing, you have to stop the binary with exit code 0 to generate the report. See the docs here.
  kill $(pgrep baker)
  6. Display the coverage report within the terminal:
  go tool covdata percent -i=./cmd/covdatafiles