enos: use on-demand targets (#21459) (#21463)
Add an updated `target_ec2_instances` module that is capable of
dynamically splitting target instances over subnet/az's that are
compatible with the AMI architecture and the associated instance type
for the architecture. Use the `target_ec2_instances` module where
necessary. Ensure that `raft` storage scenarios don't provision
unnecessary infrastructure with a new `target_ec2_shim` module.
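
A minimal sketch of how such a split could work, under stated assumptions: look up which availability zones offer the instance type for the AMI's architecture, keep only the VPC subnets in those zones, and spread instances across them round-robin. The variable names, the instance type map, and the round-robin rule are illustrative and are not taken from the module itself.

```hcl
variable "vpc_id" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 3
}

# Hypothetical defaults: one instance type per AMI architecture,
# keyed by the values the aws_ami data source reports.
variable "instance_types" {
  type = map(string)
  default = {
    x86_64 = "t3a.medium"
    arm64  = "t4g.medium"
  }
}

data "aws_ami" "target" {
  filter {
    name   = "image-id"
    values = [var.ami_id]
  }
}

locals {
  instance_type = var.instance_types[data.aws_ami.target.architecture]
}

# Availability zones in the current region that offer the instance type.
data "aws_ec2_instance_type_offerings" "zones" {
  location_type = "availability-zone"

  filter {
    name   = "instance-type"
    values = [local.instance_type]
  }
}

# Subnets of the VPC that live in one of those zones.
data "aws_subnets" "compatible" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  filter {
    name   = "availability-zone"
    values = data.aws_ec2_instance_type_offerings.zones.locations
  }
}

resource "aws_instance" "targets" {
  count = var.instance_count

  ami           = var.ami_id
  instance_type = local.instance_type

  # Round-robin over the compatible subnets.
  subnet_id = data.aws_subnets.compatible.ids[count.index % length(data.aws_subnets.compatible.ids)]
}
```

If no subnet is compatible, the modulo fails at plan time, which surfaces the problem before anything is provisioned.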

After a lot of trial, the state of EC2 spot instance capacity, the
associated APIs, and current support for different fleet types in the
AWS Terraform provider have proven to make spot instances too
unreliable for scenario targets.

The current state of each method:
* `target_ec2_fleet`: unusable due to the fact that the `instant` type
  does not guarantee fulfillment of either `spot` or `on-demand`
  instance request types. The module does support both `on-demand` and
  `spot` request types and is capable of bidding across a maximum of
  four availability zones, which makes it an attractive choice if the
  `instant` type would always fulfill requests. Perhaps a `request` type
  with `wait_for_fulfillment` option like `aws_spot_fleet_request` would
  make it more viable for future consideration.
* `target_ec2_spot_fleet`: more reliable if bidding for target instances
  that have capacity in the chosen zone. Issues in the AWS provider
  prevent us from bidding across multiple zones successfully. Over the
  last 2-3 months target capacity for the instance types we'd prefer to
  use has dropped dramatically and the price is near-or-at on-demand.
  The volatility for nearly no cost savings means we should put this
  option on the shelf for now.
* `target_ec2_instances`: the most reliable method we've got. It is now
  capable of automatically determining which subnets and availability
  zones to provision targets in and has been updated to be usable for
  both Vault and Consul targets. By default we use the cheapest medium
  instance types that we've found are reliable to test Vault.

* Update .gitignore
* enos/modules/create_vpc: create a subnet for every availability zone
* enos/modules/target_ec2_fleet: bid across the maximum of four
  availability zones for targets
* enos/modules/target_ec2_spot_fleet: attempt to make the spot fleet bid
  across more availability zones for targets
* enos/modules/target_ec2_instances: create module to use
  ec2:RunInstances for scenario targets
* enos/modules/target_ec2_shim: create shim module to satisfy the
  target module interface
* enos/scenarios: use target_ec2_shim for backend targets on raft
  storage scenarios
* enos/modules/az_finder: remove unused module

Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
1 parent 39eb1d6 commit 84d2bb1
Showing 17 changed files with 511 additions and 89 deletions.
19 changes: 7 additions & 12 deletions .gitignore
@@ -60,18 +60,13 @@ Vagrantfile
!enos/**/*.hcl

# Enos
enos/.enos
enos/enos-local.vars.hcl
enos/support
# Enos local Terraform files
enos/.terraform/*
enos/.terraform.lock.hcl
enos/*.tfstate
enos/*.tfstate.*
enos/**/.terraform/*
enos/**/.terraform.lock.hcl
enos/**/*.tfstate
enos/**/*.tfstate.*
.enos
enos-local.vars.hcl
enos/**/support
enos/**/kubeconfig
.terraform
.terraform.lock.hcl
.tfstate.*

.DS_Store
.idea
39 changes: 26 additions & 13 deletions enos/enos-modules.hcl
@@ -58,27 +58,40 @@ module "shutdown_multiple_nodes" {
source = "./modules/shutdown_multiple_nodes"
}

# create target instances using ec2:CreateFleet
module "target_ec2_fleet" {
source = "./modules/target_ec2_fleet"

capacity_type = "on-demand" // or "spot", use on-demand until we can stabilize spot fleets
common_tags = var.tags
instance_mem_min = 4096
instance_cpu_min = 2
max_price = "0.1432" // On-demand cost for RHEL amd64 on t3.medium in us-east
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
common_tags = var.tags
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
}

# create target instances using ec2:RunInstances
module "target_ec2_instances" {
source = "./modules/target_ec2_instances"

common_tags = var.tags
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
}

# don't create instances but satisfy the module interface
module "target_ec2_shim" {
source = "./modules/target_ec2_shim"

common_tags = var.tags
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
}

# create target instances using ec2:RequestSpotFleet
module "target_ec2_spot_fleet" {
source = "./modules/target_ec2_spot_fleet"

common_tags = var.tags
instance_mem_min = 4096
instance_cpu_min = 2
max_price = "0.1432" // On-demand cost for RHEL amd64 on t3.medium in us-east
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
common_tags = var.tags
project_name = var.project_name
ssh_keypair = var.aws_ssh_keypair_name
}

module "vault_agent" {
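
For illustration, a shim that satisfies a target module interface without provisioning anything might look like the sketch below. The inputs mirror those passed in the module blocks above; the output names `cluster_name` and `hosts` are assumptions about what callers consume and are not confirmed by this diff.

```hcl
# Accept the same inputs as the real target modules so scenarios can
# swap one for the other without changing the variables they pass.
variable "ami_id" {
  type    = string
  default = null
}

variable "common_tags" {
  type    = map(string)
  default = null
}

variable "project_name" {
  type    = string
  default = null
}

variable "ssh_keypair" {
  type    = string
  default = null
}

# No resources are declared, so applying this module creates nothing.

# Hypothetical outputs shaped like those of a real target module.
output "cluster_name" {
  value = "shim"
}

output "hosts" {
  description = "Empty because the shim provisions no instances"
  value       = {}
}
```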
2 changes: 1 addition & 1 deletion enos/enos-scenario-agent.hcl
@@ -90,7 +90,7 @@ scenario "agent" {
}

step "create_vault_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
4 changes: 2 additions & 2 deletions enos/enos-scenario-autopilot.hcl
@@ -101,7 +101,7 @@ scenario "autopilot" {
}

step "create_vault_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
@@ -194,7 +194,7 @@ scenario "autopilot" {
}

step "create_vault_cluster_upgrade_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
14 changes: 7 additions & 7 deletions enos/enos-scenario-replication.hcl
@@ -110,7 +110,7 @@ scenario "replication" {

# Create all of our instances for both primary and secondary clusters
step "create_primary_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [
step.create_vpc,
]
@@ -129,7 +129,7 @@ scenario "replication" {
}

step "create_primary_cluster_backend_targets" {
module = module.target_ec2_spot_fleet
module = matrix.primary_backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
depends_on = [
step.create_vpc,
]
@@ -139,7 +139,7 @@ scenario "replication" {
}

variables {
ami_id = step.ec2_info.ami_ids["amd64"]["ubuntu"]["22.04"]
ami_id = step.ec2_info.ami_ids["arm64"]["ubuntu"]["22.04"]
awskms_unseal_key_arn = step.create_vpc.kms_key_arn
cluster_tag_key = local.backend_tag_key
common_tags = local.tags
@@ -148,7 +148,7 @@ scenario "replication" {
}

step "create_primary_cluster_additional_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [
step.create_vpc,
step.create_primary_cluster_targets,
@@ -169,7 +169,7 @@ scenario "replication" {
}

step "create_secondary_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
@@ -186,15 +186,15 @@ scenario "replication" {
}

step "create_secondary_cluster_backend_targets" {
module = module.target_ec2_spot_fleet
module = matrix.secondary_backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
depends_on = [step.create_vpc]

providers = {
enos = provider.enos.ubuntu
}

variables {
ami_id = step.ec2_info.ami_ids["amd64"]["ubuntu"]["22.04"]
ami_id = step.ec2_info.ami_ids["arm64"]["ubuntu"]["22.04"]
awskms_unseal_key_arn = step.create_vpc.kms_key_arn
cluster_tag_key = local.backend_tag_key
common_tags = local.tags
6 changes: 3 additions & 3 deletions enos/enos-scenario-smoke.hcl
@@ -111,7 +111,7 @@ scenario "smoke" {
}

step "create_vault_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
@@ -128,15 +128,15 @@ scenario "smoke" {
}

step "create_vault_cluster_backend_targets" {
module = module.target_ec2_spot_fleet
module = matrix.backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
depends_on = [step.create_vpc]

providers = {
enos = provider.enos.ubuntu
}

variables {
ami_id = step.ec2_info.ami_ids["amd64"]["ubuntu"]["22.04"]
ami_id = step.ec2_info.ami_ids["arm64"]["ubuntu"]["22.04"]
awskms_unseal_key_arn = step.create_vpc.kms_key_arn
cluster_tag_key = local.backend_tag_key
common_tags = local.tags
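
The conditional `module` expression in the backend step can be read in isolation as the following sketch. The scenario name and the matrix values are illustrative; the module names and the ternary come from the diff.

```hcl
module "target_ec2_instances" {
  source = "./modules/target_ec2_instances"
}

module "target_ec2_shim" {
  source = "./modules/target_ec2_shim"
}

scenario "example" {
  matrix {
    backend = ["consul", "raft"]
  }

  step "create_backend_targets" {
    # Consul needs hosts of its own. With raft storage there is no
    # separate backend cluster, so the shim stands in and nothing is
    # provisioned for this step.
    module = matrix.backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
  }
}
```

Because both modules expose the same interface, later steps can reference this step's outputs without knowing which variant ran.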
6 changes: 3 additions & 3 deletions enos/enos-scenario-ui.hcl
@@ -78,7 +78,7 @@ scenario "ui" {
}

step "create_vault_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
@@ -95,15 +95,15 @@ scenario "ui" {
}

step "create_vault_cluster_backend_targets" {
module = module.target_ec2_spot_fleet
module = matrix.backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
depends_on = [step.create_vpc]

providers = {
enos = provider.enos.ubuntu
}

variables {
ami_id = step.ec2_info.ami_ids["amd64"]["ubuntu"]["22.04"]
ami_id = step.ec2_info.ami_ids["arm64"]["ubuntu"]["22.04"]
awskms_unseal_key_arn = step.create_vpc.kms_key_arn
cluster_tag_key = local.backend_tag_key
common_tags = local.tags
6 changes: 3 additions & 3 deletions enos/enos-scenario-upgrade.hcl
@@ -106,7 +106,7 @@ scenario "upgrade" {
}

step "create_vault_cluster_targets" {
module = module.target_ec2_spot_fleet
module = module.target_ec2_instances
depends_on = [step.create_vpc]

providers = {
@@ -123,15 +123,15 @@ scenario "upgrade" {
}

step "create_vault_cluster_backend_targets" {
module = module.target_ec2_spot_fleet
module = matrix.backend == "consul" ? module.target_ec2_instances : module.target_ec2_shim
depends_on = [step.create_vpc]

providers = {
enos = provider.enos.ubuntu
}

variables {
ami_id = step.ec2_info.ami_ids["amd64"]["ubuntu"]["22.04"]
ami_id = step.ec2_info.ami_ids["arm64"]["ubuntu"]["22.04"]
awskms_unseal_key_arn = step.create_vpc.kms_key_arn
cluster_tag_key = local.backend_tag_key
common_tags = local.tags
15 changes: 12 additions & 3 deletions enos/modules/create_vpc/main.tf
@@ -1,4 +1,11 @@
data "aws_region" "current" {}
data "aws_availability_zones" "available" {
state = "available"

filter {
name = "zone-name"
values = ["*"]
}
}

resource "random_string" "cluster_id" {
length = 8
@@ -34,14 +41,16 @@ resource "aws_vpc" "vpc" {
}

resource "aws_subnet" "subnet" {
count = length(data.aws_availability_zones.available.names)
vpc_id = aws_vpc.vpc.id
cidr_block = var.cidr
cidr_block = cidrsubnet(var.cidr, 8, count.index)
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true

tags = merge(
var.common_tags,
{
"Name" = "${var.name}-subnet"
"Name" = "${var.name}-subnet-${data.aws_availability_zones.available.names[count.index]}"
},
)
}
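
As a worked example of the `cidrsubnet(var.cidr, 8, count.index)` call, the sketch below assumes a VPC CIDR of `10.13.0.0/16` (a made-up value) and three availability zones. Extending the prefix by 8 bits carves the VPC range into non-overlapping /24 networks, one per zone, whereas the previous single subnet claimed the entire VPC range.

```hcl
locals {
  vpc_cidr = "10.13.0.0/16" # hypothetical value for var.cidr

  # Adding 8 bits to a /16 yields /24 networks; the third argument
  # selects which of the 256 possible networks to return.
  subnet_cidrs = [for i in range(3) : cidrsubnet(local.vpc_cidr, 8, i)]
  # => ["10.13.0.0/24", "10.13.1.0/24", "10.13.2.0/24"]
}

output "subnet_cidrs" {
  value = local.subnet_cidrs
}
```

The VPC prefix has to leave room for the eight extra bits, so a very small `var.cidr` would fail this call.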
5 changes: 0 additions & 5 deletions enos/modules/create_vpc/outputs.tf
@@ -1,8 +1,3 @@
output "aws_region" {
description = "AWS Region for resources"
value = data.aws_region.current.name
}

output "vpc_id" {
description = "Created VPC ID"
value = aws_vpc.vpc.id
49 changes: 31 additions & 18 deletions enos/modules/target_ec2_fleet/main.tf
@@ -76,15 +76,23 @@ resource "random_string" "unique_id" {
special = false
}

// ec2:CreateFleet only allows up to 4 InstanceRequirements overrides so we can only ever request
// a fleet across 4 or fewer subnets if we want to bid with InstanceRequirements instead of
// weighted instance types.
resource "random_shuffle" "subnets" {
input = data.aws_subnets.vpc.ids
result_count = 4
}

locals {
spot_allocation_strategy = "price-capacity-optimized"
spot_allocation_strategy = "lowestPrice"
on_demand_allocation_strategy = "lowestPrice"
instances = toset([for idx in range(var.instance_count) : tostring(idx)])
cluster_name = coalesce(var.cluster_name, random_string.random_cluster_name.result)
name_prefix = "${var.project_name}-${local.cluster_name}-${random_string.unique_id.result}"
fleet_tag = "${local.name_prefix}-spot-fleet-target"
fleet_tags = {
Name = "${local.name_prefix}-target"
Name = "${local.name_prefix}-${var.cluster_tag_key}-target"
"${var.cluster_tag_key}" = local.cluster_name
Fleet = local.fleet_tag
}
@@ -218,6 +226,20 @@ resource "aws_launch_template" "target" {
name = aws_iam_instance_profile.target.name
}

instance_requirements {
burstable_performance = "included"

memory_mib {
min = var.instance_mem_min
max = var.instance_mem_max
}

vcpu_count {
min = var.instance_cpu_min
max = var.instance_cpu_max
}
}

network_interfaces {
associate_public_ip_address = true
delete_on_termination = true
@@ -251,7 +273,9 @@ resource "aws_launch_template" "target" {
# Unless we see capacity issues or instances being shut down then we ought to
# stick with that strategy.
resource "aws_ec2_fleet" "targets" {
terminate_instances = true // termiante instances when we "delete" the fleet
replace_unhealthy_instances = false
terminate_instances = true // terminate instances when we "delete" the fleet
terminate_instances_with_expiration = false
tags = merge(
var.common_tags,
local.fleet_tags,
@@ -264,22 +288,11 @@ resource "aws_ec2_fleet" "targets" {
version = aws_launch_template.target.latest_version
}

override {
max_price = var.max_price
subnet_id = data.aws_subnets.vpc.ids[0]

instance_requirements {
burstable_performance = "included"

memory_mib {
min = var.instance_mem_min
max = var.instance_mem_max
}
dynamic "override" {
for_each = random_shuffle.subnets.result

vcpu_count {
min = var.instance_cpu_min
max = var.instance_cpu_max
}
content {
subnet_id = override.value
}
}
}
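
The shuffle-then-override pattern from this hunk can be sketched on its own as follows. The variables, the target capacity, and the `min(...)` guard are assumptions added for illustration; the diff itself uses a fixed `result_count = 4`.

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

variable "launch_template_version" {
  type = string
}

resource "random_shuffle" "subnets" {
  input = var.subnet_ids

  # Cap at four overrides, but never request more results than there
  # are subnets: random_shuffle repeats items when result_count
  # exceeds the length of the input.
  result_count = min(4, length(var.subnet_ids))
}

resource "aws_ec2_fleet" "targets" {
  type                = "instant"
  terminate_instances = true

  target_capacity_specification {
    default_target_capacity_type = "on-demand"
    total_target_capacity        = 3
  }

  launch_template_config {
    launch_template_specification {
      launch_template_id = var.launch_template_id
      version            = var.launch_template_version
    }

    # One override per shuffled subnet, so the fleet can place
    # instances in any of up to four availability zones.
    dynamic "override" {
      for_each = random_shuffle.subnets.result

      content {
        subnet_id = override.value
      }
    }
  }
}
```

Moving `instance_requirements` into the launch template, as the hunk above does, leaves each override carrying only a `subnet_id`.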
2 changes: 1 addition & 1 deletion enos/modules/target_ec2_fleet/variables.tf
@@ -25,7 +25,7 @@ variable "common_tags" {
description = "Common tags for cloud resources"
type = map(string)
default = {
Project = "Vault"
Project = "vault-ci"
}
}
