Broken rotation of cert/key generated/used by Vault for Consul secrets engine backend #26670

Open
cheeseburgermotivated opened this issue Apr 26, 2024 · 2 comments
Labels: bug (used to indicate a potential bug), ecosystem, secret/consul

Comments

@cheeseburgermotivated

Describe the bug
Rotating the client certificate/key used by a Consul secrets backend in Vault does not seem possible when Vault itself was used to bootstrap the Consul ACL system. The /{mount}/config/access endpoint, which is used to set the certificate/key, overwrites all fields. When we send an updated cert without the bootstrap token, Vault assumes that we want to bootstrap the ACL system of the associated Consul cluster, and that fails because Consul no longer allows a bootstrap once one has been done. We don't have the token because Vault keeps it to itself when it issues the bootstrap command to Consul, and we shouldn't need to know it, since Vault is in charge of the process and should have it stored somewhere.

This failure looks like:

vault_pki_secret_backend_cert.consul-dev-client-cert: Destroying... [id=internalca/vault-consul-dev/consul-dev-client.vault.dev.supercompany.com]
vault_pki_secret_backend_cert.consul-dev-client-cert: Destruction complete after 0s
vault_pki_secret_backend_cert.consul-dev-client-cert: Creating...
vault_pki_secret_backend_cert.consul-dev-client-cert: Creation complete after 1s [id=internalca/vault-consul-dev/consul-dev-client.vault.dev.supercompany.com]
vault_consul_secret_backend.consul-dev: Modifying... [id=consul-dev]
╷
│ Error: error configuring Consul configuration for "consul-dev": Error making API request.
│
│ URL: PUT https://vault.dev.supercompany.com:8200/v1/consul-dev/config/access
│ Code: 400. Errors:
│
│ * Token not provided and failed to bootstrap ACLs: Unexpected response code: 403 (Permission denied: rpc error making call: ACL bootstrap no longer allowed (reset index: 12345))
│
│   with vault_consul_secret_backend.consul-dev,
│   on consul-dev.tf line 32, in resource "vault_consul_secret_backend" "consul-dev":
│   32: resource "vault_consul_secret_backend" "consul-dev" {
│
╵

The error above did not cause an immediate outage, however. The old certificate was deleted and the new one was generated, but the Consul secrets engine was never given the new one. Once the old certificate expired, communication between Vault and Consul stopped. That failure looked like this:

Vault error occurred: Put "https://consul.dev.supercompany.com:8501/v1/acl/token": remote error: tls: bad certificate, on get https://vault.dev.supercompany.com:8200/v1/consul-dev/creds/consul-server

To Reproduce
Steps to reproduce the behavior:

  1. Create docker network
    a. docker network create --driver bridge bstok

  2. Launch Vault container
    a. docker run -dit --name vault.superveryreallydefinitelybogus.com --network bstok -p 8200:8200 --cap-add=IPC_LOCK -e 'VAULT_DEV_ROOT_TOKEN_ID=myroot' -e 'VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:8200' hashicorp/vault

  3. Create sample PKI infrastructure using Terraform
    a. export VAULT_ADDR=http://localhost:8200
    b. export VAULT_TOKEN=myroot
    c. mkdir -p bstok/terraform bstok/docker/consul/config/tls
    d.

cat << 'EOF' > bstok/terraform/main.tf
terraform {
  required_version = ">= 1.5.3"
  required_providers {
    vault = "~> 3.18.0"
  }
}

provider "vault" {
  address = "http://localhost:8200"
}

variable "base_domain" {
  type        = string
  description = "The domain name the CA will issue certificates for"
  default     = "superveryreallydefinitelybogus.com"
}

# root CA
resource "vault_mount" "pki_root" {
  path        = "pki_root"
  type        = "pki"
  description = "This is an example PKI root"

  max_lease_ttl_seconds = 315360000 #10y
}

resource "vault_pki_secret_backend_root_cert" "root" {
  backend     = vault_mount.pki_root.path
  type        = "internal"
  common_name = var.base_domain
  ttl         = "87600h" #10y
}

resource "vault_pki_secret_backend_config_urls" "config_urls" {
  backend                 = vault_mount.pki_root.path
  issuing_certificates    = ["http://localhost:8200/v1/pki/ca"]
  crl_distribution_points = ["http://localhost:8200/v1/pki/crl"]
}

# intermediate CA
resource "vault_mount" "pki_intermediate" {
  path        = "pki_intermediate"
  type        = "pki"
  description = "This is an example PKI intermediate"

  max_lease_ttl_seconds = 15780000 #~6 months as written; 5y would be 157680000
}

resource "vault_pki_secret_backend_intermediate_cert_request" "intermediate_request" {
  backend     = vault_mount.pki_intermediate.path
  type        = "internal"
  common_name = "${var.base_domain} Intermediate Authority"
}

resource "vault_pki_secret_backend_root_sign_intermediate" "signed_intermediate" {
  backend     = vault_mount.pki_root.path
  csr         = vault_pki_secret_backend_intermediate_cert_request.intermediate_request.csr
  common_name = vault_pki_secret_backend_intermediate_cert_request.intermediate_request.common_name
}

resource "vault_pki_secret_backend_intermediate_set_signed" "set_signed" {
  backend     = vault_mount.pki_intermediate.path
  certificate = vault_pki_secret_backend_root_sign_intermediate.signed_intermediate.certificate
}

# roles
resource "vault_pki_secret_backend_role" "server_role" {
  backend = vault_mount.pki_intermediate.path
  name    = "server_role"
  max_ttl = 259200 #72h

  allowed_domains    = [var.base_domain]
  allowed_uri_sans   = ["server.dc1.consul"]
  allow_any_name     = false
  allow_glob_domains = true
  allow_ip_sans      = true
  allow_subdomains   = true
  enforce_hostnames  = true

  client_flag = false
  server_flag = true
}

# client certs
resource "vault_pki_secret_backend_cert" "vault-consul" {
  backend = vault_mount.pki_intermediate.path
  name    = vault_pki_secret_backend_role.server_role.name

  common_name           = "consul.${var.base_domain}"
  ttl                   = 3000
  min_seconds_remaining = 2400
  auto_renew            = true
}

resource "vault_pki_secret_backend_cert" "consul-server" {
  backend = vault_mount.pki_intermediate.path
  name    = vault_pki_secret_backend_role.server_role.name

  common_name           = "consul.${var.base_domain}"
  ttl                   = 28800
  min_seconds_remaining = 14400
  auto_renew            = true
}

# consul secrets engine
##resource "vault_consul_secret_backend" "consul" {
##  path      = "consul"
##  address   = "consul.superveryreallydefinitelybogus.com:8501"
##  scheme    = "https"
##  bootstrap = true
##
##  ca_cert = vault_pki_secret_backend_cert.vault-consul.ca_chain
##  client_cert = join("\n", [
##    vault_pki_secret_backend_cert.vault-consul.certificate,
##    vault_pki_secret_backend_cert.vault-consul.ca_chain
##  ])
##  client_key = vault_pki_secret_backend_cert.vault-consul.private_key
##}
EOF

    e. cd bstok/terraform
    f. terraform init
    g. terraform plan -out=myplan
    h. terraform apply myplan

  4. Request certs for Consul
    a. cd ..
    b. vault write -format=json pki_intermediate/issue/server_role common_name="consul.superveryreallydefinitelybogus.com" uri_sans="server.dc1.consul" > certout.json
    c. cat certout.json | jq -r '.data.private_key' > docker/consul/config/tls/cert.key
    d. cat certout.json | jq -r '.data.ca_chain[0]' > docker/consul/config/tls/ca.crt
    e. cat certout.json | jq -r '.data.certificate' > docker/consul/config/tls/cert.crt
    f. cat docker/consul/config/tls/cert.crt docker/consul/config/tls/ca.crt > docker/consul/config/tls/certcombined.crt

  5. Create small Consul config file
    a.

cat << EOF > docker/consul/config/config.json
{
  "server": true,
  "log_level": "DEBUG",
  "node_name": "consul-docker",
  "tls": {
    "defaults": {
      "ca_file": "/consul/config/tls/ca.crt",
      "cert_file": "/consul/config/tls/certcombined.crt",
      "key_file": "/consul/config/tls/cert.key",
      "verify_incoming": true,
      "verify_outgoing": true
    }
  },
  "acl": {
    "enabled": true,
    "default_policy": "allow"
  },
  "ports": {
    "https": 8501,
    "grpc_tls": 8503
  },
  "encrypt": "pmsKacTdVOb4x8/Vtr9PWw=="
}
EOF

  6. Launch Consul container
    a. docker run -dit --name consul.superveryreallydefinitelybogus.com --network bstok -p 8300:8300 -p 8501:8501 -p 8600:8600/udp -v $(pwd)/docker/consul:/consul hashicorp/consul:1.15.10 agent -server -bootstrap -ui -client=0.0.0.0

  7. Validate that Consul is running
    a. docker exec -it consul.superveryreallydefinitelybogus.com consul members

  8. Uncomment Vault<->Consul backend creation in TF for Vault
    a. cd terraform
    b. sed -i.commented 's/^##//' main.tf

  9. Create Vault<->Consul backend
    a. terraform plan -out=myplan
    b. terraform apply myplan
      i. The TF and Consul logs should show:

vault_consul_secret_backend.consul: Creating...
vault_consul_secret_backend.consul: Creation complete after 0s [id=consul]
2024-04-26 11:07:45 2024-04-26T15:07:45.084Z [INFO]  agent.server.acl: ACL bootstrap completed
2024-04-26 11:07:45 2024-04-26T15:07:45.087Z [DEBUG] agent.http: Request finished: method=PUT url=/v1/acl/bootstrap from=172.20.0.2:55116 latency=6.781417ms
2024-04-26 11:08:48 2024-04-26T15:08:48.791Z [DEBUG] agent: Skipping remote check since it is managed automatically: check=serfHealth

  10. Rotate client cert used with Vault<->Consul
    a. terraform plan -replace=vault_pki_secret_backend_cert.vault-consul -out=myplan
      i. Should see '# vault_consul_secret_backend.consul will be updated in-place' with a new cert and key
      ii. Should see '# vault_pki_secret_backend_cert.vault-consul must be replaced'
    b. terraform apply myplan
      i. boom :(
    % terraform apply myplan
    vault_pki_secret_backend_cert.vault-consul: Destroying... [id=pki_intermediate/server_role/consul.superveryreallydefinitelybogus.com]
    vault_pki_secret_backend_cert.vault-consul: Destruction complete after 0s
    vault_pki_secret_backend_role.server_role: Modifying... [id=pki_intermediate/roles/server_role]
    vault_pki_secret_backend_role.server_role: Modifications complete after 0s [id=pki_intermediate/roles/server_role]
    vault_pki_secret_backend_cert.vault-consul: Creating...
    vault_pki_secret_backend_cert.vault-consul: Creation complete after 3s [id=pki_intermediate/server_role/consul.superveryreallydefinitelybogus.com]
    vault_consul_secret_backend.consul: Modifying... [id=consul]
    ╷
    │ Error: error configuring Consul configuration for "consul": Error making API request.
    │
    │ URL: PUT http://localhost:8200/v1/consul/config/access
    │ Code: 400. Errors:
    │
    │ * Token not provided and failed to bootstrap ACLs: Unexpected response code: 403 (Permission denied: ACL bootstrap no longer allowed (reset index: 23))
    │
    │   with vault_consul_secret_backend.consul,
    │   on main.tf line 125, in resource "vault_consul_secret_backend" "consul":
    │  125: resource "vault_consul_secret_backend" "consul" {

Expected behavior
We expected only the new client certificate and key to be set on the secret backend, without Vault attempting to bootstrap the Consul ACL system again.

Environment:

  • Vault Server Version: 1.14.1
  • Vault CLI Version: 1.14.1
  • Server Operating System/Architecture: Debian 11

Vault server configuration file(s): Can reproduce with bare dev container as shown above

Additional context
I have to believe that we are doing something incorrectly here; otherwise this seems like a big oversight.
Probable related report: #9056

hsimon-hashicorp added the secret/consul and bug labels on May 8, 2024
@hsimon-hashicorp
Contributor

Hi there - I asked our resident Vault/Consul expert and she said that if you're in need of workarounds, this might help. In the meantime I'll bring this to our engineering teams as well. Thanks!

  • Find the bootstrap token via the consul_acl_token_secret_id data source
  • Pass data.consul_acl_token_secret_id.example.secret_id into the "token" attribute of the Consul secrets engine
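
The two steps above could look roughly like the following sketch. It assumes the hashicorp/consul Terraform provider is already configured with credentials that are allowed to read ACL tokens, and that the accessor ID of the bootstrap token is known. The `bootstrap_accessor_id` variable is a hypothetical input; the data source is named `example` to match the reference above. The backend arguments are copied from the reproduction's main.tf, with `bootstrap = true` swapped for an explicit `token`. How that swap behaves on a mount that Vault already bootstrapped is not covered in this thread and would need testing.

```hcl
# Sketch only, not a verified fix.
# Assumption: the consul provider can authenticate and read ACL tokens.
variable "bootstrap_accessor_id" {
  type        = string
  description = "Accessor ID of the Consul bootstrap token (hypothetical input)"
}

# Look up the secret ID of the bootstrap token by its accessor ID.
data "consul_acl_token_secret_id" "example" {
  accessor_id = var.bootstrap_accessor_id
}

resource "vault_consul_secret_backend" "consul" {
  path    = "consul"
  address = "consul.superveryreallydefinitelybogus.com:8501"
  scheme  = "https"

  # Send the token on every write to config/access so that a cert rotation
  # does not arrive without one.
  token = data.consul_acl_token_secret_id.example.secret_id

  ca_cert = vault_pki_secret_backend_cert.vault-consul.ca_chain
  client_cert = join("\n", [
    vault_pki_secret_backend_cert.vault-consul.certificate,
    vault_pki_secret_backend_cert.vault-consul.ca_chain
  ])
  client_key = vault_pki_secret_backend_cert.vault-consul.private_key
}
```

Note that the secret ID read this way ends up in the Terraform state, so the state needs to be protected accordingly.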

@cheeseburgermotivated
Author

Thanks for the update. We kinda-sorta did something similar to recover: we turned off verify_incoming for the Consul server nodes, requested a new global-management token from Vault, logged in to the UI with that token, and dug around in the issued tokens until we happened to find the one that was initially issued for bootstrap. We put that token at a path in Vault and now use data.vault_generic_secret to pull it from Vault for use within the vault_consul_secret_backend. Then we turned verify_incoming back on. However, I do like your idea better, so I'll play with it in our lab.
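
For anyone landing here, the Terraform side of that recovery might be sketched as below. The secret path `secret/consul-bootstrap` and the key name `token` are hypothetical; use whatever path and key the token was actually stored under, keeping in mind that the exact path layout depends on the type of mount. The remaining arguments are the ones from the reproduction's main.tf.

```hcl
# Sketch only. Assumption: the recovered bootstrap token was written to
# Vault at this (hypothetical) path under the (hypothetical) key "token".
data "vault_generic_secret" "consul_bootstrap" {
  path = "secret/consul-bootstrap"
}

resource "vault_consul_secret_backend" "consul" {
  path    = "consul"
  address = "consul.superveryreallydefinitelybogus.com:8501"
  scheme  = "https"

  # The token is read back from Vault and sent along with the new cert/key.
  token = data.vault_generic_secret.consul_bootstrap.data["token"]

  ca_cert = vault_pki_secret_backend_cert.vault-consul.ca_chain
  client_cert = join("\n", [
    vault_pki_secret_backend_cert.vault-consul.certificate,
    vault_pki_secret_backend_cert.vault-consul.ca_chain
  ])
  client_key = vault_pki_secret_backend_cert.vault-consul.private_key
}
```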

Thanks for bringing this up to the engineering team as well as the Vault/Consul expert. I'll remain subscribed to this issue for future updates.
