
vault approle id is changed #17820

Closed
wuchuxiong opened this issue Nov 4, 2022 · 9 comments

@wuchuxiong

I am facing a problem where the Vault AppRole ID somehow changes.
My application runs Vault in a k8s pod. On the Vault server, I have enabled a number of secrets engines at paths that include the AppRole ID.

sh-4.4$ vault list -tls-skip-verify auth/approle/role
Keys
----
app-role1
app-role2

sh-4.4$ vault read -tls-skip-verify auth/approle/role/app-role1/role-id
Key        Value
---        -----
role_id    2f6f295d-f155-c7ba-2197-972aa3eae0ee

sh-4.4$ vault read -tls-skip-verify auth/approle/role/app-role2/role-id
Key        Value
---        -----
role_id    9cc81f6e-68c4-0a03-4b62-789450e5dfbf

sh-4.4$ vault secrets list -tls-skip-verify
Path                                                Type         Accessor              Description
----                                                ----         --------              -----------
cubbyhole/                                          cubbyhole    cubbyhole_dc17171c    per-token private secret storage
identity/                                           identity     identity_81c749ab     identity store
base/689d52e5-4697-3ffa-ef52-c65a5b1979d7/          generic      generic_8b01f10a      n/a
base/78f7f49f-51c8-9f2d-4943-d495daa93782/          generic      generic_7281c2de      n/a

My sensitive information is saved under those paths. It worked perfectly for a few months; the clients could get the key values with the path base/APPROLEID/KEYNAME. Last week, I updated the Vault version from 1.8.8 to 1.11.2 with a pod rolling update. Unfortunately, I noticed that the AppRole ID was updated. Consequently, the clients can't get values with the path base/NEWAPPROLEID/KEYNAME.
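For example, a client read looks like the following, using one of the mounted paths from the secrets list above (KEYNAME is a placeholder for one of my key names):

sh-4.4$ vault read -tls-skip-verify base/689d52e5-4697-3ffa-ef52-c65a5b1979d7/KEYNAME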

This is the second time I have faced this issue. Last time, I hit it without doing any Vault update; it happened after a Vault pod restart.

So I wonder why the AppRole ID could change? It looks like a bug.

@maxb
Contributor

maxb commented Nov 5, 2022

The only time a RoleID of an AppRole should ever change is if explicitly requested via the API for that purpose: https://developer.hashicorp.com/vault/api-docs/auth/approle#update-approle-role-id
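For reference, explicitly setting a RoleID via that endpoint would look roughly like this; the role name and value here are placeholders, not something Vault does on its own:

vault write auth/approle/role/app-role1/role-id role_id="my-custom-role-id"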

If the AppRole RoleID did change, a client using it would no longer be able to log in at all, so you would see problems before it ever got to trying to retrieve a value from your generic secrets engines.
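That login step is roughly the following (both IDs are placeholders):

# login fails here if the RoleID no longer matches any role
vault write auth/approle/login role_id="ROLE_ID" secret_id="SECRET_ID"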

I wonder if you don't actually have a Vault cluster, but several Vault pods all of which are acting as their own separate Vaults?

Please share more information including your Vault configuration file.

@wuchuxiong
Author

Thank you, maxb, for your comments. My Vault cluster is a single instance; there is only one Vault pod. Here is the configuration file.

storage "file" {
  path = "/var/vault/data"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = "0"
  tls_cert_file = "/etc/ssl/server.crt"
  tls_key_file = "/etc/ssl/server.key"
  tls_min_version = "tls12"
  tls_cipher_suites = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256"
}
disable_mlock = true

telemetry {
  prometheus_retention_time = "12h"
  disable_hostname = true
}

@wuchuxiong
Author

Has anyone else hit this issue? Any comments would be appreciated.

@wuchuxiong
Author

Hi Maxb,
My Vault is launched as a k8s Deployment. During the Vault upgrade, Vault is upgraded through a k8s rolling update. That means the old Vault pod is still running while the new Vault pod is starting up; the old pod is not deleted until the new pod is ready. So during the startup phase, there are indeed two Vault pods connecting to the same backend storage files.
I notice that Vault runs as a StatefulSet in the official Vault Helm chart https://github.com/hashicorp/vault-helm/blob/main/templates/server-statefulset.yaml. I'm not sure if this is somehow the root cause of the issue.

@maxb
Contributor

maxb commented Nov 23, 2022

That means the old Vault pod is still running while the new Vault pod is starting up; the old pod is not deleted until the new pod is ready. So during the startup phase, there are indeed two Vault pods connecting to the same backend storage files.

This is absolutely not safe at all with file storage, and could well explain the data corruption / misbehaviour you have described in this ticket.

@wuchuxiong
Author

Do you mean the file storage backend isn't safe in general? Or that it isn't safe when a new Vault pod starts up before the old Vault pod has stopped?

@maxb
Contributor

maxb commented Nov 24, 2022

The second.

It's only safe to use a storage backend with multiple Vault processes running simultaneously if that backend supports High Availability (and if it supports enabling/disabling High Availability via a configuration option, it needs to be turned on).

The HA status of each storage backend is documented on its page under https://developer.hashicorp.com/vault/docs/configuration/storage

https://developer.hashicorp.com/vault/docs/configuration/storage/filesystem says "No High Availability".

That said, there was a major bug in the file storage backend which previously made it unsafe to use, and which was fixed in 1.9.8 / 1.10.5 / 1.11.1 / 1.12.0:

core/seal: Fix possible keyring truncation when using the file backend. [https://github.com/hashicorp/vault/pull/15946]
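For comparison, a backend that does support High Availability is Integrated Storage (Raft); a minimal sketch of such a stanza, with placeholder path, node name, and addresses, would look something like:

storage "raft" {
  path    = "/var/vault/raft"   # placeholder data directory
  node_id = "vault-0"           # placeholder node name
}
api_addr     = "https://vault.example.internal:8200"   # placeholder
cluster_addr = "https://vault.example.internal:8201"   # placeholder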

@wuchuxiong
Author

Thank you very much for this very valuable information.
Then I will have to start the Vault pod through a single-replica StatefulSet, or a single-replica Deployment with the strategy type set to Recreate.
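For example, assuming my Deployment is named vault (the name is a placeholder for my actual Deployment), the strategy could be switched like this:

kubectl patch deployment vault --type merge -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'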

@ncabatoff
Collaborator

Thanks for helping to get to the bottom of this @maxb. I'm going to close this as it doesn't appear to be a bug.
