Vault cannot be unsealed with error "Vault is not initialized" But It is already initialized #15680

Closed
shujun10086 opened this issue May 30, 2022 · 7 comments · Fixed by #15946

Comments

@shujun10086
Contributor

Describe the bug
After a restart, Vault can no longer be unsealed.
The Vault server was restarted because its liveness probe failed.
The probe checks Vault health with curl -m 1 http://127.0.0.1:5817/v1/sys/health.
The liveness probe interval is 5s, with 3 attempts.
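
For reference, /v1/sys/health reports the server state through its HTTP status code when called with default query parameters. The mapping can be sketched in Go as below; `healthState` is a hypothetical helper name used only for illustration, not something from Vault or from our program.

```go
package main

import "fmt"

// healthState maps the HTTP status code returned by GET /v1/sys/health
// (default query parameters) to a short description of the server state.
// The function name is hypothetical; only the status codes come from Vault.
func healthState(code int) string {
	switch code {
	case 200:
		return "initialized, unsealed, active"
	case 429:
		return "unsealed, standby"
	case 472:
		return "disaster recovery secondary"
	case 473:
		return "performance standby"
	case 501:
		return "not initialized"
	case 503:
		return "sealed"
	default:
		return "unknown"
	}
}

func main() {
	for _, code := range []int{200, 501, 503} {
		fmt.Printf("%d: %s\n", code, healthState(code))
	}
}
```

Note that curl without -f exits with status 0 for any HTTP response, so if the probe command is exactly the one shown above, it would fail only on a connection error or when the 1 second timeout (-m 1) expires, not because of a 501 or 503 response.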

After the restart, our own program unseals Vault using the saved unseal key and root token file.

It first checks the Vault init status via http://127.0.0.1:5817/v1/sys/init,
and the response says Vault is initialized (true).
But it can no longer unseal Vault:
URL: PUT http://127.0.0.1:5817/v1/sys/unseal
Code: 400. Errors:

  • Vault is not initialized
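
The contradictory state described above (sys/init says initialized, sys/unseal says not initialized) can be detected from the two response bodies. A minimal sketch in Go, assuming the usual Vault JSON shapes `{"initialized": true}` and `{"errors": [...]}`; `inconsistentInit` and the two struct types are hypothetical names for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// initResponse mirrors the body of GET /v1/sys/init.
type initResponse struct {
	Initialized bool `json:"initialized"`
}

// errorResponse mirrors the body Vault returns alongside an error status.
type errorResponse struct {
	Errors []string `json:"errors"`
}

// inconsistentInit reports whether sys/init claimed the server was
// initialized while sys/unseal rejected the request with HTTP 400 and
// the message "Vault is not initialized". (Hypothetical helper.)
func inconsistentInit(initBody []byte, unsealStatus int, unsealBody []byte) (bool, error) {
	var ir initResponse
	if err := json.Unmarshal(initBody, &ir); err != nil {
		return false, fmt.Errorf("decode sys/init body: %w", err)
	}
	if !ir.Initialized || unsealStatus != 400 {
		return false, nil
	}
	var er errorResponse
	if err := json.Unmarshal(unsealBody, &er); err != nil {
		return false, fmt.Errorf("decode sys/unseal body: %w", err)
	}
	for _, msg := range er.Errors {
		if strings.Contains(msg, "Vault is not initialized") {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	bad, err := inconsistentInit(
		[]byte(`{"initialized":true}`),
		400,
		[]byte(`{"errors":["Vault is not initialized"]}`),
	)
	fmt.Println(bad, err)
}
```

A wrapper that unseals Vault automatically could use such a check to stop retrying and raise an alert, since repeating the unseal request will not help in this state.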

Checking the Vault DB files, it seems the _keyring file does not exist.
The following are the Vault DB core files. Usually there is a _keyring file. When the problem happened, it was gone, but I am not sure whether that is just a result of the Vault restart.

bash-5.1$ ls -l /mnt/services/vault/DB/core/
total 6
-rw-------. 1 9999 9999 397 May 24 09:31 _audit
-rw-------. 1 9999 9999 537 May 24 09:31 _auth
-rw-------. 1 9999 9999 133 May 24 09:31 _local-audit
-rw-------. 1 9999 9999 133 May 24 09:31 _local-auth
-rw-------. 1 9999 9999 417 May 24 09:31 _local-mounts
-rw-------. 1 9999 9999 209 May 29 20:51 _master
-rw-------. 1 9999 9999 709 May 24 09:31 _mounts
-rw-------. 1 9999 9999 169 May 24 09:31 _seal-config
-rw-------. 1 9999 9999 101 May 24 09:31 _shamir-kek
drwx------. 3 9999 9999 2 May 24 09:31 cluster
drwx------. 2 9999 9999 1 May 24 09:31 hsm
drwx------. 2 9999 9999 1 May 24 09:31 wrapping

To Reproduce
Steps to reproduce the behavior:

  1. Run the Vault server until it is killed by the livenessProbe
  2. After it is restarted, it cannot be unsealed anymore

It is still not clear why the Vault server does not respond to the health request, so the problem is difficult to reproduce.

Expected behavior
After Vault restarts, it can still be unsealed normally.

Environment:

  • Vault Server Version (retrieve with vault status):
    bash-5.1$ vault status
    Key                Value
    Seal Type          shamir
    Initialized        true
    Sealed             true
    Total Shares       1
    Threshold          1
    Unseal Progress    0/1
    Unseal Nonce       n/a
    Version            1.8.10
    Storage Type       file
    HA Enabled         false

  • Vault CLI Version (retrieve with vault version):
    bash-5.1$ vault version
    Vault v1.8.10 (cgo)

  • Server Operating System/Architecture:
    fedora

Vault server configuration file(s):

# Paste your Vault config here.
# Be sure to scrub any sensitive values

bash-5.1$ cat /etc/vaultserver/server.hcl
storage "file" {
  path = "/mnt/services/vault/DB"
}

listener "tcp" {
  address = "127.0.0.1:5817"
  tls_disable = 1
}

cache_size = 100
disable_mlock = true

Additional context
Can you explain, from the Vault source code point of view, why the error is "Vault is not initialized" when Vault has already been initialized?

@maxb
Contributor

maxb commented May 30, 2022

Vault uses the presence of the keyring to test whether it has been initialized. So, mysterious loss of the keyring file from backing storage would seem to explain this behaviour.

@shujun10086
Contributor Author

I suspect that if Vault gets stuck while rotating the keyring file and is killed by the liveness probe, the file may be gone because it was deleted during rotation. From the source code, Vault rotates the keyring at a fixed interval. Can that interval be configured?

Another question is why http://127.0.0.1:5817/v1/sys/init returns success, reporting that Vault is already initialized. Does it not use the keyring file to test whether Vault is initialized?

@shujun10086
Contributor Author

Can Vault make sure the keyring rotation completes successfully when it handles signal 15 (SIGTERM)?

shujun10086 added a commit to shujun10086/vault that referenced this issue Jun 11, 2022
Fix keyring file missing after Vault restart

    Vault can be killed by SIGTERM at any time. If Vault is
    rotating the keyring and the fd is not yet closed at that time,
    the size of the keyring file will be zero and the original key will be
    lost entirely,
    because the file is opened with os.O_TRUNC and the fd is closed
    automatically when Vault exits.
    When Vault starts up again and an unseal is attempted, the keyring file is
    deleted in the getInternal function because its size is zero. That is
    why the keyring file is missing, and why Vault cannot be unsealed
    anymore.
    Writing the data to a temp file and then moving it into place avoids
    corrupting the file.

    Fix:hashicorp#15680
@raskchanky raskchanky linked a pull request Jun 14, 2022 that will close this issue
@sathuish

Is there a way to generate the file manually? We have hit this in our prod environment.

@shujun10086
Contributor Author

Is there a way to generate the file manually? We have hit this in our prod environment.

It seems not. The file is updated every 5 minutes by default, and even when I saved the file and replaced it manually, the unseal still did not succeed. Which version of Vault do you use?

@sathuish

sathuish commented Aug 26, 2022

Vault 1.7.1. Is there any other way to unseal the vault now?

@maxb
Contributor

maxb commented Aug 29, 2022

Question was also asked at https://discuss.hashicorp.com/t/keyring-file-is-missing-under-core-directory/43588 . My answer from there:

There is no way - the keyring file contained the encryption keys with which all the user data in Vault is encrypted.

With it lost, unless you have your own backup elsewhere, all the data is permanently lost.
