Vault does not start anymore #3573

Closed
mperriere opened this issue Nov 13, 2017 · 5 comments

@mperriere

Environment:

  • Vault Version: 0.7.0
  • Operating System/Architecture: Amazon AMI
    Linux consul-i-0cca6d2136e1a2b8e 4.4.11-23.53.amzn1.x86_64 #1 SMP Wed Jun 1 22:22:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Vault Config File:
disable_mlock=true

backend "consul" {
address = "127.0.0.1:8500"
path = "vault"
}

listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = 1
}

Startup Log Output:
vault server -config=/etc/vault/vault-config.hcl
==> Vault server configuration:

                 Cgo: disabled
     Cluster Address: https://10.196.74.35:8201
          Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", tls: "disabled")
           Log Level: info
               Mlock: supported: true, enabled: false
    Redirect Address: http://10.196.74.35:8200
             Storage: consul (HA available)
             Version: Vault v0.7.0
         Version Sha: 614deacfca3f3b7162bbf30a36d6fc7362cd47f0

==> Vault server started! Log data will stream in below:

2017/11/13 16:50:59.483992 [WARN ] physical/consul: appending trailing forward slash to path

2017/11/13 16:51:54.222936 [INFO ] core: vault is unsealed
2017/11/13 16:51:54.222953 [WARN ] physical/consul: Concurrent sealed state change notify dropped
2017/11/13 16:51:54.223000 [INFO ] core: entering standby mode
2017/11/13 16:51:54.235068 [INFO ] core: acquired lock, enabling active operation
2017/11/13 16:51:54.315911 [WARN ] physical/consul: Concurrent state change notify dropped
2017/11/13 16:51:54.315929 [INFO ] core: post-unseal setup starting
2017/11/13 16:51:54.317168 [INFO ] core: loaded wrapping token key
2017/11/13 16:51:54.319493 [INFO ] core: successfully mounted backend: type=generic path=secret/
2017/11/13 16:51:54.319589 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/11/13 16:51:54.319607 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/11/13 16:51:54.319666 [INFO ] rollback: starting rollback manager
2017/11/13 16:51:54.325838 [INFO ] expiration: restoring leases
panic: runtime error: slice bounds out of range

goroutine 59 [running]:
github.com/hashicorp/vault/vault.(*AESGCMBarrier).decryptKeyring(0x1a091770, 0x19ed65c0, 0x20, 0x0, 0x0, 0x0, 0xe, 0x13afe3c6, 0x9aeae00, 0x0, ...)
/gopath/src/github.com/hashicorp/vault/vault/barrier_aes_gcm.go:816 +0x497
github.com/hashicorp/vault/vault.(*AESGCMBarrier).Get(0x1a091770, 0x19ed65c0, 0x20, 0x0, 0x0, 0x0)
/gopath/src/github.com/hashicorp/vault/vault/barrier_aes_gcm.go:669 +0x197
github.com/hashicorp/vault/vault.(*BarrierView).Get(0x1a167640, 0x19ed6260, 0x12, 0x1a16cfb0, 0x807b41a, 0x1a16cf88)
/gopath/src/github.com/hashicorp/vault/vault/barrier_view.go:53 +0x94
github.com/hashicorp/vault/vault.(*ExpirationManager).loadEntry(0x1a13a990, 0x19ed6260, 0x12, 0x1a16cf00, 0x1, 0x0)
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:725 +0x38
github.com/hashicorp/vault/vault.(*ExpirationManager).Restore.func1(0x1a17def0, 0x1a07c5c0, 0x1a13a990, 0x1a07c680, 0x1a07c6c0, 0x1a07c600)
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:155 +0xdd
created by github.com/hashicorp/vault/vault.(*ExpirationManager).Restore
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:169 +0x305

Expected Behavior:
starting Vault
unsealing Vault
Vault is running

Actual Behavior:
starting Vault
unsealing Vault
then Vault crashes.

Steps to Reproduce:
Start and unseal any of the 3 nodes running inside a 3-node Consul cluster.

Important Factoids:
After a week of vacation following intensive Vault usage (integration into a Terraform/Ansible stack), I found Vault down with that error.

References:

@jefferai
Member

The error there is happening because the keyring coming into Vault is nil. This suggests an issue either in Consul or with that key in Consul. You may want to try looking at the path <vault prefix>/core/keyring in Consul and ensure that it has valid data.
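A minimal way to follow that suggestion, sketched in Go under a few assumptions: the Consul agent listens on 127.0.0.1:8500, the Vault prefix is vault (as in the config above), and the program reads the JSON that curl prints for the key from stdin. Consul returns Value as a base64 string (or null when nothing is stored), and Go's encoding/json decodes a base64 string straight into a []byte field. The names kvPair and decodeKV are illustrative only, not part of Vault or Consul.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// kvPair holds the fields of a Consul KV GET response used here. Consul sends
// Value as base64 (or null); encoding/json decodes it into a []byte for us.
type kvPair struct {
	Key   string
	Value []byte
}

// decodeKV parses the JSON array returned by GET /v1/kv/<key> and returns the
// first key together with its raw stored bytes (nil if the stored value is null).
func decodeKV(body []byte) (string, []byte, error) {
	var pairs []kvPair
	if err := json.Unmarshal(body, &pairs); err != nil {
		return "", nil, err
	}
	if len(pairs) == 0 {
		return "", nil, errors.New("response contains no keys")
	}
	return pairs[0].Key, pairs[0].Value, nil
}

func main() {
	// Reads the body printed by, for example:
	//   curl -s http://127.0.0.1:8500/v1/kv/vault/core/keyring
	body, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	key, value, err := decodeKV(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode error:", err)
		os.Exit(1)
	}
	if len(value) == 0 {
		fmt.Printf("%s: value is empty or null\n", key)
		return
	}
	fmt.Printf("%s: %d bytes stored\n", key, len(value))
}
```

Assuming the file is saved as main.go, pipe curl into it: curl -s http://127.0.0.1:8500/v1/kv/vault/core/keyring | go run main.go. A missing key makes Consul answer 404 with an empty body, which this sketch reports as a decode error.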

@mperriere
Author

mperriere commented Nov 14, 2017

vault/core/keyring is not readable from the Consul UI.
From the REST API, I get this value:
curl 10.196.74.35:8500/v1/kv/vault/core/keyring
[{"LockIndex":0,"Key":"vault/core/keyring","Flags":0,"Value":"AAAAAQK2hhndWHCswNP3v8kOt32tmOv3reCsfBLcS6zZNtvBqjWgz/lvTL4ARlzMIkxUn+85p+SqMZJ+S2p5DylxJN4j50ZUTCiLm24qg80Djd760vRzTBVWEm2oMZL0DS8YbIVY/hVpPsA0IyVQX0UhhzQb9WTFMficJ1EchBW3Qip7NEpyjeCg8zSsURtWInb9xgQkRewEVMq2npiHgfcYQBKpaqx67pU7T+xGzS8RoSg2bFDHraz8/SzENlJbTtuIMeiz/DuQ6L1x8+Dxvsot4LijJ3kUOfxPqdeo6o5iaqg=","CreateIndex":74,"ModifyIndex":74}]

How can I check that this value is OK?
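One rough structural check, sketched in Go: decode the Value and look at its header. This assumes (it is not confirmed anywhere in this thread) that an encrypted Vault barrier entry is laid out as a 4-byte big-endian key term, a 1-byte version, a 12-byte AES-GCM nonce, and the ciphertext followed by a 16-byte GCM tag, so anything under 33 bytes cannot be complete. It cannot prove the value is valid; that would require decrypting it with Vault's keys. inspectEntry is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"os"
)

// ASSUMED layout of an encrypted barrier entry (illustrative, not Vault's source):
// [4-byte big-endian term][1-byte version][12-byte nonce][ciphertext + 16-byte tag]
const minEntrySize = 4 + 1 + 12 + 16

// inspectEntry decodes a base64 Value from Consul and reports its size, key
// term and version byte. It only detects structural damage such as truncation.
func inspectEntry(b64 string) (term uint32, version byte, size int, err error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return 0, 0, 0, fmt.Errorf("not valid base64: %w", err)
	}
	if len(raw) < minEntrySize {
		return 0, 0, len(raw), fmt.Errorf("only %d bytes, expected at least %d", len(raw), minEntrySize)
	}
	return binary.BigEndian.Uint32(raw[:4]), raw[4], len(raw), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: inspect <base64 Value>")
		os.Exit(2)
	}
	term, version, size, err := inspectEntry(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "suspicious entry:", err)
		os.Exit(1)
	}
	fmt.Printf("%d bytes, term=%d, version=%d\n", size, term, version)
}
```

Run on just the first 48 base64 characters of the keyring value above, inspectEntry reports 36 bytes with term 1 and version byte 2, so under the assumed layout the header of that value at least looks plausible.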

Another way to solve the issue would be to reinitialize Vault, since the critical data can be reimported.
Is there a way to do that without reinstalling Vault or Consul?

@jefferai
Member

Hi there. Because this is an old version, I misread something when I first looked at it. The issue isn't with the keyring but with one of the leases under sys/expire: it appears that the value being read is truncated somehow.
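To illustrate why a truncated value produces "slice bounds out of range" rather than a clean error, here is a Go sketch contrasting unchecked slicing with a length-checked version. The entry layout (4-byte term, 1-byte version, 12-byte GCM nonce, ciphertext plus 16-byte tag) is an assumption made for illustration; splitUnchecked and splitChecked are hypothetical functions, not Vault's actual decryptKeyring code.

```go
package main

import (
	"errors"
	"fmt"
)

// ASSUMED entry layout (illustrative only):
// [4-byte term][1-byte version][12-byte AES-GCM nonce][ciphertext || 16-byte tag]
const (
	headerSize = 4 + 1
	nonceSize  = 12
	tagSize    = 16
	minEntry   = headerSize + nonceSize + tagSize
)

var errTruncated = errors.New("truncated barrier entry")

// splitUnchecked slices the entry without validating its length, so a short
// value triggers a "slice bounds out of range" runtime panic.
func splitUnchecked(entry []byte) (nonce, sealed []byte) {
	nonce = entry[headerSize : headerSize+nonceSize]
	sealed = entry[headerSize+nonceSize:]
	return nonce, sealed
}

// splitChecked performs the same slicing but returns an error for short input.
func splitChecked(entry []byte) (nonce, sealed []byte, err error) {
	if len(entry) < minEntry {
		return nil, nil, fmt.Errorf("entry is %d bytes, need at least %d: %w", len(entry), minEntry, errTruncated)
	}
	nonce, sealed = splitUnchecked(entry)
	return nonce, sealed, nil
}

// panics reports whether f panics, recovering so the program keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	truncated := []byte{0, 0, 0, 1, 2, 0xb6} // header plus one stray byte
	fmt.Println("unchecked panics:", panics(func() { splitUnchecked(truncated) }))
	_, _, err := splitChecked(truncated)
	fmt.Println("checked error:", err)
}
```

Running it prints "unchecked panics: true" followed by "checked error: entry is 6 bytes, need at least 33: truncated barrier entry".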

The ideal thing to do would be to figure out which lease and remove it. A more brute-force approach would be to simply delete the underlying leases, but this will mean that revocations will not take place when they should for dynamic secrets (including tokens).
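To narrow down which lease is damaged instead of deleting them all, one option is to read everything under the lease prefix with Consul's recursive KV read and flag values too short to be a complete encrypted entry. This Go sketch assumes an encrypted entry is at least 4 + 1 + 12 + 16 = 33 bytes (term, version byte, GCM nonce, GCM tag); that minimum is an assumption about Vault's format, not something confirmed in this thread. It only catches entries short enough to break the slicing; a value truncated further in would more likely fail decryption with an error. suspiciousKeys is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// minEntrySize is an ASSUMED lower bound for a complete encrypted barrier entry:
// 4-byte term + 1-byte version + 12-byte GCM nonce + 16-byte GCM tag.
const minEntrySize = 4 + 1 + 12 + 16

// kvPair holds the fields needed from each element of a recursive KV read.
// Consul sends Value as base64 (or null); encoding/json decodes it into []byte.
type kvPair struct {
	Key   string
	Value []byte
}

// suspiciousKeys returns the keys whose stored value is too short to be a
// complete barrier entry. Folder placeholders (keys ending in "/") are skipped.
func suspiciousKeys(body []byte) ([]string, error) {
	var pairs []kvPair
	if err := json.Unmarshal(body, &pairs); err != nil {
		return nil, err
	}
	var short []string
	for _, p := range pairs {
		if strings.HasSuffix(p.Key, "/") {
			continue
		}
		if len(p.Value) < minEntrySize {
			short = append(short, p.Key)
		}
	}
	return short, nil
}

func main() {
	// Expects the JSON printed by, for example:
	//   curl -s 'http://127.0.0.1:8500/v1/kv/vault/sys/expire/?recurse'
	body, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	keys, err := suspiciousKeys(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode error:", err)
		os.Exit(1)
	}
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

Each printed key is a candidate for removal on its own, which limits the lost revocations to that single lease.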

@mperriere
Author

I upgraded my cluster to Consul 1.0.0 and Vault 0.9.0 and reimported a Consul snapshot. Everything went OK.

Then I followed your suggestion and removed the expiring leases:
opt/consul/bin/consul kv delete vault/sys/expire/
Success! Deleted key: vault/sys/expire/

But when I restart and unseal Vault again, I get the same error:

2017/11/15 16:55:56.447906 [WARN ] physical/consul: appending trailing forward slash to path
2017/11/15 16:58:59.396564 [INFO ] core: vault is unsealed
2017/11/15 16:58:59.396646 [INFO ] core: entering standby mode
2017/11/15 16:58:59.413413 [INFO ] core: acquired lock, enabling active operation
2017/11/15 16:58:59.480408 [INFO ] core: post-unseal setup starting
2017/11/15 16:58:59.481522 [INFO ] core: loaded wrapping token key
2017/11/15 16:58:59.481561 [INFO ] core: successfully setup plugin catalog: plugin-directory=
2017/11/15 16:58:59.483851 [INFO ] core: successfully mounted backend: type=generic path=secret/
2017/11/15 16:58:59.484002 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/11/15 16:58:59.484229 [INFO ] core: successfully mounted backend: type=identity path=identity/
2017/11/15 16:58:59.484255 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/11/15 16:58:59.490890 [INFO ] expiration: restoring leases
2017/11/15 16:58:59.491058 [INFO ] rollback: starting rollback manager
2017/11/15 16:58:59.493922 [INFO ] identity: entities restored
2017/11/15 16:58:59.495071 [INFO ] identity: groups restored
panic: runtime error: slice bounds out of range

goroutine 164 [running]:
github.com/hashicorp/vault/vault.(*AESGCMBarrier).decryptKeyring(0x1b058b40, 0x1b5e5cc0, 0x20, 0x0, 0x0, 0x0, 0xbe7b3a74, 0xa267d9cd, 0x2a, 0xa4d0020, ...)
/gopath/src/github.com/hashicorp/vault/vault/barrier_aes_gcm.go:819 +0x48a
github.com/hashicorp/vault/vault.(*AESGCMBarrier).Get(0x1b058b40, 0x1b5e5cc0, 0x20, 0x0, 0x0, 0x0)
/gopath/src/github.com/hashicorp/vault/vault/barrier_aes_gcm.go:671 +0x16e
github.com/hashicorp/vault/vault.(*BarrierView).Get(0x1b153640, 0x1b5e58e0, 0x12, 0x809bfc7, 0x1b45b780, 0x8049033)
/gopath/src/github.com/hashicorp/vault/vault/barrier_view.go:57 +0x94
github.com/hashicorp/vault/vault.(*ExpirationManager).loadEntryInternal(0x1b0626e0, 0x1b5e58e0, 0x12, 0x1, 0x0, 0x806e000, 0x80947e0)
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:1062 +0x42
github.com/hashicorp/vault/vault.(*ExpirationManager).processRestore(0x1b0626e0, 0x1b5e58e0, 0x12, 0x0, 0x0)
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:419 +0x152
github.com/hashicorp/vault/vault.(*ExpirationManager).Restore.func2(0x1b39eee0, 0x1b614180, 0x1b0626e0, 0x1b303300, 0x1b614200, 0x1b6141c0)
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:325 +0x14b
created by github.com/hashicorp/vault/vault.(*ExpirationManager).Restore
/gopath/src/github.com/hashicorp/vault/vault/expiration.go:314 +0x2ba

Fortunately, with the new Consul version, I managed to remove the path vault/sys/expire/ from the GUI. After that, everything seems to be working again.

Thank you for your help.

@jefferai
Member

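I believe the consul kv delete command isn't recursive unless you pass the -recurse flag (consul kv delete -recurse vault/sys/expire/). Without it, only a key named exactly vault/sys/expire/ is deleted, which would explain why the leases were still there.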
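The same distinction exists in Consul's KV HTTP API, where DELETE /v1/kv/<key> removes only the exact key unless the recurse query parameter is present. The Go sketch below only builds the two requests so they can be compared; newKVDeleteRequest is a hypothetical helper, and the address and key are the ones used in this thread. Sending a request is left to http.DefaultClient.Do against a real agent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newKVDeleteRequest builds (but does not send) a DELETE request against
// Consul's KV HTTP API. Without recurse only the exact key is removed; with
// recurse every key under that prefix is removed.
func newKVDeleteRequest(consulAddr, key string, recurse bool) (*http.Request, error) {
	u := url.URL{Scheme: "http", Host: consulAddr, Path: "/v1/kv/" + key}
	if recurse {
		q := url.Values{}
		q.Set("recurse", "true")
		u.RawQuery = q.Encode()
	}
	return http.NewRequest(http.MethodDelete, u.String(), nil)
}

func main() {
	for _, recurse := range []bool{false, true} {
		req, err := newKVDeleteRequest("127.0.0.1:8500", "vault/sys/expire/", recurse)
		if err != nil {
			panic(err)
		}
		// To execute against a running agent: resp, err := http.DefaultClient.Do(req)
		fmt.Println(req.Method, req.URL.String())
	}
}
```

It prints the exact-key form first and then the recursive form ending in ?recurse=true.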

Glad it's fixed!
