panic: non-positive interval for NewTicker on 1.10.1 after upgrade #15147

Closed
arren-ru opened this issue Apr 23, 2022 · 10 comments · Fixed by #15156
Labels: bug, storage/raft

Comments

@arren-ru

Describe the bug
After upgrading from 1.10.0 to 1.10.1, Vault is unable to start: after unsealing it crashes with a Go panic.

Environment:

  • Vault Server and CLI Versions are 1.10.1
  • Server Operating System/Architecture: archlinux with systemd

Vault server configuration file(s):

ui = true
disable_mlock = true

storage "raft" {
  path = "/var/lib/vault"
}

api_addr = "https://127.0.0.1:8200"
cluster_addr = "https://127.0.0.1:8201"

listener "tcp" {
  address       = ":8200"
  tls_cert_file = "***.crt"
  tls_key_file  = "***.key"
}

service_registration "consul" {
  address         = "127.0.0.1:8501"
  service_address = "10.10.0.10"
  scheme          = "https"
  token           = "***"
}

Crash logs are:

Apr 24 01:12:26 helios vault[696416]: 2022-04-24T01:12:26.899+0700 [INFO]  core: starting raft active node
Apr 24 01:12:26 helios vault[696416]: 2022-04-24T01:12:26.899+0700 [INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s}" reconcile_interval=0s
Apr 24 01:12:26 helios vault[696416]: panic: non-positive interval for NewTicker
Apr 24 01:12:26 helios vault[696416]: goroutine 2346 [running]:
Apr 24 01:12:26 helios vault[696416]: time.NewTicker(0xdf8475800?)
Apr 24 01:12:26 helios vault[696416]:         time/tick.go:24 +0x10f
Apr 24 01:12:26 helios vault[696416]: github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased.func3()
Apr 24 01:12:26 helios vault[696416]:         github.com/hashicorp/vault/vault/raft.go:486 +0x10a
Apr 24 01:12:26 helios vault[696416]: created by github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased
Apr 24 01:12:26 helios vault[696416]:         github.com/hashicorp/vault/vault/raft.go:477 +0x42a
Apr 24 01:12:26 helios systemd[1]: vault.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 24 01:12:26 helios systemd[1]: vault.service: Failed with result 'exit-code'.

An additional note: initializing a new instance works fine, but in the upgrade case I don't want to lose data.
Version 1.10.0 works fine without any error.

@sjm42 commented Apr 25, 2022

Well, I can confirm the same issue.
Environment: Ubuntu 20.04 LTS on 64-bit ARM/aarch64 on AWS (t4g.small).
The Vault binary is installed from the official deb package built and published by HashiCorp.

This "cluster" has only one node, if that matters.

Server config:

storage "raft" {
  path    = "/vault/i-XXXX"
  node_id = "i-XXXX"
  retry_join {
      leader_api_addr = "https://10.XXX.YY.ZZZ:8200"
  }
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  cluster_address     = "0.0.0.0:8201"
  tls_cert_file = "/vault/tls/vault-i.pem"
  tls_key_file  = "/vault/tls/vault-i.key"
}

seal "awskms" {
  region     = "eu-central-1"
  kms_key_id = "XXX-YYY-ZZZ"

}

api_addr = "https://10.XXX.YY.ZZZ:8200"
cluster_addr = "https://10.XXX.YY.ZZZ:8201"
disable_mlock = true
ui = true

crash log:

Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.215Z [INFO]  core: acquired lock, enabling active operation
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.237Z [INFO]  core: post-unseal setup starting
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.240Z [INFO]  core: loaded wrapping token key
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.240Z [INFO]  core: successfully setup plugin catalog: plugin-directory=""
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.241Z [INFO]  core: successfully mounted backend: type=system path=sys/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.241Z [INFO]  core: successfully mounted backend: type=identity path=identity/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.242Z [INFO]  core: successfully mounted backend: type=kv path=kv/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.242Z [INFO]  core: successfully mounted backend: type=pki path=pki/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.242Z [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.244Z [INFO]  core: successfully enabled credential backend: type=token path=token/
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.244Z [INFO]  rollback: starting rollback manager
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.244Z [INFO]  core: restoring leases
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.244Z [INFO]  identity: entities restored
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.245Z [INFO]  identity: groups restored
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.245Z [INFO]  core: starting raft active node
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.245Z [INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s}" reconcile_interval=0s
Apr 25 09:11:01 vault-i vault[4935]: 2022-04-25T09:11:01.244Z [INFO]  expiration: lease restore complete
Apr 25 09:11:01 vault-i vault[4935]: panic: non-positive interval for NewTicker
Apr 25 09:11:01 vault-i vault[4935]: goroutine 434 [running]:
Apr 25 09:11:01 vault-i vault[4935]: time.NewTicker(0xfff82a6cbcb9a9f9)
Apr 25 09:11:01 vault-i vault[4935]: 	/opt/hostedtoolcache/go/1.17.9/x64/src/time/tick.go:24 +0x14c
Apr 25 09:11:01 vault-i vault[4935]: github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased.func3(0x400037ad20, 0x4000b9c300, {0x5c456a0, 0x400061bf20}, 0x4000b9c2c0, 0x40006d1d40)
Apr 25 09:11:01 vault-i vault[4935]: 	/home/runner/work/vault/vault/vault/raft.go:486 +0xe8
Apr 25 09:11:01 vault-i vault[4935]: created by github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased
Apr 25 09:11:01 vault-i vault[4935]: 	/home/runner/work/vault/vault/vault/raft.go:477 +0x344

The strange thing is that I am meanwhile running a 3-node cluster on a practically identical setup, same Vault version 1.10.1 on arm64 with the Raft backend, with no issues. That cluster has been very inactive, though.

@localhorst-org

Same for me.

Ubuntu 20.04, Vault installed from the official deb repo:

deb [arch=amd64] https://apt.releases.hashicorp.com focal main

Updated Vault from 1.10.0 to 1.10.1.

I'm also running a single-node Raft storage installation. After unsealing I saw this in the logs:

[...]
Apr 25 09:05:10 auth01 vault[3692820]: panic: non-positive interval for NewTicker
Apr 25 09:05:10 auth01 vault[3692820]: goroutine 492 [running]:
Apr 25 09:05:10 auth01 vault[3692820]: time.NewTicker(0xdf8475800)
Apr 25 09:05:10 auth01 vault[3692820]:         /opt/hostedtoolcache/go/1.17.9/x64/src/time/tick.go:24 +0x10f
Apr 25 09:05:10 auth01 vault[3692820]: github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased.func3()
Apr 25 09:05:10 auth01 vault[3692820]:         /home/runner/work/vault/vault/vault/raft.go:486 +0x10a
Apr 25 09:05:10 auth01 vault[3692820]: created by github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased
Apr 25 09:05:10 auth01 vault[3692820]:         /home/runner/work/vault/vault/vault/raft.go:477 +0x3e5
Apr 25 09:05:11 auth01 systemd[1]: vault-dev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 25 09:05:11 auth01 systemd[1]: vault-dev.service: Failed with result 'exit-code'.
Apr 25 09:05:16 auth01 systemd[1]: vault-dev.service: Scheduled restart job, restart counter is at 2.
[...]


The same happens in a dev environment inside a Docker container (image from the official Docker Hub, https://hub.docker.com/_/vault/), also with single-node Raft storage.

Updated with "docker pull vault:latest" from 1.10.0 to 1.10.1:

panic: non-positive interval for NewTicker

goroutine 1047 [running]:
time.NewTicker(0xdf8475800)
/opt/hostedtoolcache/go/1.17.9/x64/src/time/tick.go:24 +0x10f
github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased.func3()
/home/runner/work/vault/vault/vault/raft.go:486 +0x10a
created by github.com/hashicorp/vault/vault.(*Core).raftTLSRotatePhased
/home/runner/work/vault/vault/vault/raft.go:477 +0x3e5

In both cases, downgrading to the prior version 1.10.0 brings me back on track.

@ccapurso (Contributor)

Thank you for bringing this to our attention, @arren-ru. We appreciate everyone's crash logs, as they have helped identify the underlying issue. Our team has begun looking into this and is working to provide a fix. I will keep you posted on when it will be made available.

@tristanmorgan (Member)

I think that before the upgrade, the new Raft TLS keys are not being marked as complete when used on a single node, as evidenced by messages like these:

2022-04-26T11:20:23.456+1000 [INFO]  core: starting raft active node
2022-04-26T11:20:23.457+1000 [INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s}" reconcile_interval=0s
2022-04-26T11:20:23.457+1000 [WARN]  core.raft: skipping new raft TLS config creation, keys are pending
2022-04-26T11:20:23.561+1000 [INFO]  core: post-unseal setup complete
2022-04-26T11:25:23.456+1000 [WARN]  core.raft: skipping new raft TLS config creation, keys are pending

@marceloavilaoliveira

Hello, is there a workaround for this problem? Is version 1.10.1 unusable, so that we must wait for 1.10.2?

@hsimon-hashicorp (Contributor) commented Apr 28, 2022

Hello, is there a workaround for this problem? Is version 1.10.1 unusable, so that we must wait for 1.10.2?

At this time, our best guidance is unfortunately to downgrade to 1.10.0. We understand that this is a major inconvenience, and we're working on releasing 1.10.2 as soon as possible. We appreciate your understanding!

@marceloavilaoliveira

No problem at all, it would be good to place a message in the 1.10.1 release notes. Thanks a lot for your prompt answer.

@ccapurso (Contributor)

it would be good to place a message in the 1.10.1 release notes

We have added this issue to the list of known issues for the 1.10.1 release in our upgrade docs. Please keep in mind that this affects more than just 1.10.1.

Per the upgrade notice:

Impacted versions: 1.10.1, 1.9.5, 1.8.10. Versions prior to these are unaffected.

With that said, it is recommended to not perform minor upgrades past 1.10.0, 1.9.4, and 1.8.9 at this time. As @hsimon-hashicorp has mentioned, we are working to get the fix in #15156 released for versions 1.8.11, 1.9.6, and 1.10.2 as soon as possible. I will update this issue when they are available. Thank you for your patience and understanding!

@marceloavilaoliveira

Thank you very much for your detailed response.

@ccapurso (Contributor)

Thank you for your patience, everyone! We have released a fix for this issue as part of 1.8.11, 1.9.6, and 1.10.2. Below is a list of links where you can find the associated binaries and Docker images. Let me know if you have any questions!

  • Open Source Binaries
  • Enterprise Binaries
  • Open Source Docker Images
  • Enterprise Docker Images

@mladlow modified the milestones: 1.10.2, 1.8.11 on Apr 29, 2022