Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix raft tls key rotation panic when rotation time in past #15156

Merged
merged 6 commits into from Apr 26, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 3 additions & 0 deletions changelog/15156.txt
@@ -0,0 +1,3 @@
```release-note:bug
rafft: fix Raft TLS key rotation panic that occurs if active key is more than 24 hours old
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: rafft

```
19 changes: 17 additions & 2 deletions vault/raft.go
Expand Up @@ -473,9 +473,24 @@ func (c *Core) raftTLSRotatePhased(ctx context.Context, logger hclog.Logger, raf
return errors.New("no active raft TLS key found")
}

getNextRotationTime := func(next time.Time) time.Time {
now := time.Now()

// active key's CreatedTime + raftTLSRotationPeriod might be in
// the past (meaning it is ready to be rotated) which will cause
// NewTicker to panic when used with time.Until, prevent this by
// pushing out rotation time into very near future
if next.Before(now) {
return now.Add(1 * time.Minute)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, though I think it's not quite what the behaviour was before, right? Prior to the breaking change, I assume a negative next would be treated as "immediately", not "now+1m"?

Copy link
Contributor Author

@ccapurso ccapurso Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true but with NewTicker I think all you can really do is try to be as close to "immediately" as possible. The reason for adding 1m was just to try to add enough buffer to prevent the negative NewTicker issue when using time.Until. It needs to be some time in the future in other words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced the padding to be 10 seconds. I'll admit that the 1 minute was a bit aggressive.

}

// push out to ensure proposed time does not elapse
return next.Add(1 * time.Minute)
}

// Start the process in a go routine
go func() {
nextRotationTime := activeKey.CreatedTime.Add(raftTLSRotationPeriod)
nextRotationTime := getNextRotationTime(activeKey.CreatedTime.Add(raftTLSRotationPeriod))

keyCheckInterval := time.NewTicker(1 * time.Minute)
defer keyCheckInterval.Stop()
Expand Down Expand Up @@ -509,7 +524,7 @@ func (c *Core) raftTLSRotatePhased(ctx context.Context, logger hclog.Logger, raf
continue
}
ncabatoff marked this conversation as resolved.
Show resolved Hide resolved

nextRotationTime = next
nextRotationTime = getNextRotationTime(next)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the ticker.Reset to here, as well as eliminate the backoff variable and just put all the time calculation logic here? Feels like it would be safer and clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your thought to put all of this logic into the current getNextRotationTime (maybe with a different name)? That might be tricky as we have to worry about rotation time for the initialization of the ticker, the reset, and the handling of the error returned by the rotateKeyring in the case of the backoff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I propose to leave getNextRotationTime alone, but get rid of the backoff bool, the continue, and just do all the calculations for nextRotationTime here in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way we can have more confidence that the padding we add to nextRotationTime is big enough (and possibly can be smaller than 1m), because we'll update the ticker with it immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I misunderstood, moved things around a bit.


case <-stopCh:
return
Expand Down