Commit: Don't race for CRL rebuilding capability check (#17185)
* Don't race for CRL rebuilding capability check

Core has recently seen data races between SystemView/replication
updates and the PKI subsystem. They occur because the PKI plugin's
SystemView access happens outside of a request (during invalidation
handling), so the proper lock isn't held.

Because replication status cannot change within the lifetime of a plugin
(and instead, if a node switches replication status, the entire plugin
instance will be torn down and recreated), it is safe to cache this
once, at plugin startup, and use it throughout its lifetime.

Thus, we replace this SystemView access with a stored boolean variable
computed ahead of time.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Update builtin/logical/pki/backend.go

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
cipherboy committed Sep 19, 2022 (commit 9f3955b, 1 parent: 82f7b7d)
Showing 3 changed files with 11 additions and 7 deletions.
builtin/logical/pki/backend.go (6 additions, 1 deletion)
@@ -193,7 +193,12 @@ func Backend(conf *logical.BackendConfig) *backend {
 
 	b.pkiStorageVersion.Store(0)
 
-	b.crlBuilder = newCRLBuilder()
+	// b isn't yet initialized with SystemView state; calling b.System() will
+	// result in a nil pointer dereference. Instead query BackendConfig's
+	// copy of SystemView.
+	cannotRebuildCRLs := conf.System.ReplicationState().HasState(consts.ReplicationPerformanceStandby) ||
+		conf.System.ReplicationState().HasState(consts.ReplicationDRSecondary)
+	b.crlBuilder = newCRLBuilder(!cannotRebuildCRLs)
 
 	// Delay the first tidy until after we've started up.
 	b.lastTidy = time.Now()
builtin/logical/pki/crl_test.go (1 addition, 1 deletion)
@@ -388,7 +388,7 @@ func TestCrlRebuilder(t *testing.T) {
 	require.NoError(t, err)
 
 	req := &logical.Request{Storage: s}
-	cb := newCRLBuilder()
+	cb := newCRLBuilder(true /* can rebuild and write CRLs */)
 
 	// Force an initial build
 	err = cb.rebuild(ctx, b, req, true)
builtin/logical/pki/crl_util.go (4 additions, 5 deletions)
@@ -14,8 +14,6 @@ import (
 
 	atomic2 "go.uber.org/atomic"
 
-	"github.com/hashicorp/vault/sdk/helper/consts"
-
 	"github.com/hashicorp/vault/sdk/helper/certutil"
 	"github.com/hashicorp/vault/sdk/helper/errutil"
 	"github.com/hashicorp/vault/sdk/logical"
@@ -70,6 +68,7 @@ type (
 type crlBuilder struct {
 	_builder              sync.Mutex
 	forceRebuild          *atomic2.Bool
+	canRebuild            bool
 	lastDeltaRebuildCheck time.Time
 
 	_config sync.RWMutex
@@ -86,9 +85,10 @@ const (
 	_enforceForceFlag = false
 )
 
-func newCRLBuilder() *crlBuilder {
+func newCRLBuilder(canRebuild bool) *crlBuilder {
 	return &crlBuilder{
 		forceRebuild: atomic2.NewBool(false),
+		canRebuild:   canRebuild,
 		// Set the last delta rebuild window to now, delaying the first delta
 		// rebuild by the first rebuild period to give us some time on startup
 		// to stabilize.
@@ -260,8 +260,7 @@ func (cb *crlBuilder) requestRebuildIfActiveNode(b *backend) {
 	// Only schedule us on active nodes, as the active node is the only node that can rebuild/write the CRL.
 	// Note 1: The CRL is cluster specific, so this does need to run on the active node of a performance secondary cluster.
 	// Note 2: This is called by the storage invalidation function, so it should not block.
-	if b.System().ReplicationState().HasState(consts.ReplicationPerformanceStandby) ||
-		b.System().ReplicationState().HasState(consts.ReplicationDRSecondary) {
+	if !cb.canRebuild {
 		b.Logger().Debug("Ignoring request to schedule a CRL rebuild, not on active node.")
 		return
 	}
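Taken together, the change amounts to the following pattern: compute the replication-derived capability once at plugin startup, and have invalidation-time code consult only the cached bool. The sketch below is a simplified, self-contained illustration of that idea, not the actual Vault source; the replicationState type, its constants, and the trimmed-down crlBuilder are stand-ins for the SDK's consts.ReplicationState and the real PKI types.

package main

import "fmt"

// replicationState stands in for the SDK's consts.ReplicationState bitmask.
type replicationState uint32

const (
	replPerformanceStandby replicationState = 1 << iota
	replDRSecondary
)

func (s replicationState) HasState(flag replicationState) bool { return s&flag != 0 }

// crlBuilder caches the capability flag at construction and never queries
// replication state again, so invalidation callbacks need no lock.
type crlBuilder struct {
	canRebuild bool
}

func newCRLBuilder(canRebuild bool) *crlBuilder {
	return &crlBuilder{canRebuild: canRebuild}
}

func (cb *crlBuilder) requestRebuildIfActiveNode() {
	if !cb.canRebuild {
		fmt.Println("ignoring request to schedule a CRL rebuild, not on active node")
		return
	}
	fmt.Println("scheduling CRL rebuild")
}

func main() {
	// At plugin startup: query the replication state exactly once.
	state := replPerformanceStandby // pretend this node is a performance standby
	cannotRebuild := state.HasState(replPerformanceStandby) || state.HasState(replDRSecondary)

	cb := newCRLBuilder(!cannotRebuild)

	// Later, from an invalidation path: only the cached bool is read.
	cb.requestRebuildIfActiveNode() // prints the "ignoring" message on this standby
}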
