connect: Fix an issue with updating CA config in a secondary datacenter #9009

Merged
7 commits merged on Nov 30, 2020

Conversation

kyhavlov
Contributor

This PR fixes a case where updating the CA config in a secondary datacenter would cause an error when it should trigger the creation of a new intermediate certificate.

The test passes, but I think this still needs some extra logic to prevent races between the RPC endpoint and secondaryCARootWatch/intermediateCertRenewalWatch. Currently, if a call to the RPC endpoint overwrites its result, secondaryCARootWatch will set things right on its next iteration. But since it spends most of its time waiting on a blocking query, the secondary could run with the wrong intermediate/root for up to 5-10 minutes, which is a pretty long time to wait for things to converge.

Fixes #7009.

@kyhavlov
Contributor Author

kyhavlov commented Oct 27, 2020

Discussed fixing the racy parts of the secondary logic with @rboyer. We decided a good solution is to pull some of the intermediate update code out into a separate struct that manages the different states (ready/signing/reconfig) in a safer, more understandable way, so I'm going to update the PR with that next.

@kyhavlov
Contributor Author

I've updated the PR with the refactored CA logic - it still needs some test updates/fixes and a writeup of how the state machine logic avoids race conditions in specific situations.

agent/consul/server.go (outdated)
agent/structs/connect_ca.go (outdated)
if op != expected {
t.Fatalf("got unexpected op %q, wanted %q", op, expected)
}
case <-time.After(3 * time.Second):
Member

Instead of using a made-up "not forever" number here and in several other places, you could compute the remaining test time from the current time and the time portion of (*testing.T).Deadline().

Member

@rboyer rboyer left a comment

LGTM assuming CI passes.

Further tests can come in a followup.

@kyhavlov kyhavlov merged commit c4eff42 into master Nov 30, 2020
@kyhavlov kyhavlov deleted the update-secondary-ca branch November 30, 2020 22:49
@hashicorp-ci
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/290464.

@hashicorp-ci
Contributor

🍒✅ Cherry pick of commit c4eff42 onto release/1.9.x succeeded!

hashicorp-ci pushed a commit that referenced this pull request Nov 30, 2020
connect: Fix an issue with updating CA config in a secondary datacenter
@hashicorp-ci
Contributor

🍒❌ Cherry pick of commit c4eff42 onto release/1.8.x failed! Build Log

Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies
Development

Successfully merging this pull request may close these issues.

connect: cannot update CA configuration in secondary datacenter
5 participants