Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Manager.Elected() should beclosed after runnables are started #1354

Merged

Conversation

vincepri
Copy link
Member

During debugging a weird issue with some tests failing if they were
running too fast. In other words, calling Start() and Close() on
a manager too fast throws an error where the informers haven't been able
to sync, which then makes Start() fail with an error.

In an effort to improve this behavior, tried to use Elected() to wait
for leader election to start, which also waits for the cache.

During some debugging, this issue happened again and upon digging a bit
more it seems that the channel was closed before starting the runnables
in some cases.

This change reorders the close on cm.elected, which should fix the above
issue.

Signed-off-by: Vince Prignano vincepri@vmware.com

During debugging a weird issue with some tests failing if they were
running too fast. In other words, calling Start() and Close() on
a manager too fast throws an error where the informers haven't been able
to sync, which then makes Start() fail with an error.

In an effort to improve this behavior, tried to use Elected() to wait
for leader election to start, which also waits for the cache.

During some debugging, this issue happened again and upon digging a bit
more it seems that the channel was closed before starting the runnables
in some cases.

This change reorders the close on cm.elected, which should fix the above
issue.

Signed-off-by: Vince Prignano <vincepri@vmware.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 21, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 21, 2021
@alvaroaleman
Copy link
Member

Isn't the purpose of this channel to tell consumers when we are elected? Why do we need to start Runnables first? The starting is async anyways, so I don't think this will do much in terms of consistent behavior

@vincepri
Copy link
Member Author

The behavior is inconsistent in the two places, in one place we wait for the cache to sync and then close the channel, while in the other we were firing off a goroutine and then closing the channel.

The reordering of operation is to make sure we're consistent

@vincepri
Copy link
Member Author

/retest

@DirectXMan12
Copy link
Contributor

/lgtm

but I'd like to get one more set of eyes on it to approve, just in case

@@ -473,8 +473,8 @@ func (cm *controllerManager) Start(ctx context.Context) (err error) {
}
} else {
// Treat not having leader election enabled the same as being elected.
cm.startLeaderElectionRunnables()
close(cm.elected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincepri this is consistent today. We first close the channel, the we start the Runnables.

Also, is it intended to make the cm.StartLeaderElectionRunnables synchronous?

Copy link
Member Author

@vincepri vincepri Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was an intentional change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah now I get it, this is abusing the fact that startLeaderElectionRunnables synchronously waits for the cache to be synced. A tad implicit and has a good chance to be accidentally broken in the future again, but seems reasonable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now leader and leaderless starting of pre-election runnables is the same, that makes sense to me.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 21, 2021
@vincepri
Copy link
Member Author

/hold

for more 👀

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 21, 2021
@vincepri
Copy link
Member Author

/hold cancel

This was referenced Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants