Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add roleRefs to RSync override API #977

Closed
wants to merge 1 commit into from

Conversation

sdowell
Copy link
Contributor

@sdowell sdowell commented Oct 28, 2023

The roleRefs field is intended to allow more flexible and simple management of bindings for both RootSync and RepoSync objects. The field allows a list of references to Role and ClusterRole objects. For RepoSyncs, this will always create a RoleBinding in the same namespace as the RepoSync. For RootSyncs, the user can either choose to create a RoleBinding or ClusterRoleBinding via the namespace field.

This is an iteration upon the previous design which added a single ClusterRoleBinding configuration for RootSyncs. This design allows more flexible configuration for RootSyncs, as well as a consistent API for RepoSyncs which simplifies the management of RepoSync RoleBindings.

Issue: #935
Iterates on this PR: #938

@sdowell sdowell force-pushed the rsync-role-refs branch 3 times, most recently from 9236752 to 806abb3 Compare October 30, 2023 17:10
@sdowell
Copy link
Contributor Author

sdowell commented Oct 30, 2023

/assign @karlkfi
/assign @mortent

FYI @tomasaschan

Copy link
Contributor

@tomasaschan tomasaschan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks for putting this together so quickly!

A few small comments on things I assume are either typos or just stylistic disagreements :)

docs/design-docs/02-custom-root-reconciler-clusterrole.md Outdated Show resolved Hide resolved
docs/design-docs/02-custom-root-reconciler-clusterrole.md Outdated Show resolved Hide resolved
pkg/reconcilermanager/controllers/build_names.go Outdated Show resolved Hide resolved
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from karlkfi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sdowell sdowell force-pushed the rsync-role-refs branch 3 times, most recently from 981af15 to a045489 Compare October 30, 2023 23:30
docs/design-docs/02-custom-root-reconciler-clusterrole.md Outdated Show resolved Hide resolved
This essentially requires for the `reconciler-manager` to be able to track an
inventory of bindings that were previously created for a given `RootSync` or `RepoSync`.
This can be accomplished by applying a label whenever a new binding is created,
and then querying using a label selector on subsequent reconciliation loops.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we end up in a situation where the reconciler-manager will not be able to delete the bindings (maybe because it no longer has permissions)? If so, how is the issue surfaced to users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that should ever happen, unless a user has a standalone Config Sync installation (no managing operator) and manually modifies the reconciler-manager ClusterRole/ClusterRoleBinding.

If such a error did happen, I expect the behavior would be the same regardless of which managed resource it's erroring on (e.g. ServiceAccount, Deployment, RoleBinding, etc). I haven't tested this to verify, but my expectation is that the reconciler-manager would stall and surface the error on the RSync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, when the reconciler-manager or reconciler cannot delete things that it manages, it errors and that error is surfaced in the RSync status.

manifests/rootsync-crd.yaml Show resolved Hide resolved
@@ -1171,3 +1171,51 @@ func RootSyncSpecOverrideEquals(expected *v1beta1.RootSyncOverrideSpec) Predicat
return nil
}
}

func subjectsEqual(want []string, got []rbacv1.Subject) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be replaced with cmp.Equal() with a cmpopts.SortSlices option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, since this function is only comparing the name field of each Subject. We would still need to convert the data structures to look the same in order to use cmp.Equal

_, err := r.upsertClusterRoleBinding(ctx, reconcilerRef, roleConfig.Name)
return errors.Wrap(err, "upserting cluster role binding")
func (r *RootSyncReconciler) configureClusterRoleBinding(ctx context.Context, reconcilerRef types.NamespacedName, roleRefs []v1beta1.RootSyncRoleRef) error {
currentRefMap, err := r.listCurrentRoleRefs(ctx, reconcilerRef)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems risky. If a ClusterRoleBinding/RoleBinding is manually created with the same label, or declared in the source repo with the label, it will be deleted or cause controller fights.

What about adding the Namespace/Name of created RoleBindings to the annotations (preferred) or the status field of the RSync, and delete those RoleBindings in the annotations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thoughts is that it's fair game to assume that a resource key that is prefixed with our GroupName (configsync.gke.io) is a resource that we can manage. The same argument could be made for other resources as well, say if a user tried to manage the configsync.gke.io/ns-reconciler base RoleBinding in the source repo.

If this is a concern, we could potentially add another label to make it more explicit - say something like configsync.gke.io/managed-by: reconciler-manager.

As for tracking the inventory of bindings in RSync annotations - I'm not sure how this would scale to larger numbers of bindings. We would need to track Kind, Name, and Namespace for an arbitrary number of bindings. This is unlikely to fit in a single annotation, and it's not clear to me how we would shard this across multiple annotations.

I could potentially see adding new Status fields on RootSync/RepoSync, however I'm not sure how compelling the case for this is over using a label selector.

for _, roleRef := range declaredRoleRefs {
if _, ok := currentRefMap[roleRef]; !ok {
// we need to call a separate create method here for generateName
if _, err := r.createRBACBinding(ctx, reconcilerRef, roleRef.RoleRefBase, roleRef.Namespace); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to verify whether the referenced Role/ClusterRole exists, and fail earlier if it doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it would add much value to verify that earlier on, as such condition would cause creation of the binding to error and surface on the RSync. Even if we did check for the existence of the Role/ClusterRole early on, it's always possible for race condition where another client deletes the Role/ClusterRole after our existence check.

@sdowell sdowell force-pushed the rsync-role-refs branch 2 times, most recently from 8e8257f to a2e477d Compare November 1, 2023 21:59
// associated with a reconciler.
func (nt *NT) ListReconcilerRoleBindings(syncKind string, rsRef types.NamespacedName) ([]rbacv1.RoleBinding, error) {
opts := &client.ListOptions{}
opts.LabelSelector = client.MatchingLabelsSelector{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly redundant to do this twice. You could have a function that returns the label selector. in fact, I'm surprised there isn't one under pkg/reconcilermanager/controllers already. Don't we need it there too for cleanup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one but it's a method of reconcilerBase. I suppose I can refactor that to a function


func subjectsEqual(want []string, got []rbacv1.Subject) error {
if len(got) != len(want) {
return errors.Wrapf(ErrFailedPredicate, "want %v subjects; got %v", want, got)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ErrFailedPredicate pattern seems weird. I would expect the caller of the predicates to add it as a wrapper, rather that every individual predicate to need to return it, especially as the root cause...

@@ -0,0 +1,346 @@
// Copyright 2022 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new files should use the current year copyright

// Set some custom roleRef overrides
rootSyncA.Spec.SafeOverride().RoleRefs = []v1beta1.RootSyncRoleRef{
{
RoleRefBase: v1beta1.RoleRefBase{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this nesting of the base type help make something easier?
seems like there could just be a namespaced ref and a non-namespaced ref.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main goal here is that code paths can be shared for reposync/rootsync controllers, given they share a common struct. If the structs were not nested, we would either need separate implementations or some glue logic to translate between struct types


if roleConfig.Name == "" {
roleConfig.Name = "cluster-admin"
func (r *RootSyncReconciler) cleanRBACBindings(ctx context.Context, reconcilerRef, rsRef types.NamespacedName) error {
Copy link
Contributor

@karlkfi karlkfi Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanRBACBindings -> cleanBindings or cleanRoleBindings ? This cleans Cluster and Namespaced RoleBindings, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this cleans both ClusterRoleBindings and RoleBindings, so I figured cleanRBACBindings would be a precise yet generic name


_, err := r.upsertClusterRoleBinding(ctx, reconcilerRef, roleConfig.Name)
return errors.Wrap(err, "upserting cluster role binding")
func (r *RootSyncReconciler) configureClusterRoleBinding(ctx context.Context, reconcilerRef, rsRef types.NamespacedName, roleRefs []v1beta1.RootSyncRoleRef) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kindof a weird name. might need a func comment to explain. This is upserting and deleting? Maybe it should just say upsert? or apply?

@karlkfi
Copy link
Contributor

karlkfi commented Nov 1, 2023

Lotta code for one PR.
Would it be possible to pull out the labels added for GC into a separate PR to merge first?
Or maybe do the RootSync changes and RepoSync changed as separate PRs?
Anything else that might come out cleanly?

@sdowell
Copy link
Contributor Author

sdowell commented Nov 2, 2023

Lotta code for one PR. Would it be possible to pull out the labels added for GC into a separate PR to merge first? Or maybe do the RootSync changes and RepoSync changed as separate PRs? Anything else that might come out cleanly?

@karlkfi Let's start with approving the design doc so I don't spend too much time spinning my wheels #990

The roleRefs field is intended to allow more flexible and simple
management of bindings for both RootSync and RepoSync objects. The field
allows a list of references to Role and ClusterRole objects. For
RepoSyncs, this will always create a RoleBinding in the same namespace
as the RepoSync. For RootSyncs, the user can either choose to create a
RoleBinding or ClusterRoleBinding via the namespace field.

This is an iteration upon the previous design which added a single
ClusterRoleBinding configuration for RootSyncs. This design allows more
flexible configuration for RootSyncs, as well as a consistent API for
RepoSyncs which simplifies the management of RepoSync RoleBindings.
Copy link

@sdowell: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kpt-config-sync-presubmit-e2e-multi-repo-test-group1 1b37e01 link true /test kpt-config-sync-presubmit-e2e-multi-repo-test-group1
kpt-config-sync-presubmit-e2e-multi-repo-test-group3 1b37e01 link true /test kpt-config-sync-presubmit-e2e-multi-repo-test-group3
kpt-config-sync-presubmit-e2e-multi-repo-test-group2 1b37e01 link true /test kpt-config-sync-presubmit-e2e-multi-repo-test-group2
kpt-config-sync-presubmit 1b37e01 link true /test kpt-config-sync-presubmit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sdowell
Copy link
Contributor Author

sdowell commented Nov 10, 2023

Ended up breaking this up into several smaller PRs. Pending review on #991 for the final change to implement the RootSync API

@sdowell sdowell closed this Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants