Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of multi-tenancy proposal #1149

Merged

Conversation

gab-satchi
Copy link
Member

/kind design
What this PR does / why we need it:
Proposing a way to allow CAPV to accommodate multiple tenants within a single instance

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
#1123

Thanks to the AWS and Azure folks for pioneering the multi-tenancy efforts in their respective providers. @devigned @randomvariable @nader-ziada @sedefsavas

@k8s-ci-robot k8s-ci-robot added kind/design Categorizes issue or PR as related to design. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 18, 2021
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 18, 2021
// account from any namespace. This field is intentionally not a
// pointer because the nil behavior (no namespaces) is undesirable here.
// +optional
AllowedNamespaces []string `json:"allowedNamespaces"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how CAPZ defines AllowedNamespaces, in CAPA we have 2 main differences:

  1. We modified AllowedNamespaces to also support labelSelectors:
type AllowedNamespaces struct {
// An nil or empty list indicates that AWSClusters cannot use the principal from any namespace.
//
// +optional
// +nullable
NamespaceList []string `json:"list"`
// AllowedNamespaces is a selector of namespaces that AWSClusters can
// use this ClusterPrincipal from. This is a standard Kubernetes LabelSelector,
// a label query over a set of resources. The result of matchLabels and
// matchExpressions are ANDed.
//
// An empty selector indicates that AWSClusters cannot use this
// AWSClusterPrincipal from any namespace.
// +optional
Selector metav1.LabelSelector `json:"selector"`
}
  1. Also the default behaviour is "allow none" whenAllowedNamespaces is nil to enforce users to make explicit choices.

It may be useful to have a common way of defining this across providers, we can discuss what it may look like. cc @nader-ziada @devigned

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sedefsavas we had some issues with implementing the label selector and that's why we only did the []string, but I do agree its good a consistent behaviour for the default across providers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like both the differences with CAPA. Was originally thinking label selector was overkill but I can see how that's useful with targeting multiple namespaces without having to list each of them out. Will update, thanks Sedef

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to consistent behavior across providers.

What is the behavior when NamespaceList is nil and a selector is set? Is the result always the union of NamespaceList + Selector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah in CAPA's implementation, it is a union of the two. That'll offer the most flexibility

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty allowedNamespaces --> allows all namespaces
nil allowedNamespaces --> blocks all namespaces
Note: This is similar to how networkPolicy label matching is done.

nil NamespaceList --> blocks all
nil Selector --> blocks all

What is the behavior when NamespaceList is nil and a selector is set? Is the result always the union of NamespaceList + Selector?

Yes, if at least one of them matches a namespace, it is allowed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I drafted some examples for how we use allowedNamespaces in CAPA:
https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2319/files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to say we pull out the list option for v1alpha4. I will take the conversation to CAPI, but kubernetes/kubernetes#101342 is going to move the NamespaceDefaultLabelName feature gate to GA and has been defaulted to on from 1.21 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

## Summary
The CAPV controller is capable of managing infrastructure resources on a vCenter using the credentials it was provided during initialization. The credentials are provided via environment variables that get saved onto a secret that's used by the CAPV deployment.

Credentials provided are used for the entire lifetime of the CAPV deployment which means a CAPV cluster can become broken if the provisioning CAPV deployment were to be reconfigred for another set of credentials.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How multitenancy can fix CAPV deployment credential issue?
CAPA will start with credentials retrieved locally, then other roles could be used for creating workload clusters.
When the local credentials (that CAPV uses) change, is the plan to update the secret?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think MT can fix any existing clusters from breaking when the deployment credentials are changed. This was more highlighting a cluster's dependance on the deployment credentials and if MT is fully utilized, it can move that dependency to a VSphereAccount

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't do much if the creds are no longer valid. the user would have to go an edit them, but CAPV should still be on the hook of reading creds at each reconciliation and avoid caching them, to support cred rotation

@yastij yastij self-assigned this Mar 19, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2021
@gab-satchi gab-satchi changed the base branch from release-0.7 to master March 19, 2021 15:46
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2021
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 19, 2021

@gab-satchi: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-cluster-api-provider-vsphere-verify-lint 4489ea6 link /test pull-cluster-api-provider-vsphere-verify-lint
pull-cluster-api-provider-vsphere-e2e 4489ea6 link /test pull-cluster-api-provider-vsphere-e2e
pull-cluster-api-provider-vsphere-test 4489ea6 link /test pull-cluster-api-provider-vsphere-test
pull-cluster-api-provider-vsphere-verify-crds 4489ea6 link /test pull-cluster-api-provider-vsphere-verify-crds
pull-cluster-api-provider-vsphere-verify-shell 4489ea6 link /test pull-cluster-api-provider-vsphere-verify-shell

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

@yastij yastij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a first pass on the API

## Summary
The CAPV controller is capable of managing infrastructure resources on a vCenter using the credentials it was provided during initialization. The credentials are provided via environment variables that get saved onto a secret that's used by the CAPV deployment.

Credentials provided are used for the entire lifetime of the CAPV deployment which means a CAPV cluster can become broken if the provisioning CAPV deployment were to be reconfigred for another set of credentials.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't do much if the creds are no longer valid. the user would have to go an edit them, but CAPV should still be on the hook of reading creds at each reconciliation and avoid caching them, to support cred rotation

docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved

#### Controller Changes

- If IdentityRef is specified, and it references a `VsphereClusterAccount`, the CRD is fetched and used to create a session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@randomvariable @fabriziopandini - it might worthwhile to inject this reference if nothing is set and there are VsphereClusterAccount that are allowed for that namespace. This avoids having to communicate the kind and Name to folks that want to provision clusters and that don't care much about creds ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the idea interesting, because it makes it easier to decouple roles and responsibilities
What there are more than one Account eligible for a cluster?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a mechanism in CAPA for that called AWSClusterControllerIdentity that respects the existing behaviour as well as EC2 instance profiles for cross-account role assumption which is a key requirement for AWS. See https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/docs/proposal/20200506-single-controller-multitenancy.md#upgrade-strategy

Azure doesn't have cross-account role assumption so this wasn't an important use case for them, so was not implemented there.

docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved

#### Clusterctl Changes

Today, clusterctl move operates by tracking objectreferences within the same namespace, since we are now proposing to use cluster-scoped resources, we will need to add requisite support to clusterctl's object graph to track cluster-scoped resources that are used by the source cluster, and ensure they are moved. We will naively not delete cluster-scoped resources during a move, as they maybe referenced across namespaces. If a cluster uses a `Secret` for account credentials, the OwnerReference will get set by the controller and the `Secret` will be moved to the target cluster. Any `VsphereClusterAccount` that's required by the resources being moved, will need to be manually created on the target cluster prior to the move operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clusterctl's object graph to track cluster-scoped resources that are used by the source cluster,

This is kind of complex, given that the only relation between cluster and VsphereClusterAccount is defined in the identityRef field, and thus unknown to the clusterctl move discovery algorithm (that should be generic/provider agnostic).

There are two possible workarounds:

  • Move all the VsphereClusterAccount (by adding the force move label on the VsphereClusterAccount CRD)
  • Move only specific VsphereClusterAccount (by adding the force move label only on those objects)

docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
Copy link
Member

@yastij yastij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

this should be ready to go, I'll leave lgtm to @fabriziopandini re: move choices

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yastij

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2021
AllowedNamespaces *AllowedNamespaces `json:"allowedNamespaces"`
}

type AllowedNamespaces struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according the general guidelines being proposed in CAPI, NamespaceList is missing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR calls out to deprecate NamespaceList for v1alpha4

Providers SHOULD mark NamespaceList as deprecated from v1alpha4

docs/proposal/20210311-single-controller-multitenancy.md Outdated Show resolved Hide resolved
- adds Secret as a credential option
- adds VSphereIdentityRef
- adds VSphereClusterIdentity as a credential option
- utilizes AllowedNamespaces to restrict access to credentials
- follows CAPI guidelines for multi-tenancy
@yastij
Copy link
Member

yastij commented May 5, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2021
@k8s-ci-robot k8s-ci-robot merged commit b1e3f2f into kubernetes-sigs:master May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants