Improve failure domains #10476
This issue is currently awaiting triage. CAPI contributors will take a look as soon as possible and apply the appropriate triage label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I have been giving this some thought recently, specifically in the context of CAPO, but also with a view to how it could be implemented more generally. The two principal problems we have with the current implementation are:

1. In OpenStack specifically, a 'failure domain' can in practice be an arbitrarily complex set of configurations spanning separate configurations for at least compute, storage, and network. In order to use MachineSpec.FailureDomain we would effectively have to make this a reference to some other data structure. This dramatically increases complexity for both developers and users.
2. As failure domains are arbitrarily complex configuration, they can change over time. There is currently no component which can recognise that a machine is no longer compliant with its failure domain and perform some remediation.

In OpenShift we have the Control Plane Machine Set operator (CPMS). This works well for us, but this is because, being in OpenShift, it can take a number of liberties which are unlikely to be acceptable in CAPI; specifically, the following are baked directly into the controller:
However, this is the extent of the provider-specific code in CPMS. It's quite a simple interface.

I had an idea that we might be able to borrow ideas from CPMS and the kube scheduler to implement something relatively simple but very flexible. What follows is very rough. It's intended for discussion rather than as a concrete design proposal.

The high level overview is that we would add a FailureDomainPolicyRef to MachineSpec. If a Machine has a FailureDomainPolicyRef, the Machine controller will not create an InfrastructureMachine until the MachineSpec also has a FailureDomainRef.

A user might create:

```yaml
MachineTemplate:
  spec:
    template:
      spec:
        ...
        failureDomainPolicyRef:
          apiVersion: ...
          kind: DefaultCAPIFailureDomainPolicy
          name: MyClusterControlPlane
```

```yaml
DefaultCAPIFailureDomainPolicy:
  metadata:
    name: MyClusterControlPlane
  spec:
    spreadPolicy: Whatever
    failureDomains:
      apiVersion: ...
      kind: OpenStackFailureDomain
      names:
      - AZ1
      - AZ2
      - AZ3
```

```yaml
OpenStackFailureDomain:
  metadata:
    name: AZ1
  spec:
    computeAZ: Foo
    storageAZ: Bar
    networkAZ: Baz
```

If OpenStackFailureDomain is immutable, it can only be 'changed' by creating a new one and updating the failure domain policy.

The failure domain policy controller would watch Machines with a failureDomainPolicyRef. It would assign a failureDomain from the list according to the configured policy. It also has the opportunity to notice that a set of Machines is no longer compliant with the policy and remediate by deleting machines so new, compliant machines can replace them.

Because the failure domain is now a reference to a provider-specific CRD, the infrastructure machine controller can take provider-specific actions to apply the failure domain to an infrastructure machine.

For users who don't need this complexity, the infrastructure cluster controller could create a default policy much the way it does now, which could be applied to a KCP machine template.

A design like this in the MachineSpec would also have the advantage that it could be used without modification for any set of machines. So, for example, users who want to spread a set of workers in an MD across 2 FDs would be able to do that.
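To make the spread behaviour concrete, here is a hypothetical sketch of the assignment step such a controller might perform under an even-spread policy. None of this is CAPI code; the function and type names are illustrative only, and a real controller would work against the API server rather than in-memory maps.

```go
package main

import "fmt"

// pickFailureDomain returns the failure domain, from the policy's ordered
// list, that currently has the fewest Machines assigned to it. Ties go to
// the earlier entry in the list, so assignment is deterministic.
// (Hypothetical sketch; not part of any CAPI API.)
func pickFailureDomain(policyDomains []string, counts map[string]int) string {
	best := ""
	for _, fd := range policyDomains {
		if best == "" || counts[fd] < counts[best] {
			best = fd
		}
	}
	return best
}

func main() {
	domains := []string{"AZ1", "AZ2", "AZ3"}
	counts := map[string]int{} // Machines already assigned per failure domain

	// Assign six new Machines; they spread evenly across the three domains.
	for i := 0; i < 6; i++ {
		fd := pickFailureDomain(domains, counts)
		counts[fd]++
		fmt.Printf("machine-%d -> %s\n", i, fd)
	}
}
```

The same loop also illustrates the remediation hook described above: if the policy's domain list changes, any Machine whose assigned domain is no longer in the list is visibly non-compliant and can be deleted so a replacement lands on a current domain.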
I believe something like this would also be effective for vSphere, where failure domains are also complex, as one Kubernetes cluster could in theory span multiple vSphere clusters. Not sure exactly how this is handled in CAPV today.
Grouping a couple of issues/ideas about failure domains which are not getting attention from the community.
To address this issue we need a proposal that looks into how to handle operations for failure domains (going beyond the initial placement of machines currently supported):
#4031
#5667
#7417
/kind feature
Please add one or more /area labels. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.