
[cinder-csi-plugin] Multi region/clouds support for controllerServer #2551

Open · wants to merge 6 commits into base: master from the multi-clouds branch

Conversation


@MatthieuFin MatthieuFin commented Feb 14, 2024

Affected binaries:

  • cinder-csi-plugin

What this PR does / why we need it:
Enable multi-region / multi-cloud support in the cinder-csi-plugin controllerServer.

Which issue this PR fixes (if applicable):
fixes #2035

Special notes for reviewers:
I have a Kubernetes cluster stretched across multiple OpenStack clusters (3 OVH and 1 on-premise, linked by a private dark fiber).
That is why my approach is more "multi-cloud" than only "multi-region".

I don't touch the nodeServer behavior: I simply deploy one DaemonSet per cloud, with a nodeSelector on a label that selects nodes based on their hypervisor, and mount a dedicated secret with the associated OpenStack credentials (a sketch is shown below).
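For illustration only, a minimal sketch of such a per-cloud node plugin DaemonSet, assuming a hypothetical node label (example.com/openstack-cloud), image reference, and secret name; none of these names come from this PR:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: csi-cinder-nodeplugin-cloud-a         # hypothetical: one DaemonSet per OpenStack cloud
    spec:
      selector:
        matchLabels: {app: csi-cinder-nodeplugin-cloud-a}
      template:
        metadata:
          labels: {app: csi-cinder-nodeplugin-cloud-a}
        spec:
          nodeSelector:
            example.com/openstack-cloud: cloud-a  # hypothetical label selecting nodes by their hypervisor/cloud
          containers:
            - name: cinder-csi-plugin
              image: cinder-csi-plugin:latest     # placeholder image reference
              volumeMounts:
                - name: cloud-config
                  mountPath: /etc/config
          volumes:
            - name: cloud-config
              secret:
                secretName: cloud-config-cloud-a  # hypothetical secret holding this cloud's credentials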

The focus is on the controllerServer, which handles all Kubernetes requests to manage PVCs; it has to be able to CRUD volumes on all managed OpenStack clusters.

I propose to use the gcfg subsection feature to handle multiple Global sections in the config file. This stays backward compatible with the existing config file syntax.
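For example, a hypothetical cloud.conf using gcfg subsections of Global might look like the sketch below; the subsection name "cloud-b" and all values are made up for illustration, and the plain [Global] section keeps working as before:

    [Global]
    auth-url = https://keystone.cloud-a.example.com/v3
    username = csi
    password = secret
    region = RegionOne

    [Global "cloud-b"]
    auth-url = https://keystone.cloud-b.example.com/v3
    username = csi
    password = secret
    region = RegionOne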

I chose to use StorageClass parameters to deal with cloud selection (I add an optional field cloud which contains the config Global subsection name). In this way, when a CreateVolumeRequest comes in, the controllerServer can identify the OSInstance to use based on the match between the StorageClass field parameters.cloud and its config Global subsections.
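As a sketch of that idea, a StorageClass could select the cloud through its parameters; the class name is hypothetical, while `cloud` is the optional parameter described above and cinder.csi.openstack.org is the plugin's provisioner name:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: cinder-cloud-b                # hypothetical name
    provisioner: cinder.csi.openstack.org
    parameters:
      cloud: cloud-b                      # matches the [Global "cloud-b"] subsection of the config
    volumeBindingMode: WaitForFirstConsumer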

Issue
This PR is currently an MVP, which permits creating volumes and attaching them to the correct VM / OpenStack cluster, but I have an issue to troubleshoot: not all CSI spec messages contain volume_context or parameters, especially DeleteVolumeRequest, which only contains the volume ID.

I have a few ideas to address this issue:

  • query each managed OpenStack cloud to find the one that has a volume with the searched ID (that's linear complexity)
  • implement a Kubernetes client and ask the Kubernetes API to find it (bad idea, I just realized that volume_id is the OpenStack ID, not the PV name)
  • maybe use the secrets field, as seems to be suggested in the external-provisioner discussion

Release note:

feat: add multi-region/multi-cloud support.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 14, 2024
@k8s-ci-robot
Contributor

Welcome @MatthieuFin!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 14, 2024
@k8s-ci-robot
Contributor

Hi @MatthieuFin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jichenjc for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 14, 2024
@MatthieuFin
Author

As mentioned here, and after some local tests, this seems to be the proper way to implement it.

I propose to avoid using the StorageClass parameter.cloud and the volume context to propagate the cloud config name between gRPC calls, and to instead use a secret with a key cloud containing the Global config subsection name, which permits the controller to retrieve the proper credentials from its configuration.
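A minimal sketch of that secret-based approach, assuming hypothetical object names; the csi.storage.k8s.io/* parameters are the standard external-provisioner/attacher secret parameters, which make the secret available to the controller on the corresponding gRPC calls:

    apiVersion: v1
    kind: Secret
    metadata:
      name: csi-cinder-cloud-b                  # hypothetical
      namespace: kube-system
    stringData:
      cloud: cloud-b                            # name of the [Global "cloud-b"] config subsection
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: cinder-cloud-b
    provisioner: cinder.csi.openstack.org
    parameters:
      csi.storage.k8s.io/provisioner-secret-name: csi-cinder-cloud-b
      csi.storage.k8s.io/provisioner-secret-namespace: kube-system
      csi.storage.k8s.io/controller-publish-secret-name: csi-cinder-cloud-b
      csi.storage.k8s.io/controller-publish-secret-namespace: kube-system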

@dulek
Contributor

dulek commented Feb 16, 2024

/ok-to-test

Sorry, I won't have time to take a closer look before my vacation; I'll be back in 2 weeks. @kayrus might be the proper person to check this PR out.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 16, 2024
@dulek
Contributor

dulek commented Feb 16, 2024

Looks like the tests have to be fixed here.

@MatthieuFin
Author

/retest

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 16, 2024
@MatthieuFin
Author

/retest

@MatthieuFin MatthieuFin marked this pull request as ready for review February 19, 2024 20:07
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2024
@MatthieuFin
Author

Hi,
After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on a StorageClass (first of all, don't forget to enable the Topology feature gate: add the argument --feature-gates Topology=true on the csi-provisioner container), which permits pushing allowedTopologies constraints onto the PV .spec.nodeAffinity and keeping that affinity when pods are rescheduled.

My issue concerns topology keys: currently the only one available is topology.cinder.csi.openstack.org/zone, and obviously my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.
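A hypothetical sketch of what that could look like once an additional key is available; the key example.com/openstack-cloud is made up here, only topology.cinder.csi.openstack.org/zone and the zone name nova come from the comment above (and the csi-provisioner still needs --feature-gates Topology=true):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: cinder-cloud-b
    provisioner: cinder.csi.openstack.org
    volumeBindingMode: WaitForFirstConsumer
    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.cinder.csi.openstack.org/zone
            values: ["nova"]                      # same zone name on every cluster, hence the need for a second key
          - key: example.com/openstack-cloud      # hypothetical additional topology key to tell the clusters apart
            values: ["cloud-b"]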

@MatthieuFin MatthieuFin marked this pull request as draft February 26, 2024 09:52
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@MatthieuFin MatthieuFin marked this pull request as ready for review February 26, 2024 14:38
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@sebastienmusso

Hi, After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on a StorageClass (first of all, don't forget to enable the Topology feature gate: add the argument --feature-gates Topology=true on the csi-provisioner container), which permits pushing allowedTopologies constraints onto the PV .spec.nodeAffinity and keeping that affinity when pods are rescheduled.

My issue concerns topology keys: currently the only one available is topology.cinder.csi.openstack.org/zone, and obviously my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.

We have exactly the same problem (several OpenStack clusters with the same availability zone name).

I'm interested in this feature; let me know if you need some test feedback.

Regards

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2024
…ubernetes#2035)

* feat(cinder-csi-plugin-controller): add support for the --cloud-name arg, which permits managing multiple config Global sections

* feat(cinder-csi-plugin-controller): add support for the key `cloud` from the secret specified in the StorageClass, to reference a specific config Global section

* feat(cinder-csi-plugin-node): add support for the --cloud-name arg, which permits specifying one of the config Global sections

Signed-off-by: MatthieuFin <matthieu2717@gmail.com>
…ubernetes#2035)

* feat(cinder-csi-plugin-node): additional topology keys given via `--additionnal-topology` are announced on NodeGetInfo to create a proper CSINode object and can be used in the StorageClass allowedTopologies field; in combination with csi-provisioner `--feature-gates Topology=true`, the created PV will have nodeAffinity set with the topologies present in both the StorageClass allowedTopologies and the `--additionnal-topology` argument.

Signed-off-by: MatthieuFin <matthieu2717@gmail.com>
Signed-off-by: MatthieuFin <matthieu2717@gmail.com>

linux-foundation-easycla bot commented May 14, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 14, 2024
@MatthieuFin MatthieuFin force-pushed the multi-clouds branch 2 times, most recently from e4e424e to 5861f43 Compare May 14, 2024 12:31
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 14, 2024
add documentation and examples about multi cloud support configuration

Co-authored-by: sebastienmusso <smusso@lfdj.com>
Co-authored-by: FlorentLaloup <f.laloup@qwant.com>
Signed-off-by: MatthieuFin <matthieu2717@gmail.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 14, 2024
@MatthieuFin
Author

Thanks to @sebastienmusso, @fllaap and @Krast76 who offered to help write documentation.

@dulek, @jichenjc, @kayrus, @mdbooth, how can we move this PR forward?

@sergelogvinov
Contributor

Thank you for this PR, very interesting idea. 👍

So the complexity of this PR is that the node plugin needs OpenStack API credentials for ephemeral storage, which is deprecated. Perhaps we should ask about the removal process for it?

The killer feature is to use a single StorageClass name in a StatefulSet deployment. Based on nodeAffinity/podAntiAffinity, the Kubernetes scheduler spreads the pods across the regions/zones.

I quit using allowedTopologies in the StorageClass; they are immutable and very hard to maintain. It's better to use nodeSelector/nodeAffinity instead; you can change these parameters during the lifetime (see the sketch below).
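For instance, instead of freezing the topology in the StorageClass, the constraint can live on the pod template, where it stays mutable; a rough sketch using the zone key mentioned earlier in this thread:

    # Pod/StatefulSet template fragment: constrain scheduling via nodeAffinity instead of SC allowedTopologies
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.cinder.csi.openstack.org/zone
                    operator: In
                    values: ["nova"]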

To help the Kubernetes scheduler choose the best region/zone (if not set), we need to expose free disk capacity.

@sergelogvinov
Contributor

I've added capacity capabilities: #2597

@sebastienmusso

sebastienmusso commented May 22, 2024

@dulek, @jichenjc, @kayrus, @mdbooth, how can we move this PR forward?
And if this capability is not available within the OpenStack cinder-csi-plugin, how can we handle
stateful storage with Cinder in a Kubernetes cluster spread over multiple OpenStack clusters?

@jichenjc
Contributor

And if this capability is not available within the OpenStack cinder-csi-plugin, how can we handle
stateful storage with Cinder in a Kubernetes cluster spread over multiple OpenStack clusters?

I don't know how other clouds handle this kind of cross-region setup (e.g. does the AWS cloud provider handle this?); maybe someone can help comment.

Also, as far as I know, OpenStack itself doesn't handle multiple clouds (regions) that well:
especially, I don't think we know the free size of the Cinder storage backend, IIRC,
and Neutron might give us additional issues, as cross-region usually means no connectivity (I'm not sure about the impact on our network-related implementation).

So overall, I think we could focus on compute (VM) provisioning for now and treat the rest as a gap / follow-up?

@MatthieuFin
Author

MatthieuFin commented May 24, 2024

Hi @jichenjc ,
Other cloud providers (I'll take GCP as an example) offer a single storage class that can serve pods across multiple "regions".
To do that, they implement the topology spec in their CSI driver with the help of the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS.

Technically, our OpenStack implementation doesn't support the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS, so the gRPC CreateVolume calls don't have the accessibility_requirements field filled.

We could implement support for the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS to get information in the accessibility_requirements field and retrieve the topology chosen by the scheduler, or give the topology chosen by the CSI plugin back to the scheduler (depending on volumeBindingMode). Anyway, this approach doesn't help us here.

Indeed, GCP or AWS can implement that because all regions are uniformly managed by one cloud API, so when you ask your cloud API for a volume_id in a DeleteVolumeRequest, for example (same for other calls), you get a response regardless of which region the volume is in.

In our case, managing multiple OpenStack regions is the same as managing multiple OpenStack clusters. So I have n cloud APIs for n regions, whereas on GCP you have n regions behind only 1 cloud API.

The "GCP/AWS like" behavior is probably to manage "regions" with AZ zone notion in openstack. But that is not my use case, I need to be able to support multiple openstack clusters not multiple AZ in one openstack cluster. And I don't know public cloud provider based on openstack which properly implement openstack availability zones...

Moreover, if we want to handle multiple clouds (OpenStack regions) with the help of the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS, we have 2 possibilities:

  • modify the CSI spec to add some additional data that permits tracing which cloud owns a volume in gRPC calls (I don't think that would be possible/accepted)
  • implement an "OpenStack-API-like" service which exposes a single API to the CSI plugin, manages multiple clouds in the backend, and maintains an internal database to know which cloud owns each volume_id, avoiding possible volume_id collisions between 2 different clouds.

Personally, I find these 2 solutions really too complex to implement and maintain.

This PR permits doing it in a way that is easier to understand and maintain, imo.

I have one StorageClass per region/cloud, and if I want to deploy 1 StatefulSet across 3 regions, I create 3 PVCs, one per region and each with a different StorageClass (data-0, data-1, data-2), and I use, in the StatefulSet:

spec:
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: data

In that way I'm able to dispatch pods from the same StatefulSet across different StorageClasses and different OpenStack clusters (a sketch of the pre-created PVCs follows).
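A sketch of those pre-created PVCs, assuming a hypothetical 3-replica StatefulSet named db whose volumeClaimTemplate is named data (so the pods expect PVCs data-db-0, data-db-1, data-db-2), each pointing at a different per-cloud StorageClass:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-db-0                      # picked up by pod db-0
    spec:
      storageClassName: cinder-cloud-a     # hypothetical per-region/cloud StorageClass
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    # data-db-1 and data-db-2 follow the same pattern with cinder-cloud-b / cinder-cloud-c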

If you want a "multi-region" implementation like GCP or AWS:

  • Currently I couldn't implement it, because I don't have an OpenStack environment with multiple AZs.
  • You would miss a feature that GCP/AWS don't offer, which is storage support across multiple OpenStack clusters.

From my limited experience as a cloud consumer, OpenStack providers offer OpenStack with multiple regions rather than multiple AZs. It's sad, I know, but that seems to be the public offering.

Concerning cross-region networking, indeed it shouldn't have to be managed by Neutron; personally I manage it myself outside of OpenStack, with some BGP routing and physical L2 interconnections (dark fiber) across my DCs, and IPsec tunnels for dev/PoC links across DCs that are not physically linked (I shared my first reflections about it here, but that's out of this PR's scope). I know that @sebastienmusso runs k8s clusters spread over multiple OpenStack clusters too; I don't know how you handle this point?

Implementing the capacity capabilities proposed by @sergelogvinov could probably help in a multi-AZ environment, but in this PR's context it should have no impact.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[cinder-csi-plugin] Multi-region support for provisioning Cinder storage.
6 participants