SubnetSpec changes in `v1beta2` break existing use-cases #4026

AverageMarcus · 2023-01-31T17:25:46Z

/kind bug
/kind api-change

The changes introduced in #3748 (aimed at solving server side apply issues with Cluster Class) break existing use-cases around subnets created by CAPA.

The change makes the ID property on SubnetSpec now be a required field where previously this was optional. This change means that configuration to subnets used for CAPA clusters can now only be done if infrastructure is created by the user resulting in CAPA only allowing for either the default subnet layout or a bring-your-own network where the user would be required to create everything (VPC, subnets, etc.) externally before the creation of the AWSCluster CR.

With v1beta1 and earlier it was possible to specify some of the subnet configuration (such as the CIDR block and the AWS tags) but still have CAPA create the resources for you.

For example, we have a situation where we need more control over what resources end up in what subnets and rely on AWS tags and subnet filters to achieve this.

spec:
  network:
    subnets:
      - cidrBlock: 10.0.0.0/24
        availabilityZone: eu-west-1a
        tags:
          subnet-role: control-plane
      - cidrBlock: 10.0.1.0/24
        availabilityZone: eu-west-1b
        tags:
          subnet-role: control-plane
... etc ...

With this approach we can configure CAPA to create all the subnets we need, with the tags we can then filter on, without having to use an external process to create these resources. This is no longer possible with v1beta2.

In the PR it was acknowledged that this was a breaking change and that we'd need to come up with an alternative to solve this problem going forward but unfortunately was forgotten about so I'm opening this issue to have that discussion and try to come up with a solution to both the original issue the PR was fixing and the use cases currently in use that require configuring subnets in CAPA.

This also fits into the wider discussion around networking in CAPA and the improvements / changes desired.

Related:

Environment:

Cluster-api-provider-aws version: v2.0.2
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

The text was updated successfully, but these errors were encountered:

AverageMarcus · 2023-01-31T17:27:40Z

/cc @anmolsachan @joeunlog as you both either commented on the PR or raised an issue about this problem

richardcase · 2023-02-01T12:52:29Z

/triage accepted
/priority critical-urgent

Skarlso · 2023-02-01T12:54:35Z

To keep both things happy, what about adding a new entry called subnetMutation and then providing user values there and we would merge that with the existing settings?

Though, that might pose a challenge in finding the correct subnet to apply it to. Uh....

luthermonson · 2023-02-01T22:29:11Z

Just a thought to add to the pile. It's my opinion that CAPA should only offer two methods to get a cluster...

you are new to this and have aws secret/key and you want a kube cluster, this is great for small/medium sized orgs who are ok having 1 VPC per cluster and will likely never reach the 5 vpc soft limit amazon sets on new accounts any time soon. in this scenario capa maintains the entire infra lifecycle.
you know everything, and you want to tell capa exactly what it should use, this is great for enterprise orgs who have different teams who manage the different pieces (networking team, storage team) and will need to be told where/when their cluster/nodes can exist. in this scenario capa manages nothing if IDs are provided...

I don't feel there should be a middle ground where CAPA acts on its own to try and reconcile infrastructure based on partial user input. this will lead to bloating the CAPA API just to avoid using an existing tool specialized in solving this problem. To get people who want to graduate from CAPA maintaining all infra to having specified infra the CAPA repo could maintain a sample terraform module to help migrate off the default CAPA infra.

@Skarlso the idea of adding subnetMutation won't stop, there will always be something else to mutate and code will fill up with if/else blocks trying to figure out what the user wants and when CAPA assumed wrong the users will just come complaining. CAPA is a consumer of AWS infrastructure, it shouldn't be in the business of creating/maintaining it, but I can concede that method 1 above lowers the barrier to entry to new users and should be maintained.

Skarlso · 2023-02-02T07:39:50Z

To add to this, if you would like to stay in the Kubernetes world but still use terraform together with CAPA, there is an option to use tf-controller. It's a Terraform Kubernetes controller which a lot of options and nice things to reconcile terraform manifests into a cluster. It's great if you are looking for a hybrid solution.

AverageMarcus · 2023-02-02T07:56:36Z

CAPA is a consumer of AWS infrastructure, it shouldn't be in the business of creating/maintaining it,

Not sure I agree with this. The whole purpose of CAPA as far as I see it is to manage AWS infrastructure - creating, updating and deleting it.

There is a third (and like more) scenarios where CAPI/CAPA is used besides the two you proposed - there are many platform offerings built on top of it that leverage the standard nature of cluster-api to build a larger product for customers.

I think it's also worth pointing out that...

Manages the bootstrapping of VPCs, gateways, security groups and instances.

is the second listed feature of CAPA in the readme. If that is was change I think that's a much bigger discussion that would also need to include CAPI and other providers to ensure consistency.

Skarlso · 2023-02-02T08:01:50Z

There is a third (and like more) scenarios where CAPI/CAPA is used besides the two you proposed - there are many platform offerings built on top of it that leverage the standard nature of cluster-api to build a larger product for customers.

Sure, but you understand that CAPA cannot cater for every possible need of these platforms. :)

I, too, believe that most of CAPA's problems today stem from the fact it's trying to "merge" partial user settings with defaults and stuff that comes from AWS.

I wouldn't go as far as to completely rule out user input, but I sure as heck would recommend not touching unmanaged things AT ALL. There are only problems stemming from that route. I even created a proposal to remove settings any kind of tags for unmanaged content. We document everything that we require, then the user can set them up as they like. There are at least 3 issues because of how capa tries to juggle various tags on unmanaged entities.

I'm am of the same mind regarding other content, such as subnets. You should be able to have some user input, sure, but at some point, we just have to stop trying to merge partial input with defaults and information coming from AWS.

AverageMarcus · 2023-02-02T09:00:07Z

I wouldn't go as far as to completely rule out user input, but I sure as heck would recommend not touching unmanaged things AT ALL. There are only problems stemming from that route. I even created a proposal to remove settings any kind of tags for unmanaged content. We document everything that we require, then the user can set them up as they like. There are at least 3 issues because of how capa tries to juggle various tags on unmanaged entities.

I agree with this. If the user is providing their own infrastructure then CAPA should be "read-only". But I do still think (and think you agree if I understood) that when CAPA is managing the infra it should be possible to configure them to some degree. My example of this is being able to set the CIDRs and AWS tags used for subnets.

phillipsj · 2023-02-02T15:20:23Z

We are currently investigating the use of CAPA and this is a huge blocker. The possibility that it can alter resources it doesn't manage with the permission set that is currently required is horrifying.

AndiDog · 2023-02-09T10:27:43Z

In Giant Swarm's fork, I have reverted the buggy changes and added a unit test on top which covers the "managed VPC/subnet" use case. With that change, it seems the functionality is again working as before.

See giantswarm#437.

We will use that in production in order to get unblocked, and then help drafting the design document for this issue ("Rethinking Networking in CAPA").

k8s-triage-robot · 2023-03-11T10:35:27Z

This issue is labeled with priority/critical-urgent but has not been updated in over 30 days, and should be re-triaged.
Critical-urgent issues must be actively worked on as someone's top priority right now.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Deprioritize it with /priority {important-soon, important-longterm, backlog}
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

Skarlso · 2023-03-11T16:17:58Z

/priority important-longterm

Skarlso · 2023-03-11T16:18:06Z

/triage accepted

k8s-triage-robot · 2023-05-11T06:40:22Z

This issue is labeled with priority/critical-urgent but has not been updated in over 30 days, and should be re-triaged.
Critical-urgent issues must be actively worked on as someone's top priority right now.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Deprioritize it with /priority {important-soon, important-longterm, backlog}
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

Skarlso · 2023-05-11T06:50:07Z

/triage accepted

k8s-triage-robot · 2023-06-10T07:08:54Z

This issue is labeled with priority/critical-urgent but has not been updated in over 30 days, and should be re-triaged.
Critical-urgent issues must be actively worked on as someone's top priority right now.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Deprioritize it with /priority {important-soon, important-longterm, backlog}
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

Skarlso · 2023-06-11T18:13:24Z

/triage accepted

k8s-triage-robot · 2023-07-11T18:59:48Z

This issue is labeled with priority/critical-urgent but has not been updated in over 30 days, and should be re-triaged.
Critical-urgent issues must be actively worked on as someone's top priority right now.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Deprioritize it with /priority {important-soon, important-longterm, backlog}
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

AndiDog · 2023-07-11T21:47:29Z

We will try covering this issue in a prototype for the CAPA - Rethinking Networking Proposal. See also meeting minutes of CAPA office hours for what was discussed.

vincepri · 2023-08-31T20:46:48Z

Thanks @AndiDog and @AverageMarcus for creating an alternative proposal.

I'm looking through the open issues here and I agree that this is a big regression that shouldn't have happened in the first case. @richardcase and other maintainers, we should prepare for a breaking change and revert the behavior in this case; this was one of the primary values for Cluster API AWS providers.

Was the SSA bug originally due to ClusterClass and Managed Topologies?

vincepri · 2023-08-31T20:46:57Z

/triage accepted

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 31, 2023

AverageMarcus mentioned this issue Jan 31, 2023

API Evolution for VPC and Networking Topologies #1484

Open

AndiDog mentioned this issue Feb 1, 2023

CAPA v1beta2 breaks the subnet config giantswarm/roadmap#1932

Closed

This was referenced Feb 6, 2023

Upgrade CAPA to v2.x giantswarm/cluster-api-provider-aws-app#152

Closed

Workaround to get managed subnets working again giantswarm/cluster-api-provider-aws#437

Merged

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 11, 2023

k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 11, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 11, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 11, 2023

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 11, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 11, 2023

AmitSahastra mentioned this issue May 11, 2023

On Hold: Spectro v1beta1 1.5.2 spectrocloud/cluster-api-provider-aws#619

Open

4 tasks

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jun 10, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 11, 2023

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 11, 2023

This was referenced Aug 26, 2023

feat: support IPAM Manager for VPC creation #4444

Merged

Workaround for managed subnets adapted to v2.2 giantswarm/cluster-api-provider-aws#543

Merged

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 31, 2023

vincepri mentioned this issue Aug 31, 2023

🐛 Restore subnet management functionality #4474

Merged

4 tasks

AndiDog mentioned this issue Sep 5, 2023

REQUEST: New membership for AndiDog kubernetes/org#4437

Closed

9 tasks

k8s-ci-robot closed this as completed in #4474 Oct 8, 2023

AndiDog mentioned this issue Oct 8, 2023

cluster-aws: Adapt to CAPA subnet spec changes giantswarm/roadmap#2870

Closed

AndiDog mentioned this issue Nov 15, 2023

✨ Add test for provider-managed infrastructure, fix writing latest status back into ClusterScope #4637

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SubnetSpec changes in `v1beta2` break existing use-cases #4026

SubnetSpec changes in `v1beta2` break existing use-cases #4026

AverageMarcus commented Jan 31, 2023

AverageMarcus commented Jan 31, 2023

richardcase commented Feb 1, 2023

Skarlso commented Feb 1, 2023

luthermonson commented Feb 1, 2023

Skarlso commented Feb 2, 2023 •

edited

AverageMarcus commented Feb 2, 2023

Skarlso commented Feb 2, 2023

AverageMarcus commented Feb 2, 2023

phillipsj commented Feb 2, 2023

AndiDog commented Feb 9, 2023

k8s-triage-robot commented Mar 11, 2023

Skarlso commented Mar 11, 2023

Skarlso commented Mar 11, 2023

k8s-triage-robot commented May 11, 2023

Skarlso commented May 11, 2023

k8s-triage-robot commented Jun 10, 2023

Skarlso commented Jun 11, 2023

k8s-triage-robot commented Jul 11, 2023

AndiDog commented Jul 11, 2023

vincepri commented Aug 31, 2023

vincepri commented Aug 31, 2023

SubnetSpec changes in v1beta2 break existing use-cases #4026

SubnetSpec changes in v1beta2 break existing use-cases #4026

Comments

AverageMarcus commented Jan 31, 2023

AverageMarcus commented Jan 31, 2023

richardcase commented Feb 1, 2023

Skarlso commented Feb 1, 2023

luthermonson commented Feb 1, 2023

Skarlso commented Feb 2, 2023 • edited

AverageMarcus commented Feb 2, 2023

Skarlso commented Feb 2, 2023

AverageMarcus commented Feb 2, 2023

phillipsj commented Feb 2, 2023

AndiDog commented Feb 9, 2023

k8s-triage-robot commented Mar 11, 2023

Skarlso commented Mar 11, 2023

Skarlso commented Mar 11, 2023

k8s-triage-robot commented May 11, 2023

Skarlso commented May 11, 2023

k8s-triage-robot commented Jun 10, 2023

Skarlso commented Jun 11, 2023

k8s-triage-robot commented Jul 11, 2023

AndiDog commented Jul 11, 2023

vincepri commented Aug 31, 2023

vincepri commented Aug 31, 2023

SubnetSpec changes in `v1beta2` break existing use-cases #4026

SubnetSpec changes in `v1beta2` break existing use-cases #4026

Skarlso commented Feb 2, 2023 •

edited