🌱 Improve dry run for topology changes to dry run server side apply #6710

Merged
merged 1 commit into from Jul 7, 2022


chrischdi
Member

@chrischdi chrischdi commented Jun 23, 2022

What this PR does / why we need it:

This PR proposes improving how we implement the dry run for detecting changes by actually using dry run server side apply.

The proposed implementation has the following advantages:

  • It uses the same logic as the real server side apply patch operation.
  • The diff logic compares the current object with how the object would look after server side apply.
  • The diff logic does not rely on how managed fields are used by the kube-apiserver and requires no knowledge about the handled object or used schema.

Disadvantages:

  • Using dry run server side apply requires more requests to the kube-apiserver because it requires an additional request for each comparison.
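
To illustrate the comparison described above, here is a minimal sketch in Go. It is not the code from this PR: the `object` type, the `withoutServerFields` and `diff` helpers, and the list of ignored metadata fields are assumptions made for illustration, and the dry-run result is passed in as plain data instead of being fetched from the kube-apiserver.

```go
package main

import (
	"fmt"
	"reflect"
)

// object is a hypothetical stand-in for an unstructured Kubernetes object.
type object = map[string]interface{}

// withoutServerFields returns a shallow copy of obj whose metadata omits
// fields that the API server rewrites on every request. The list of
// ignored fields is an assumption made for this sketch.
func withoutServerFields(obj object) object {
	out := object{}
	for k, v := range obj {
		out[k] = v
	}
	if meta, ok := obj["metadata"].(object); ok {
		cleaned := object{}
		for k, v := range meta {
			if k == "managedFields" || k == "resourceVersion" {
				continue
			}
			cleaned[k] = v
		}
		out["metadata"] = cleaned
	}
	return out
}

// diff compares the current object with the object as it would look after
// a (dry run) server side apply. It reports whether anything changed and
// whether the change touches spec.
func diff(current, dryRunResult object) (hasChanges, hasSpecChanges bool) {
	hasChanges = !reflect.DeepEqual(withoutServerFields(current), withoutServerFields(dryRunResult))
	hasSpecChanges = !reflect.DeepEqual(current["spec"], dryRunResult["spec"])
	return hasChanges, hasSpecChanges
}

func main() {
	current := object{
		"metadata": object{"name": "md-0", "resourceVersion": "10"},
		"spec":     object{"replicas": int64(1)},
	}
	dryRunResult := object{
		"metadata": object{"name": "md-0", "resourceVersion": "11"},
		"spec":     object{"replicas": int64(3)},
	}
	changed, specChanged := diff(current, dryRunResult)
	fmt.Println(changed, specChanged) // prints "true true"
}
```

The sketch needs no knowledge of the object's schema: it only compares two generic maps, which mirrors the third advantage listed above.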

TODOs:

  • Add provider implementation documentation:
    • for template resources which get rotated (known types are: InfrastructureMachineTemplate or BootstrapTemplate): if there are existing validating webhooks which block updates due to immutability (ValidateUpdate), they need to implement the sigs.k8s.io/cluster-api/util/webhooks.TopologyAwareValidator instead, which provides a bool to decide whether to skip immutability checks

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 23, 2022
@chrischdi chrischdi force-pushed the poc-ssa-dryrun branch 7 times, most recently from 718866b to ae40566 Compare June 23, 2022 14:40
Member

@sbueringer sbueringer left a comment


Very high level review & ignored the tests for now

Looks good

@chrischdi
Member Author

/hold

competes with #6709

/test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2022
@chrischdi
Member Author

/test all

@chrischdi
Member Author

chrischdi commented Jun 24, 2022

e2e currently fails due to:

error reconciling the Cluster topology: failed to create patch helper for DockerMachineTemplate/k8s-upgrade-with-runtimesdk-1uasfg-control-plane-52rwq: failed to determine changes via dryRunSSAPatch: failed to request dry-run server side apply: admission webhook "validation.dockermachinetemplate.infrastructure.cluster.x-k8s.io" denied the request: DockerMachineTemplate.infrastructure.cluster.x-k8s.io "k8s-upgrade-with-runtimesdk-1uasfg-control-plane-52rwq" is invalid: spec.template.spec: Invalid value: v1beta1.DockerMachineTemplate{TypeMeta:v1.TypeMeta{Kind:"DockerMachineTemplate", APIVersion:"infrastructure.cluster.x-k8s.io/v1beta1"}, ObjectMeta:v1.ObjectMeta{Name:"k8s-upgrade-with-runtimesdk-1uasfg-control-plane-52rwq", GenerateName:"", Namespace:"k8s-upgrade-with-runtimesdk-t58uv4", SelfLink:"", UID:"a52c48ad-71e8-423b-98f5-5c4f0861479b", ResourceVersion:"2014", Generation:2, CreationTimestamp:time.Date(2022, time.June, 24, 8, 12, 20, 0, time.Local), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"cluster.x-k8s.io/cluster-name":"k8s-upgrade-with-runtimesdk-1uasfg", "topology.cluster.x-k8s.io/owned":""}, Annotations:map[string]string{"cluster.x-k8s.io/cloned-from-groupkind":"DockerMachineTemplate.infrastructure.cluster.x-k8s.io", "cluster.x-k8s.io/cloned-from-name":"quick-start-control-plane"}, OwnerReferences:[]v1.OwnerReference{v1.OwnerReference{APIVersion:"cluster.x-k8s.io/v1beta1", Kind:"Cluster", Name:"k8s-upgrade-with-runtimesdk-1uasfg", UID:"1836653c-8570-43a8-b20b-a677d9f9c689", Controller:(*bool)(nil), BlockOwnerDeletion:(*bool)(nil)}}, Finalizers:[]string(nil), ZZZ_DeprecatedClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"capi-topology", Operation:"Apply", APIVersion:"infrastructure.cluster.x-k8s.io/v1beta1", Time:time.Date(2022, time.June, 24, 8, 18, 27, 0, time.Local), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0x4000d99140), Subresource:""}}}, 
Spec:v1beta1.DockerMachineTemplateSpec{Template:v1beta1.DockerMachineTemplateResource{ObjectMeta:v1beta1.ObjectMeta{Labels:map[string]string(nil), Annotations:map[string]string(nil)}, Spec:v1beta1.DockerMachineSpec{ProviderID:(*string)(nil), CustomImage:"kindest/node:v1.24.0", PreLoadImages:[]string(nil), ExtraMounts:[]v1beta1.Mount{v1beta1.Mount{ContainerPath:"/var/run/docker.sock", HostPath:"/var/run/docker.sock", Readonly:false}}, Bootstrapped:false}}}}: DockerMachineTemplate spec.template.spec field is immutable. Please create a new resource instead.

We still have to catch immutability errors like this one...

@sbueringer
Member

Hm. That's an interesting issue. Any ideas on how to work around this? A lot of InfrastructureMachineTemplates are immutable

@chrischdi
Member Author


Yes, catch the error (what we already do) but detect it as return true, true, nil in this case

@sbueringer
Member

sbueringer commented Jun 24, 2022


Yes, catch the error (what we already do) but detect it as return true, true, nil in this case

I probably misunderstood. My impression is we can't run SSA dryrun on immutable templates because the Validation Webhook will block it so then we won't get a meaningful diff?

Does the SSA dryrun give us a result despite the validation webhook blocking the request?

@chrischdi
Member Author


Yes, catch the error (what we already do) but detect it as return true, true, nil in this case

I probably misunderstood. My impression is we can't run SSA dryrun on immutable templates because the Validation Webhook will block it so then we won't get a meaningful diff?

Does the SSA dryrun give us a result despite the validation webhook blocking the request?

See the implementation. That's imho the best we can do here: identify whether the error causes relate only to spec fields which are marked as invalid.

@chrischdi chrischdi force-pushed the poc-ssa-dryrun branch 2 times, most recently from 0b269dc to cc65936 Compare June 24, 2022 09:40
@sbueringer
Member

sbueringer commented Jun 24, 2022


Yes, catch the error (what we already do) but detect it as return true, true, nil in this case

I probably misunderstood. My impression is we can't run SSA dryrun on immutable templates because the Validation Webhook will block it so then we won't get a meaningful diff?
Does the SSA dryrun give us a result despite the validation webhook blocking the request?

See the implementation. That's imho the best we can do here: identify whether the error causes relate only to spec fields which are marked as invalid.

Just to check that I interpret the code correctly: if we get a CauseTypeFieldValueInvalid error from a field in spec, we assume we have changes?

So as long as immutable errors are returned as CauseTypeFieldValueInvalid and with the correct path we detect immutable errors as changes. Which is correct because without changes no immutable error.

I think this doesn't cover cases where e.g. the CAPA provider returns a BadRequest error?

@sbueringer
Member

Just a short status update: we discussed the issue a bit and evaluated options. We will explore them and then make a suggestion

@sbueringer
Member

lgtm pending squash + #6710 (comment)

I would be fine with moving #6710 (comment) to a follow-up issue so we get this PR merged ASAP and still get a few days time in CI.

@chrischdi
Member Author

lgtm pending squash + #6710 (comment)

I would be fine with moving #6710 (comment) to a follow-up issue so we get this PR merged ASAP and still get a few days time in CI.

Regarding squash: @fabriziopandini do we still want to keep the CAPD change as a separate commit?

Sounds good to move it to a follow-up. I think we could even link the commit for the CAPD change if we want (which we can do after merge only)?!

@chrischdi
Member Author

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-informing-ipv6-main
/test pull-cluster-api-e2e-informing-main
/test pull-cluster-api-e2e-workload-upgrade-1-21-1-22-main

@k8s-ci-robot
Contributor

k8s-ci-robot commented Jul 7, 2022

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-workload-upgrade-1-24-latest-main d6f0297 link false /test pull-cluster-api-e2e-workload-upgrade-1-24-latest-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@chrischdi
Member Author

@sbueringer
Member

Sounds good to move it to a follow-up. I think we could even link the commit for the CAPD change if we want (which we can do after merge only)?!

I would prefer having relevant documentation directly in the documentation vs. linking to commits on merged PRs. No objections to an additional link. I think the link to the commit stays the same (now vs after merge)

@chrischdi
Member Author

  • Added unit tests for the ShouldSkipImmutabilityChecks func
  • Squashed commits

cc @fabriziopandini @sbueringer

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-informing-ipv6-main
/test pull-cluster-api-e2e-informing-main
/test pull-cluster-api-e2e-workload-upgrade-1-21-1-22-main
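
For provider authors, the idea behind a helper like ShouldSkipImmutabilityChecks can be sketched as below. This is a simplified illustration under stated assumptions, not the helper added in this PR: the `request` and `template` types, the annotation key `example.com/topology-dry-run`, and the conditions checked (request is a dry run and the object carries a marker) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// dryRunMarker is a hypothetical annotation key used only in this sketch
// to mark objects submitted by the topology dry run.
const dryRunMarker = "example.com/topology-dry-run"

// request is a hypothetical stand-in for an admission request.
type request struct {
	DryRun bool
}

// template is a hypothetical stand-in for an immutable template resource.
type template struct {
	Annotations map[string]string
	Spec        string
}

// shouldSkipImmutabilityChecks returns true only when the request is a
// dry run AND the object carries the marker annotation (both conditions
// are assumptions of this sketch).
func shouldSkipImmutabilityChecks(req request, obj template) bool {
	if !req.DryRun {
		return false
	}
	_, marked := obj.Annotations[dryRunMarker]
	return marked
}

// validateUpdate rejects spec changes unless the immutability check is
// skipped for a topology dry run.
func validateUpdate(req request, oldObj, newObj template) error {
	if oldObj.Spec != newObj.Spec && !shouldSkipImmutabilityChecks(req, newObj) {
		return errors.New("spec is immutable, please create a new resource instead")
	}
	return nil
}

func main() {
	oldObj := template{Spec: "kindest/node:v1.23.0"}
	newObj := template{Spec: "kindest/node:v1.24.0"}

	// A regular update that changes spec is rejected.
	fmt.Println(validateUpdate(request{DryRun: false}, oldObj, newObj) != nil) // prints "true"

	// A marked dry run request passes, so the caller can diff the result.
	newObj.Annotations = map[string]string{dryRunMarker: ""}
	fmt.Println(validateUpdate(request{DryRun: true}, oldObj, newObj) == nil) // prints "true"
}
```

Requiring both conditions keeps the webhook strict for real updates: a persisted change can never bypass the immutability check just by carrying the marker.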

@chrischdi
Member Author

/cherry-pick release-1.2

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

* Adds helper function for CustomValidators to skip immutability checks
* CAPD: skip immutability checks for topology dry run
* book: add provider documentation

Co-authored-by: fabriziopandini <fpandini@vmware.com>
@chrischdi
Member Author

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-informing-ipv6-main
/test pull-cluster-api-e2e-informing-main
/test pull-cluster-api-e2e-workload-upgrade-1-21-1-22-main

@sbueringer
Member

Thx
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 7, 2022
@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 7, 2022
@sbueringer
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit a9b5887 into kubernetes-sigs:main Jul 7, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Jul 7, 2022
@k8s-infra-cherrypick-robot

@chrischdi: new pull request created: #6861

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

6 participants