
Add 'ignoreMissing' Flag to replacement options to allow opting for pre-5.0.0 behavior #5440

Open
2 tasks done
renaudguerin opened this issue Nov 13, 2023 · 12 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@renaudguerin

Eschewed features

  • This issue is not requesting templating, unstructured edits, build-time side-effects from args or env vars, or any other eschewed feature.

What would you like to have added?

Add an ignoreMissing flag in the ReplacementTransformer's options field.
It would allow users to opt for the pre-5.0.0 / #4789 behavior, where missing fields in a target resource were ignored instead of resulting in errors.
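
For illustration, usage of the proposed flag could look like this (a sketch only: the ignoreMissing name and its placement under options are what this issue proposes, not an existing Kustomize option):

# Hypothetical syntax for the proposed option; it does not exist in current releases.
source:
  kind: ConfigMap
  name: replacements
  fieldPath: data.GCP_PROJECT_ID
targets:
  - select:
      kind: PubSubSchema
    fieldPaths:
      - spec.projectRef.external
    options:
      ignoreMissing: true  # proposed: silently skip this target if the field is missing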

Why is this needed?

This feature is needed to address a significant change in Kustomize 5.0.0 where replacements now fail if a targeted field is missing from a resource and options.create isn't set.

The previous behavior was to ignore invalid targets, which allowed users to package commonly used replacements (such as GCP project ID replacements) as widely reusable components that could be imported from various service directories with similar-but-not-quite-identical resources, modifying only the relevant fields and skipping missing ones without errors. In the absence of parameterized components, this allowed for much-needed flexibility in broadly applying replacements to 90%-similar resources.

The new behavior, where replacements fail if a target field is missing, significantly disrupts workflows that previously depended on silent skipping of non-matching replacements. This is especially problematic in scenarios like ours, where environment-specific GCP project IDs appear in various formats across Kubernetes manifests, necessitating a universal replacement approach.

Detailed Use Case

In our Kustomize codebase, we deal with environment-specific GCP project IDs in otherwise identical resources. These IDs can appear in multiple formats: as a standalone string, as projects/GCP_PROJECT_ID, as part of a service account ID (service-account@GCP_PROJECT_ID.iam.gserviceaccount.com), and so on. A shared component performing generic replacements of these IDs in our Config Connector resource types is crucial for reducing repetition in our repository.

Here is an example of a generic replacement I intended to use, which lists all possible targets for GCP_PROJECT_ID. From 5.0.0 onwards, it throws an error if any of the fields in the selected targets is missing.

# _components/replacements/common/gcp_project_id.yaml
# This replacement fills in the GCP project ID (e.g. development-1234567) in places where it can be easily delimited.

source:
  kind: ConfigMap
  name: replacements
  fieldPath: data.GCP_PROJECT_ID
targets:
  # IAMPolicyMember external resourceRefs (projects/PROJECT_ID)
  - select:
      kind: IAMPolicyMember
    fieldPaths:
      - spec.resourceRef.external
    options:
      delimiter: "/"
      index: 1
  # PubSubSchema external projectRef (PROJECT_ID)
  - select:
      kind: PubSubSchema
    fieldPaths:
      - spec.projectRef.external

Here's another generic replacement, covering a different format for the GCP project ID (GCP_PROJECT_ID.iam.gserviceaccount.com).
(As a side note: breaking this down into several replacements is only necessary because options.delimiter is quite limited and doesn't support regexes. And of course, this would be a trivial task with unstructured edits ;) )

# _components/replacements/common/gcp_sa_domain.yaml
# This replacement fills in the GCP service account domain (e.g. development-1234567.iam.gserviceaccount.com).
# In most cases it only replaces the part after the @ sign with the GCP_SA_DOMAIN value from the ConfigMap, and keeps the service account name intact.
# The GCP_PROJECT_ID replacement is too generic for this purpose, because there can be only one delimiter and index per replacement target.
source:
  kind: ConfigMap
  name: replacements
  fieldPath: data.GCP_SA_DOMAIN
targets:
  # iam.gke.io/gcp-service-account annotations (service-account@domain)
  - select:
      kind: ServiceAccount
    fieldPaths:
      - metadata.annotations.[iam.gke.io/gcp-service-account]
    options:
      delimiter: "@"
      index: 1
  # IAMPolicyMember member field (service-account@domain)
  - select:
      kind: IAMPolicyMember
    fieldPaths:
      - spec.member
    options:
      delimiter: "@"
      index: 1

These generic replacement "recipes" are part of a common component, used by our service definitions through environment-specific wrapper components that add a ConfigMap with the relevant source values for that environment, as referenced in the replacements:

# _components/replacements/development/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

configMapGenerator:
  - name: replacements
    literals:
      - "GCP_PROJECT_ID=development-1234567"
      - "GCP_SA_DOMAIN=development-1234567.iam.gserviceaccount.com"

components:
  - ../common

# Clean up the ConfigMap after applying replacements
patches:
- patch: |-
    $patch: delete
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: replacements

This environment-specific component is then imported in the corresponding service overlays, like this:

# services/myservice/overlays/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base

components:
  - ../../../_components/replacements/development

# [More definitions here]

Can you accomplish the motivating task without this feature, and if so, how?

Only very inelegantly or with much repetition, AFAICT:

  • I could keep the generic replacements component idea but target the resources more precisely using select/reject (realistically, it would have to be done by name, and regex support is broken by the same change, making that even harder); see the sketch after this list. Also, adding specifics about the callers' resources in a component is an ugly case of leaky abstraction.

  • I could stick to a generic replacement component that only applies the absolute lowest-common-denominator list of replacements that works with every service using the component. But that means more service-specific replacements to implement, with a repetition of the source ConfigMap (the GCP project ID in its different string incarnations) in the service overlays themselves, which is what we were trying to get away from in the first place. Also, that lowest common denominator will shrink dramatically with each service we add, requiring replacements previously shared by other services to be moved into the services themselves. Sounds hellish.

  • I could give up on the idea of a generic "replacements" component, and apply replacements in each service. 90% of them would be repeated.
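
As an illustration of the first workaround, the shared component would have to pin targets to specific resource names so the fields are guaranteed to exist (the resource name below is hypothetical, and this is exactly the leaky abstraction mentioned above):

# _components/replacements/common/gcp_project_id.yaml (workaround sketch, not what we actually use)
source:
  kind: ConfigMap
  name: replacements
  fieldPath: data.GCP_PROJECT_ID
targets:
  - select:
      kind: IAMPolicyMember
      name: myservice-iam-policy   # hypothetical caller-specific name leaking into the component
    fieldPaths:
      - spec.resourceRef.external
    options:
      delimiter: "/"
      index: 1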

What other solutions have you considered?

  • Sticking to Kustomize <5.0, but our codebase requires this 5.1 behavior anyway.

  • Maintaining our own version of Kustomize with an ignoreMissing flag added in.

  • Pushing for the introduction of parameterized components, or some other similar sort of flexibility-enhancing Kustomize feature.

  • Living with duplication in our code base, as a result of not being able to implement flexible enough reusable Kustomize components.

  • Giving up on a 100% Kustomize-native solution and introducing something like envsubst as a pre-processing step.

  • Moving to Helm.

We haven't made a firm decision yet, but none of these options are appealing and I'd really like to give native Kustomize features a chance before we give up.

Anything else we should know?

My comment on #4789 explains the above perhaps more succinctly (apologies for the slightly passive-aggressive tone: it was written just after I discovered the change that makes our solution fall apart).

Both the consequences of this change and a flag allowing the previous behavior to be restored have already been discussed by several other users in the comments of #4789 and #5128.

Feature ownership

  • I am interested in contributing this feature myself! 🎉
@renaudguerin renaudguerin added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 13, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 13, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2024
@bagel-dawg

Very sad to see basically our exact use-case for such an option go unheard.

Did you come up with a solution or workaround that you were happy with? This is holding up our upgrade of ArgoCD because they've moved on to kustomize >5.0.

@renaudguerin
Author

renaudguerin commented Feb 29, 2024

Did you come up with a solution or workaround that you were happy with? This is holding up our upgrade of ArgoCD because they've moved on to kustomize >5.0.

Unfortunately, no.
We are slowly coming to the conclusion that the Kustomize maintainers seem more interested in making a work of art and a paragon of software purity than a tool powerful enough to address moderately complex real-world scenarios on its own.

Version after version, they unabashedly plug loopholes or "unintended behaviors" that users relied on for some much needed flexibility, and provide no credible alternative.

I've just read this issue again: I can't believe I had to jump through so many hoops in the first place (get creative with replacements, components, a ConfigMapGenerator that creates an ephemeral resource, then a patch that deletes it), all for the modest goal of replacing a friggin' GCP project ID across manifests that are otherwise identical between overlays. And... they managed to break even that in 5.0.

Look, I know complexity often comes from stubbornly using a tool against its design philosophy.
I'd love to be told how I'm "holding it wrong" and how to fulfill the extremely common real-world need described in this issue (patch a value across many resources wherever it is found, without having to explicitly list each location) the "Kustomize way", without extra tooling.

Because what I'm not going to do is write a custom ArgoCD Config Management plugin to add a "non-structured search & replace" step before Kustomize (suddenly our manifests are no longer valid YAML), or a Kustomize Go plugin that I'll need to maintain and distribute across our systems, just so I can end up with a friggin' different GCP Project ID per environment in a DRY manner.

Such basic stuff needs to be native if Kustomize is to be used as a self-sufficient solution in any kind of non-trivial GitOps setup.

I'm genuinely open to the idea that I'm missing something: in search of answers I watched one of @KnVerey's presentations. I came to the conclusion that Kustomize is suitable for either trivial setups or very large ones like the one she describes at Shopify, where it's one composable part of a pipeline, together with automation generating the actual Kustomize manifests from higher-level app definitions.

But the use case of relying solely on Kustomize with developer-maintained DRY resource manifests in a moderately complex GitOps setup is not well catered for, and seems to be a blind spot for the maintainers. I'd rather not go back to Helm, but it seems to be the pragmatic choice in this situation. Any other suggestions very welcome...

@bagel-dawg

bagel-dawg commented Feb 29, 2024

@renaudguerin Yeah, we're on the same page here.

Kustomize does have a massive gap in how last-mile cluster configuration is supposed to be achieved. Replacements was the closest we came to filling it, and even then it had quirks that made it difficult to work with.

Right now my workaround is to continue using Kustomize 4.5.7 for applications that require replacements in ArgoCD.
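
For reference, pinning an older Kustomize per Application looks roughly like this (a sketch from memory of the ArgoCD docs: the binary path, repo URL and application name are placeholders, and the extra version has to be registered in argocd-cm first):

# argocd-cm (data): register the extra binary, e.g.
#   kustomize.path.v4.5.7: /custom-tools/kustomize_4.5.7
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myservice                          # placeholder application name
spec:
  project: default
  source:
    repoURL: https://example.com/repo.git  # placeholder repository
    path: services/myservice/overlays
    kustomize:
      version: v4.5.7                      # use the registered pre-5.0 binary for this app
  destination:
    server: https://kubernetes.default.svc
    namespace: myservice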

@bagel-dawg

bagel-dawg commented Mar 16, 2024

@renaudguerin I've done some tinkering in this area over the past couple of weeks. I have now opted to use the ArgoCD Vault Plugin as a pseudo-templating engine for last-mile configuration.

It allows you to put well-known placeholder values in your manifests, which the plugin uses to fetch the actual values from a secret store. In my case I'm using the Kubernetes Secret store alongside External Secrets to fetch from AWS SSM.

The best part: you don't even need to be using ArgoCD; you can pipe any kind of input to it and receive output that's ready to apply.
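
The pattern looks roughly like this (a sketch only: the annotation, path format and the <gcp-project-id> key are illustrative and depend on the backend you configure, so check the plugin docs for the exact syntax):

# Sketch of an AVP-style placeholder in a manifest; values below are examples, not from this thread.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myservice
  annotations:
    avp.kubernetes.io/path: "argocd/replacements"   # hypothetical secret path
    iam.gke.io/gcp-service-account: "myservice@<gcp-project-id>.iam.gserviceaccount.com"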

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 15, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned May 15, 2024
@renaudguerin
Author

/reopen

@renaudguerin
Author

/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@renaudguerin: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot reopened this May 15, 2024
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 15, 2024