Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kustomize 4.5.3+ breaks remote resources #4559

Closed
rumstead opened this issue Mar 30, 2022 · 23 comments · Fixed by #4605
Closed

Kustomize 4.5.3+ breaks remote resources #4559

rumstead opened this issue Mar 30, 2022 · 23 comments · Fixed by #4605
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@rumstead
Copy link
Contributor

rumstead commented Mar 30, 2022

Describe the bug

I am having a hard time understanding what is going on because some remote bases work and others don't. I can consistently reproduce the problem with one of our repos but unsure how to datamine more information about what is going on. I tried adding ssh and curl verbose flags but it isn't getting to that point.

$ kustomize version                                                                                                                                                                                                              
{Version:kustomize/v4.5.4 GitCommit:cf3a452ddd6f83945d39d582243b8592ec627ae3 BuildDate:2022-03-28T23:06:20Z GoOs:darwin GoArch:amd64}
$ kustomize build <my path with many remote bases> --enable-helm                                                                                                                                                                    
Error: accumulating resources: accumulating resources from 'https://dev.azure.com/repo': MalformedYAMLError: yaml: line 27: mapping values are not allowed in this context in File: https://dev.azure.com/<repo>

Yet the same repo works fine with kustomize 4.5.2

$ ~/Downloads/kustomize version                                                                                                                                                                                                  
{Version:kustomize/v4.5.2 GitCommit:9091919699baf1c5a5bf71b32ca73a993e98088b BuildDate:2022-02-09T23:26:42Z GoOs:darwin GoArch:amd64}
$ ~/Downloads/kustomize build <my path with many remote bases> --enable-helm | wc -l                                                                                                                                                                            
    6840

Expected output

The same output as 4.5.2, no errors and the manifests rendered.

Actual output

Error message complaining about mapping contexts.

Kustomize version

$ kustomize version                                                                                                                                                                                                              
{Version:kustomize/v4.5.4 GitCommit:cf3a452ddd6f83945d39d582243b8592ec627ae3 BuildDate:2022-03-28T23:06:20Z GoOs:darwin GoArch:amd64}

Platform

Mac

@rumstead rumstead added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 30, 2022
@TheDoubleJo
Copy link

Same issue here.

The remote resource is reached with: (hosted on Gitlab)

resources:
  - https://mydomain.com/infrastructure/kubernetes/kustomize_bff.git//base?ref=master

And for information, the remote resource looks like that:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- configmap.yaml
- deployment.yaml
- hpa.yaml
- service.yaml
- stripprefix.yaml
- ingressRoute.yaml

configurations:
- nameReferenceConfiguration.yaml

@artbegolli
Copy link

Seeing the same issue with our remote repos

@alfredomagallon
Copy link

Same here

@MarcoPolo
Copy link

This issue may also be related: #4455

@MarcoPolo
Copy link

A workaround is to use ssh (e.g. git@github.com:...) or prefixing git:: to https urls, (e.g. git::https://github.com/user/repo.git?ref=...)

@rumstead
Copy link
Contributor Author

rumstead commented Apr 6, 2022

Thanks for the link but I don't think they are fixed by that PR.

Using git:: or ssh does not work for me and the error messages are different. Additionally, they mention in the issue that it breaks in 4.5.*. That isn't the case, I see the error after 4.5.3, 4.5.2 works fine.

❯ kustomize build <my path with https remote bases> | wc -l                                                                                                                                       
    1464
❯ kustomize version                                                                                                                                                                  
{Version:kustomize/v4.5.2 GitCommit:9091919699baf1c5a5bf71b32ca73a993e98088b BuildDate:2022-02-09T23:26:42Z GoOs:darwin GoArch:amd64}

vs

❯ ~/Downloads/kustomize-4.5.3/kustomize build <my path with https remote bases>
Error: accumulating resources: accumulating resources from 'https://dev.azure.com/...?ref=0.8.5': MalformedYAMLError: yaml: line 27: mapping values are not allowed in this context in File: https://dev.azure.com/...?ref=0.8.5
❯ ~/Downloads/kustomize-4.5.3/kustomize version                                                                                                                                         
{Version:kustomize/v4.5.3 GitCommit:de6b9784912a5c1df309e6ae9152b962be4eba47 BuildDate:2022-03-24T20:51:20Z GoOs:darwin GoArch:amd64}

@masih
Copy link
Contributor

masih commented Apr 7, 2022

Looks like HTTPS remote source specified in example.com/org/repo?ref=v1.2.3 instead of https://example.com/org/repo.git?ref=v1.2.3 also does the trick while using HTTPS as the git clone transport protocol.

@KnVerey
Copy link
Contributor

KnVerey commented Apr 13, 2022

/triage needs-information

Would one of you please be able to provide a public sample we could use to reproduce?

And could you please confirm whether kustomize 4.5.3 is able to successfully build the Kustomization if you clone it locally yourself? The fact that the error is about a specific line in a remote file is curious.

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 13, 2022
@masih
Copy link
Contributor

masih commented Apr 13, 2022

@KnVerey to reproduce the issue:

kustomization.yaml content:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://github.com/prometheus-operator/kube-prometheus.git?ref=v0.9.0

Running build:

$ kustomize build .
Error: accumulating resources: accumulating resources from 'https://github.com/prometheus-operator/kube-prometheus.git?ref=v0.9.0': MalformedYAMLError: yaml: line 25: mapping values are not allowed in this context in File: https://github.com/prometheus-operator/kube-prometheus.git?ref=v0.9.0

Version:

$ kustomize version
{Version:kustomize/v4.5.4 GitCommit:cf3a452ddd6f83945d39d582243b8592ec627ae3 BuildDate:2022-03-28T23:06:20Z GoOs:darwin GoArch:amd64}

In the example above, changing the remote build URL to the following seems to work: github.com/prometheus-operator/kube-prometheus?ref=v0.9.0

@natasha41575
Copy link
Contributor

@m-Bilal would you be able to take a look? I'm curious if this is related to your MalformedYamlError change

@m-Bilal
Copy link
Member

m-Bilal commented Apr 13, 2022

Sure I'll into this.

/assign

@natasha41575
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 13, 2022
@KnVerey
Copy link
Contributor

KnVerey commented Apr 13, 2022

FYI @m-Bilal I was able to confirm that building the https://github.com/prometheus-operator/kube-prometheus example given does not work when referenced remotely and does work if you clone it and build.

@m-Bilal
Copy link
Member

m-Bilal commented Apr 17, 2022

The error occurs in this section:

func (r *ByteReader) decode(originalYAML string, index int, decoder *yaml.Decoder) (*yaml.RNode, error) {
node := &yaml.Node{}
err := decoder.Decode(node)
if err == io.EOF {
return nil, io.EOF
}
if err != nil {
return nil, errors.WrapPrefixf(err, "MalformedYAMLError")
}

err := decoder.Decode(node) returns an error yaml: line 25: mapping values are not allowed in this context. Interestingly, it does not return any error when remote resources is prefixed with git:: (So I think it is more of a problem with the Decoder we're using?).

How should I go about this? Try fixing Decoder in our forked version (once I have the issue cleared out) or should it be an issue in go-yaml/yaml?

Also, the idea behind introducing MalformedYAMLError here was that if there's an error when trying to decode a yaml file, it very likely is because the file is malformed.

@m-Bilal
Copy link
Member

m-Bilal commented Apr 18, 2022

Looked at this a little more. The problem is that when resources is something like https://github.com/prometheus-operator/kube-prometheus.git?ref=v0.9.0, the first file that is passed to kyaml/kio/byteio_reader.go decode() in originalYAML parameter is a HTML file while when the resources https link is prefixed with git::, the first file is correctly a YAML file.

The below linked files are the log outputs of kustomize, and you can check for the first log with the prefix kyaml/kio/byteio_reader/decode() original yaml: (that should be the first 3rd or 5th line). You'll see how the response differs.
https resources (results in error)
git::https resources (working fine)

The problem is that in previous releases we were allowing problems with YAML files to go through while this was changed in #4497 here:

if errF := kt.accumulateFile(ra, path); errF != nil {
// not much we can do if the error is an HTTP error so we bail out
if errors.Is(errF, load.ErrorHTTP) {
return nil, errF
}
if kusterr.IsMalformedYAMLError(errF) { // Some error occurred while tyring to decode YAML file
return nil, errF
}

The return nil, errF (lines 414-416) is causing Kustomize to throw an error for the HTML file present in the response.
If we revert back to allowing YAML error to pass through, we'd have #3812 all over again. How should I go about fixing this? The way I see it is that HTML (or any other non-YAML file) should not be passed to kyaml/kio/byteio_reader.go decode(). So one way could be to have some sort of a filter that only allows YAML files to go through to decode() or a filter in decode() itself that only allows for parsing of YAML files. But I also think that silently ignoring non-YAML files passed to Kustomize could be an issue in itself in that when it is an actual problem with the files, we're not raising that error and notifying the user about it.

@natasha41575
Copy link
Contributor

natasha41575 commented Apr 18, 2022

I think we should move these lines:

// not much we can do if the error is an HTTP error so we bail out
if errors.Is(errF, load.ErrorHTTP) {
return nil, errF
}
if kusterr.IsMalformedYAMLError(errF) { // Some error occurred while tyring to decode YAML file
return nil, errF
}

down to inside these two if statements:

if err != nil {
return nil, errors.Wrapf(
err, "accumulation err='%s'", errF.Error())

if err != nil {
return nil, errors.Wrapf(
err, "accumulation err='%s'", errF.Error())

This is what I suggested in the PR here #4334 (comment) so that we don't break existing workflows, and considering that this is the second regression caused by adding an error check, I'm more convinced that my suggestion in that PR is the safer way to go. It should look like:

if err != nil {
 	if errors.Is(errF, load.ErrorHTTP) { 
 		return nil, errF 
 	} 
 	if kusterr.IsMalformedYAMLError(errF) { // Some error occurred while tyring to decode YAML file 
 		return nil, errF 
 	} 
 	return nil, errors.Wrapf(
 		err, "accumulation err='%s'", errF.Error())
}

This way, we are only improving the error message; we won't be throwing any errors where we weren't previously already throwing an error.

@m-Bilal is that clear to you? Would you be able to make that change and verify that #3812 would still be solved?

At some point we should also build up a larger suite of tests with different remote URLs so that we can be more confident about changes like this, but until we have that IMO we should err on the safer side.

@natasha41575
Copy link
Contributor

Btw, thank you @m-Bilal for the thorough investigation and narrowing down what caused the problem. It's super helpful to have you take some time to work on kustomize!

@rumstead
Copy link
Contributor Author

At some point we should also build up a larger suite of tests with different remote URLs so that we can be more confident about changes like this, but until we have that IMO we should err on the safer side.

Hard agree! Breaking changes in patch versions makes it hard to have confidence in upgrades.

@m-Bilal
Copy link
Member

m-Bilal commented Apr 18, 2022

Sure @natasha41575, will submit a PR in a few days. And thanks!! :D

@natasha41575 natasha41575 added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Apr 18, 2022
@natasha41575
Copy link
Contributor

I have verified that the kustomization file provided by #4559 (comment) no longer throws an error, so this issue will no longer be there in the next release.

@alexmeise
Copy link

alexmeise commented May 25, 2022

what do we do until next release? downgrade? any workaround?

@natasha41575
Copy link
Contributor

There was a release done last week.

@x45dev
Copy link

x45dev commented Sep 12, 2023

For anyone encountering this issue - even in late 2023 - it's probably because you're using a version of ArgoCD that includes an old Kustomize version.
As mentioned above, the one workaround is to use SSH rather than HTTPS.
If you really need to use HTTPS, I've found that the underlying problem with particular repos is something to do with LFS! (???) The workaround is to remove LFS entries in the .gitattributes file... HTTPS works after that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
Development

Successfully merging a pull request may close this issue.