Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration generates invalid yaml for multiline and object content #179

Closed
dee0 opened this issue Apr 20, 2024 · 22 comments · Fixed by open-component-model/ocm#734
Closed

Comments

@dee0
Copy link

dee0 commented Apr 20, 2024

What happened:

My release failed with
ocm-system dmi-broker 10h False Could not load chart: cannot load values.yaml: error converting YAML to JSON: yaml: line 23: could not find expected ':'

Each of my dictionary and multiline string values were improperly indented. In the screenshot below you can see on the right hand side my file after Configuration was applied. Note that certs.rs.crt is improperly indented. On the left is a copy that I fixed 'by hand' and which is parsable.

image

The values.yaml in the dir resource in my component version contains

certs:
 intermediate: unset
 root: unset
 ingress:
  crt: unset
  key: unset
 rs:
  intermediate: unset
  root: unset
  crt: unset
  key: unset

My config.yaml in my component version, that is my config for the ocm controller, contains
defaults

configuration:
  defaults:
    certs:
      rs:
        crt: null

rules

  - value: (( certs.rs.crt ))
    file: dmi-broker/values.yaml
    path: certs.rs.crt

Running the command
kubectl get -n broker configmap dmi-broker -o=yaml | yq '.data.config' | yq -C | less -iR
to look at the content of my configmap that is used in my cluster I see the following

image

Image below shows the 'pipeline' I have defined for the ocm-controller
image

I believe this is an ocm-controller bug because

  • values.yaml in the Localization Snapshot is valid, and
  • data in the ConfigMap is valid, but
  • values.yaml in the Configuration Snapshot is not valid yaml

What you expected to happen:

I expected the release to work of course :)

How to reproduce it (as minimally and precisely as possible):

I believe you just need to have a rule that is setting a multiline string value or an object value.

Anything else we need to know:

Not that I can think of

Environment:

My ocm-controller was built from this hash of this fork
https://github.com/dee0sap/ocm-controller/tree/228d0be45540590ea51b712d3fb21d2c2082ef72
because I need my quick fix for #68

@dee0 dee0 added kind/bug Bug kind/task General Task labels Apr 20, 2024
@dee0sap
Copy link

dee0sap commented Apr 20, 2024

I was able to add to the ocm project a test case which I believe reproduced the problem. You can see what I added in this PR open-component-model/ocm#734

@dee0sap
Copy link

dee0sap commented Apr 20, 2024

Think this may be a goccy/go-yaml bug. Either that or ocm is using replace incorrectly.

https://go.dev/play/p/rfnbY5rbSk9

@morri-son morri-son added the component/ocm-controllers OCM Controllers label Apr 22, 2024
@morri-son
Copy link
Contributor

adding @Skarlso

@Skarlso
Copy link
Contributor

Skarlso commented Apr 22, 2024

Yeah, we are just using whateverm OCM does :)

@Skarlso Skarlso self-assigned this Apr 24, 2024
@Skarlso
Copy link
Contributor

Skarlso commented Apr 24, 2024

Uwe found the issue in the corresponding YAML library. goccy/go-yaml#447

@morri-son
Copy link
Contributor

@mandelsoft created goccy/go-yaml#447, but since @Skarlso found that the same issue has been reported already two years ago without any fix, most-likely we need to find a solution on our own. @mandelsoft mentioned that he may know an older lib which might act as basis for a fix.

@morri-son
Copy link
Contributor

@dee0 confirmed that with using a certain format as input, the lib produces the wanted output. We'll close this issue now and open a new one, checking for a lib that can handle things better.

@dee0
Copy link
Author

dee0 commented Apr 27, 2024

Hey @morri-son
Actually I haven't confirmed anything yet.

I just tried a new ocm-controller build with @Skarlso's 'don't marshal primitives' change and I think the ocm-controller is still unusable for me. This because object data is not stored correctly by the configuration.

Object data is still yielding invalid yaml. e.g. I get

  orca_env_stable_values:
 certificate_authority_url: http://example1.com 
 deployment: deveaws
 deployment_size: xsmall
 domain: example2.com 
 landscape_region: eu12
 org: deveaws
 service_hostname_suffix: .example3.com 
 service_kubernetes_hostname_suffix: .example4.com 

but it needs to be

  orca_env_stable_values:
    certificate_authority_url: http://example1.com 
    deployment: deveaws
    deployment_size: xsmall
    domain: example2.com 
    landscape_region: eu12
    org: deveaws
    service_hostname_suffix: .example3.com 
    service_kubernetes_hostname_suffix: .example4.com 

I have not tried the 'formatting the text the right way' trick for the multiline string data yet. And I don't know how I can reasonably do that. In the 'last mile config' we won't know how much indenting is required and in spiff++ I am not seeing a good way to determine and then apply the required amount of indenting.

What I maybe able to do is to use spiff++'s 'asjson' function to double quote the multiline string. I'll try to give that a shot this weekend.

@Skarlso
Copy link
Contributor

Skarlso commented Apr 28, 2024

Uhh. Probably our best option is to switch to yq lib. But it's going to be disruptive.

@dee0
Copy link
Author

dee0 commented Apr 28, 2024

Hey @Skarlso

Actually it seems like it is pretty easy. At least if the change is limited to subst.go.

While I have not pushed it up to github yet, I made the change in subst.go last night. While some of the tests in subst_test.go failed it looked like it was just due to formatting differences in the output.

I plan on taking a closer look at the failures today. Hopefully I'll be able to resolve them and I'll push my change up to my fork.

Oh, and I think as part of this I want to make sure the tests are covering

  • My object case that is failing ( need to add something for that )
  • Updating with yaml sequences, maps, each type of string, at least one non-string scalar and null
  • Making multiple updates to the same yaml document

Regarding these tesst additions my thinking is

  • if the first two bullets had been done then we wouldn't be having this problem now.
  • In my change SubstitutionTarget now has a yqlib CandidateNode instead of an ast.File. This and the structure of the yqlib API make me think it is necessary to confirm multiple updates are having the desired effect

@Skarlso
Copy link
Contributor

Skarlso commented Apr 28, 2024

I didn't say it's hard. It's more like tedious. Also a completely breaking change. I did implement it using yq. I was able to throw out most of the code. The only missing feature was the struct based thing.

The braking change is that the path must start with '.' And values need to be quoted. Which might break some other things and will definitely break existing tests and demos and component versions.

@morri-son
Copy link
Contributor

adding @hilmarf and @fabianburth to follow the discussion. We can plan to switch to another lib, but as Gergely mentioned it will be both, requiring changes on implementation as well after that breaking change, on the consumer side...

@Skarlso
Copy link
Contributor

Skarlso commented Apr 28, 2024

I even included a simple example on how to use yqlib on the yq repository. mikefarah/yq#2021
You don't necesserily need a CandidateNode.

@dee0
Copy link
Author

dee0 commented Apr 29, 2024

Here https://github.com/open-component-model/ocm/actions/runs/8873092383/job/24358384683?pr=734
I have pushed the changed for switching to yqlib.

The path doesn't need to start with a '.'. Consider, with goccy the path needed to start with '$.' and the substitution code took care of this under the covers. So in similar fashion the code I pushed is adding the required '.' under the covers. So that isn't a breaking change.

Unless consumers are doing something silly like depending on the precise formatting of the output YAML or JSON code they shouldn't care about this change.

And on that note, I switched the tests to use MatchYAML and MatchJSON instead of Equals on strings. Testing for string equality

  • Makes the tests fragile as the slightest change to to formatting of the yaml or json breaks the test. e.g. { "foo": "bar" } won't match { "foo": "bar" }
  • Allows the test to pass when the output is neither valid yaml nor valid json.

Regarding the switch to CandidateNode in the SubstitutionTarget implementation. This is necessary to efficiently handle multiple updates to the same target.

While I have pushed the changes up and all the existing tests are passing, things I would still like to do before considering this ready for handover to anyone are

  • Make sure that the test cases I mentioned here are in place
  • Fix the lint error I see reported
  • Perform a 'sanity check' review ( I pushed the code up as soon the tests passed but before doing this )

@Skarlso
Copy link
Contributor

Skarlso commented Apr 29, 2024

The path doesn't need to start with a '.'. Consider, with goccy the path needed to start with '$.' and the substitution code took care of this under the covers. So in similar fashion the code I pushed is adding the required '.' under the covers. So that isn't a breaking change.

Yes, you can do that in the background, but I would rather not do any "fixing of the yaml path" which is invisible to the user. That just opens up a bunch of problems when trying to debug failures. Any hidden modification of the yaml path is a problem. I would rather be up-front about what is required to make things work.

Also regarding your implementation, that's almost the same as I wrote, with the expectation that I didn't bother with all the tree parsing. The default all parser can handle all of that and can be re-run as in the example I linked in.

@morri-son
Copy link
Contributor

ok folks, then we keep this issue open until we really figured out which way we plan to go in the future, using doccy or anything else, e.g. yg. Once the discussion has finalised, we'll spin up a new issue with the exact steps to be done.

@Skarlso
Copy link
Contributor

Skarlso commented Apr 29, 2024

yah, I think this issue is fine. Dan has some great ideas and we can spit-ball in this issue. :)

@dee0
Copy link
Author

dee0 commented Apr 29, 2024

Thanks :)

As for making debugging easier,

  • In the log message(s) make it clear what the source of the configuration value is, what the target file is and what the yq expression used is. Have the message clearly indicate it a yq expression
  • Provide tooling that lets me extract the source and the target file to disk

Then a can run the yq cli myself to validate whatever substitutions are taking place

Fwiw, this is pretty much what I was doing at the start of this except I was using go playground because I didn't goccy doesn't appear to have a cli.

Actually, elaborating a bit more on what I was doing

  • I have an Ingress that I add to my cluster and which exposes the registry. This enabled the next step.
  • I have a set of shell scripts that use crane to fetch the snapshot contents for a Resource, Localization or Configuration.
  • After pulling down the images as tgz I would untar the biggest tar file they contained, assuming it contained my helm chart resource
  • Then I would pass the values.yaml to the yq cli to figure out which file was invalid. I used the yq cli to make sure my paths were correct. ( Adding the required '.' :) )

So I didn't need to maintain my own ingress and extraction scripts and if there log messages that told me what I need to extract and what the expression used was I think that would go a long way to making it so that problems were easier to diagnose.

@dee0
Copy link
Author

dee0 commented Apr 30, 2024

This morning I fixed the lint error however I see there is a test failure. That test doesn't fail locally.

This evening I added all the tests I said I wanted to add.

And I confirmed that ocm-controller build with the changes in my ocm PR works. That is, the service I was trying to onboard to ocm+ocm-controller+flux deployment actually was deployed. :)

I hope to have time tomorrow evening to look into that failing test + do a sanity check of the changes in the PR.

Oh, and at least when I am running the tests within vs code yqlib is very chatty. I'll check if that is the case during normal execution and if so look into reducing its verbosity.

@Skarlso
Copy link
Contributor

Skarlso commented Apr 30, 2024

yqlib is very chatty

There is a logger that it uses that you can set to debug mode at the begin to display it.

@Skarlso
Copy link
Contributor

Skarlso commented Apr 30, 2024

// GetLogger returns the yq logger instance.
func GetLogger() *logging.Logger {
	return log
}

This is the thing that you need to call and set it to debug.

@Skarlso Skarlso assigned dee0 and unassigned Skarlso Apr 30, 2024
@dee0
Copy link
Author

dee0 commented May 1, 2024

At least within the context of this ticket, I think I have made all the changes I would like to see in OCM.

This evening I

  • Added a test case that validates that if destination is yaml style then any substitution that is added doesn't change that ( and fixed the code since it was failign )
  • Reduced the yqlib verbosity
  • Addressed the test failure from last night ( by refreshing from main branch )

So from my perspective what is left is

  • I need to perform a sanity check of what is in the PR
  • Need to handle any feedback + address any bureaucratic things

Skarlso added a commit to open-component-model/ocm that referenced this issue May 3, 2024
## Description

Fixes open-component-model/ocm-project#179


## What type of PR is this? (check all applicable)

- [ ] 🍕 Feature
- [x ] 🐛 Bug Fix
- [ ] 📝 Documentation Update
- [ ] 🎨 Style
- [ ] 🧑‍💻 Code Refactor
- [ ] 🔥 Performance Improvements
- [x ] ✅ Test
- [ ] 🤖 Build
- [ ] 🔁 CI
- [ ] 📦 Chore (Release)
- [ ] ⏩ Revert

## Related Tickets & Documents

<!-- 
Please use this format link issue numbers: Fixes #123

https://docs.github.com/en/free-pro-team@latest/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
-->
- Related Issue # 179
- Closes # (179)
- Fixes # (179)
> Remove if not applicable

## Screenshots

<!-- Visual changes require screenshots -->


## Added tests?

- [x ] 👍 yes
- [ ] 🙅 no, because they aren't needed
- [ ] 🙋 no, because I need help
- [ ] Separate ticket for tests # (issue/pr)

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Please also list any relevant details
for your test configuration


## Added to documentation?

- [ ] 📜 README.md
- [ x] 🙅 no documentation needed

## Checklist:

- [x ] My code follows the style guidelines of this project
- [ x] I have performed a self-review of my code
- [ x] I have commented my code, particularly in hard-to-understand
areas
- [ x] I have made corresponding changes to the documentation
- [ x] My changes generate no new warnings
- [ x] I have added tests that prove my fix is effective or that my
feature works
- [ x] New and existing unit tests pass locally with my changes
- [ x] Any dependent changes have been merged and published in
downstream modules

---------

Co-authored-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants