Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interaction between regular expressions and YAML path segments is unclear #266

Open
dhduvall opened this issue Nov 22, 2022 · 2 comments
Open

Comments

@dhduvall
Copy link

This bug has morphed a bit. I had originally tried to do --exclude-regexp metadata\.annotations\.checksum/.*, mistakenly thinking that the dots were part of a string that needed to be escaped in order not to match too many strings, and not the separators between path segments.

But that raises the question of how those two syntaxes interact. For instance, if I want to exclude any item whose path starts with metadata and ends with generation, how do I express that? It seems that metadata.*generation works, but possibly by accident? At least, I wouldn't know how to filter

metadata:
  something:
    something-else:
      generation: 5

distinctly from

metadata.something.something-else.generation: 5

or

metadataXsomethingXsomething-elseXgeneration: 5

There are a similar set of ambiguous patterns using / as the separator, too.

In tracking this down, I noticed that the tests are also missing examples which contain regular expression metacharacters (not that they would have caught this particular problem).

Ambiguity is probably fine 99% of the time, but even having that much documented would be really handy. Unfortunately, that seems like it's getting into manpage territory ....

I'll note with a crazy look in my eye that helm uses (AFAIK completely undocumented) vertical tab (\v) as a separator internally, so if you have, say, a datadog repo configured, which has datadog, datadog-crds, and datadog-operator charts, you can search for \vdatadog/datadog\v to find just the first. Perhaps it might make sense to treat \v as a segment separator here, too, assuming it's not possible to use that character in a YAML key.

@vladimir-avinkin
Copy link

I don't quite understand your question and what exactly do you mean by ambiguity, but
dyff applies regexps to a go-patch path string. At least i think so

if filterPath != nil && regexp.MatchString(filterPath.String()) {

(Path.String() returns the string with go-patch style separators)
So in your example the correct regexp would be ^/metadata/.+/generation$

The exact behavior should probably be documented somewhere. Unless I'm missing something you literally can't use this feature correctly without perusing the source code

@dhduvall
Copy link
Author

So in your example the correct regexp would be ^/metadata/.+/generation$

Yep, that'll work; thanks for helping get me to that point. It should be good enough for my use, since I'm not expecting to run into any ambiguous paths.

The exact behavior should probably be documented somewhere. Unless I'm missing something you literally can't use this feature correctly without perusing the source code

Agreed.

The ambiguity I was talking about is when you have a path which is collapsed into a string with some separator (dots, slashes, it doesn't matter; I spend more time with jq and yq, so I'm used to dots as separators, but also dyff output uses dots by default, so I'd expect that as input by default, too) and a single key that is the same as that path.

Here's a concrete example:

metadata:
  annotations:
    prometheus.io/scrape: true
    prometheus.io:
      scrape: false

(Ignoring the fact that this isn't a valid Kubernetes manifest.) Does metadata/annotations/prometheus.io/scrape, reference the true value or the false value? It looks like it references both (pardon the zsh-ism):

dyff between --exclude-regexp 'annotations/prometheus.io/scrape' \
  =(jq -n '.annotations["prometheus.io/scrape"] |= true | .annotations["prometheus.io"].scrape |= true') \
  =(jq -n '.annotations["prometheus.io/scrape"] |= false | .annotations["prometheus.io"].scrape |= false')

returns no differences. I think I would expect the expression to refer to the false value, but then there needs to be an escaping mechanism for the slash in order to get to the true value.

I had trouble figuring out what "go-patch" means, though. Google got me to RFC6902 (JSON Patch), which relies on RFC6901 (JSON Path), which uses ~1 to represent /, but I tried annotations/prometheus.io~1scrape, and dyff reported both differences.

It looks like dyff uses ytbx to do this processing, and that has no reference to ~ in its source, and doesn't even pretend to implement RFC6901 or RFC6902. This is a touch surprising, since the README explicitly references https://github.com/cppforlife/go-patch (aha, there's "go-patch"), but then never actually uses it. go-patch does implement the RFC6901 escaping mechanism, though it's careful to tell us that it's only "roughly based on" RFC6902. The README does say "output", and not "input", but since there's no mention of input format, maybe someone who read the README would assume it was the same as output?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants