Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gep 2694 support match by RegularExpression #2737

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rokkiter
Copy link

@rokkiter rokkiter commented Jan 26, 2024

What type of PR is this?

/kind gep
What this PR does / why we need it:
HTTPRoute support match by RegularExpression
Which issue(s) this PR fixes:

Fixes #
#2694
Does this PR introduce a user-facing change?:


@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 26, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @rokkiter. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2024
@rokkiter rokkiter changed the title add gep 2694 Add gep 2694 support match by RegularExpression Jan 26, 2024
@rokkiter
Copy link
Author

rokkiter commented Jan 29, 2024

@youngnick #2177 (comment)
Regarding the cascading match issue, from the implementation of istio, it doesn't care about the match type, but the position where the match rule is located decides that the first match on will take effect, and the user can decide which match has higher priority on his own, instead of being limited to death by the rule, which is also more conducive to the user's understanding. And there is no concept of priority for complete and prefix matches that have been implemented.

regex capture groups does not seem to be a necessary feature. The same result for regex capture groups can be achieved with the regex syntax (which can be a bit tricky)

@youngnick
Copy link
Contributor

Thanks for getting this started @rokkiter, this is a great start.

For other reviewer's clarity, here's what I had in the original comment that kicked off this GEP and PR:


The biggest reason why this is not done yet is about regular expression compatibility. An GEP that addresses this will need also to address:

what flavors of regexes are supported across most if not all Gateway API implementations

  • what flavor of regex is chosen as the Gateway API flavor. Ideally we should only be choosing one.
  • why that particular flavor has been chosen. Note that any flavor that's not supported by all implementations will need some pretty significant justification here.
  • Since doing regex path rewrites means supporting a regex path match and then using regex matched details in the rewrite filter, this GEP also covers adding regex match support to the base matches stanza.

Other concerns I'd like to see addressed in this GEP that aren't regex compatibility:

  • How do regexes get sorted in a list of matches? What happens when a regex also matches a Prefix or Exact match? ( I suspect the answer should be "Exact beats Prefix beats Regex beats /", but we will need to specify this precisely). Also note that the discussion about cascading matches is very relevant here too (what happens if cascading matches are enabled? Will that be hard to reason about or produce surprising results?)
  • How do we also allow the use of regex capture groups inside rewrite filters? (That is, can you match on /api/{path} and then use $1 or something to represent the matched path in a rewrite filter? If so, how does that syntax work?).
  • Using regex path matching basically precludes also doing any sort of path-based route delegation (which we haven't discussed for a long time, but we have discussed previously). What should we do here?
  • As raised by @plnordquist, what's the support for using other data inside a regex match, like hostname, port, etc?

For the last two items, we don't need to answer those in the GEP, but if we aren't going to, we need to explicitly put them in as "reserved for future work", so the discussion doesn't get derailed. The regex flavor discussions has all the hallmarks of one that could be contentious to me, which is concerning.


I think that I wasn't clear enough in that statement about a few things:

  • I'd like to see some more discussion about the available implementations and which dataplane they use, along with which regexes are supported there. The section that's here has good info, but a table summary as in GEP-1742 would be great.
  • I think that the proposal is suggesting that we standardize on RE2 because it's supported by a number of implementations, and is a subset of PCRE, which is supported by other implementations. This is reasonable, but needs to be explicitly stated as part of the "API" section. Just put a subsection called "Proposal" in that section and outline the proposed regex and why. (the information in the Alternatives section can probably be moved here).
  • The "Alternatives" section is really intended for recording discussion of the things that we don't choose, so that we can answer the inevitable "but why not use this alternative" questions.

Probably the biggest question I have though is:

I don't think it's clear enough where the proposed struct will be used. To be clear, I think that regex support needs to be added to the path type of the matches section (that is, the HTTPPathMatch struct), so that a rule can say "please match against all requests that match this path regex".

Then, the rewrite and redirect filters need to have a specification for how they can use information that has been captured in a regex match (or, if they can only use the entire matched path).

To put this another way, I think that the match part of a regex should go in the path match section, and the replace part should be specified in the redirect or rewrite config. So we're splitting a standard PCRE-style request that looks like s|pattern|replacement| into a match config that matches pattern and a field in the rewrite filter that looks like replacement.

I also appreciate your comments above about priority and capture groups, those are comments that should also be included in the "API" section - which should describe why this feature should work this way.

On the specifics there:

  • Priority of matching is very important, both in a single HTTPRoute, and across multiple HTTPRoutes (since it's possible that multiple HTTPRoutes can be combined). We can't just say "it's the specified order" and leave it at that, because we don't actually say that the order matters anywhere. Implementations are actually expected to reorder matches to meet the spec. For example, if a HTTPRoute specifies a /ex Prefix match first, then a /exact match, by that ordering, the Exact match would not necessarily match (depending on the data plane). So it's expected that /exact will be matched before any prefix matching of /ex, which effectively reorders the matches. I think that Regex matches should go after both Exact and Prefix matches, since it's very easy to accidentally capture more than you intend with a Regex. I could be convinced otherwise by use cases or other argument though.
  • without regex capture groups in the match, then I don't see what the point of adding regex is at all for rewrites or redirects. Being able to do something like rewriting /{user}/{operation} in a path to /{operation}/{user} is very useful and is not possible without using a capture group in the match criteria. I think this may also have been confusing because I didn't specify that I think that the match and replace parts should be separate.

Sorry, that was more than I intended to write, but hopefully this gives you a bit of a TODO list. Interesteed to hear what other GEP reviewers have to say though.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rokkiter
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rokkiter
Copy link
Author

rokkiter commented Apr 1, 2024

Thanks so much for making so many very meaningful suggestions. I didn't make it obvious in the title and description that this gep targets HTTPPathModifier rather than all path matching, so I don't intend to cover HTTPPathMatch related changes in this gep.
Currently HTTPPathModifier does not support regexes at all, while HTTPPathMatch supports PathMatchRegularExpression.One thing to note here is that HTTPPathMatch supportsRegularExpressionType based on the implementation. If RE2's solution is acceptable, then we can synchronize the changes to HTTPPathMatch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/gep PRs related to Gateway Enhancement Proposal(GEP) needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants