Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bulk prefix and suffix string collection matching. #4997

Merged
merged 1 commit into from Aug 18, 2022
Merged

Add bulk prefix and suffix string collection matching. #4997

merged 1 commit into from Aug 18, 2022

Conversation

cube2222
Copy link
Contributor

@cube2222 cube2222 commented Aug 10, 2022

Will add tests after discussing further in the issue.

Fixes #4994

ast/builtins.go Outdated Show resolved Hide resolved
Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having written a few "starts with any" functions in Rego in the past, this is a welcome addition 👍 Will sleep on the naming question 😄

topdown/strings.go Outdated Show resolved Hide resolved
ast/builtins.go Show resolved Hide resolved
topdown/strings.go Outdated Show resolved Hide resolved
topdown/strings.go Outdated Show resolved Hide resolved
@cube2222
Copy link
Contributor Author

@anderseknert Please take another look.

I've refactored, added tests, and added benchmarks showing the improvement.

We've also done an internal poll to choose a slightly better name which should be more intuitive than the original proposal (though it's still not a strong opinion, happy to change it if you have a better idea for it).

@anderseknert
Copy link
Member

anderseknert commented Aug 11, 2022

Looks like some good progress has been made here! 😃

I'm a little confused by the current names though - AnyStartsWithString and AnyEndsWithAny.. should they not both have WithAny in their names? 🤔 About the naming, I did sleep on that.. and I wonder if this could be a candidate for retrofitting into the existing startswith and endswith? We've seen this a few times in the past, where functions like object.get, glob.match and others have been enhanced with support for more argument types than they originally had in order to provide new functionality. I guess it would then look something like this:

startswith(string search, string base)       => true if search begins with base (current functionality)
startswith(string search, array|set bases)   => true if search begins with any base
startswith(array|set search, string base)    => true if any search begins with base
startswith(array|set search, array|set base) => true if any search begins with any base

This would not only solve the woes around naming :P but might be a logical place for this functionality? I don't know for certain yet, but I'd love to hear what others think of that.

Only one concern about the implementation — the unit tests would be better if moved into the "YAML suite" of tests found here. These tests have the benefit of testing not just the Go implementation of OPA, but any implementation, like Wasm, IR implementations, and so on.

Instructions for how to run individual tests may be found in the top level comment of this file.

@cube2222
Copy link
Contributor Author

@anderseknert

About the naming, I did sleep on that.. and I wonder if this could be a candidate for retrofitting into the existing startswith and endswith?

This crossed my mind as well and I like the idea. My only issue with it is that this would probably also require porting this functionality to those other implementations (like wasm), requiring much more work.

That said, if porting to wasm would indeed be required in such a case I would propose - in the scope of this PR - implementing it there in the naive way, without patricia trees. Would that be alright?

So overall +1.

@anderseknert
Copy link
Member

Looking great, Jakob 👍 @srenatus will be back next week I think, and I'd appreciate his thoughts on Wasm here. Might also be good to have someone else from the OPA team chip in wrt reusing the startswith/endswith builtins vs. adding new ones. All clear from me though :)

@cube2222
Copy link
Contributor Author

This PR should now be ready.

Sounds good @anderseknert! I'd appreciate you finding more people to chime in then and I'll be looking forward to @srenatus 's review. (Hey @srenatus btw! Long time no see)

@cube2222 cube2222 marked this pull request as ready for review August 11, 2022 16:50
@srenatus
Copy link
Contributor

Great contribution (Hi Jacob! 👋). I was about to write that this would be fine without the wasm implementation but it's already there! Awesome. I'll have a look right away.

@srenatus
Copy link
Contributor

Ok, read through this. It's a nice contribution, thank you. And of course, if you overload startswith/endswith, you do need to provide a wasm implementation.

I think there's another possibility: keep startswith/endswith as-is, but introduce a new builtin that works with strings, or collections of strings, but has a name that's less confusing. WDYT?

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Some inline comments 🙃 👇

topdown/strings.go Outdated Show resolved Hide resolved
topdown/strings.go Outdated Show resolved Hide resolved
wasm/src/strings.c Outdated Show resolved Hide resolved
@anderseknert
Copy link
Member

@srenatus that was actually what the PR originally did, see #4997 (comment) 😬 reverting that should be simple enough though I suppose..

@cube2222
Copy link
Contributor Author

cube2222 commented Aug 16, 2022

@srenatus Thanks for the review! I'll hold off with the refactoring until we reach a consensus regarding whether this should be a new function or added to the existing one.

As @anderseknert mentioned, the original PR was with a separate function. Naming it was quite hard though (any_startswith_any?) which was one of the reasons to overload startswith instead.

Overall, I think overloading startswith makes more sense, but it's no problem for me to move it out to a separate function if you think that's better. Please let me know what you think.

@srenatus
Copy link
Contributor

Overall, I think overloading startswith makes more sense, but it's no problem for me to move it out to a separate function if you think that's better. Please let me know what you think.

I think the "any" semantics of the haystack collection are not understandable from the name of the built-in... I mean, it's clear what is likely to happen with the array of prefixes -- matching all of them simultaneously is unlikely -- but for the other arg, it's not.

So I'd be in favour of a new name.

@cube2222
Copy link
Contributor Author

@srenatus Do you think the name any_startswith_any is good (enough)?

@cube2222
Copy link
Contributor Author

cube2222 commented Aug 17, 2022

Also @srenatus, do you think its signature should be ((array|set|string),(array|set|string)) (like now), or ((array|set), (array|set))? The former partly duplicates the standard startswith and the any suffix/prefix doesn't really fit the plain string, however, it is in practice more flexible and doesn't complicate the implementation too much.

@srenatus
Copy link
Contributor

startswith_crossproduct? 🤔 But let's get more opinions here before taking steps. With crossproduct, string/string args would still fit, I think.

@cube2222
Copy link
Contributor Author

cube2222 commented Aug 17, 2022

startswith_crossproduct

@srenatus I don't think that will be readable at all for people without an academic background (it's not a strong opinion though, the name is not too important of a matter to me). But yeah, let's get more opinions. I'd appreciate you finding more people to chime in.

@cube2222 cube2222 closed this Aug 17, 2022
@cube2222 cube2222 reopened this Aug 17, 2022
@srenatus
Copy link
Contributor

srenatus commented Aug 17, 2022

I've been discussing this with @anderseknert today, and we've come up with strings.any_prefix_match. So there's another contender. The rationale for this was: for the first argument, if the natural extension of startswith(base, seach) is to match all or any of the base strings isn't obvious; for prefixes, the second argument, it is: if you pass more than one prefix, you wouldn't ever want them all to match, so it must be any.

Also, it's namespaced, as new builtins should be.

@ashutosh-narkar, @philipaconrad what do you think about all that? 👀

@cube2222
Copy link
Contributor Author

@srenatus That sounds like the best to me so far. Would that still provide overloads for non-collections (so plain strings) in both arguments? I'd prefer that.

@anderseknert
Copy link
Member

@cube2222 yes, let's do that. It's a little silly to have two built-ins provide the same functionality, but silly ain't so bad :)

@anderseknert
Copy link
Member

(I suppose we could deprecate startswith/endswith, but given how prevalent they are, having them removed later would be quite challenging)

@srenatus
Copy link
Contributor

Btw when we go back to the "new builtin" code path, let's ensure we don't use the deprecated RegisterFunctionalBuiltin2, please 🔍

srenatus
srenatus previously approved these changes Aug 18, 2022
Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for bearing with me back and forth on this one. It's a great first contribution!

I think icing on the cake would be test cases asserting eval time type check failures. But it's ok without. WDYT, @anderseknert?

@cube2222
Copy link
Contributor Author

@srenatus Adding them is not a problem, I'm still not completely done here.

@cube2222
Copy link
Contributor Author

Ok @srenatus @anderseknert please take another look.

srenatus
srenatus previously approved these changes Aug 18, 2022
@cube2222
Copy link
Contributor Author

@anderseknert Is there anything else I need to do before this can be merged?

@anderseknert
Copy link
Member

@cube2222 nothing I can think of except for rebasing on top of main, and squashing your commits :)

…nd suffix matching.

Signed-off-by: Jakub Martin <kubam@spacelift.io>
@anderseknert anderseknert merged commit e7130a8 into open-policy-agent:main Aug 18, 2022
@anderseknert
Copy link
Member

Great contribution! Thanks @cube2222 👍

@cube2222
Copy link
Contributor Author

Great, thanks for merging! And thanks a lot for working closely with me on this, providing quick feedback, and getting this merged! Cheers @anderseknert @srenatus

@srenatus
Copy link
Contributor

@cube2222 we've noticed that as it stands, there's no mention of this being better (performance-wise) than using the naive startswith(prefixes[_], haystack[_]) etc... would you mind adding a note to ast/builtins.go to indicate that? Otherwise I'll pick it up some day... 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optimized functions for matching an array of strings with an array of prefixes/suffixes.
4 participants