Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for exclusions #22

Merged
merged 4 commits into from Jun 1, 2020
Merged

Add support for exclusions #22

merged 4 commits into from Jun 1, 2020

Conversation

jalaziz
Copy link
Contributor

@jalaziz jalaziz commented Sep 14, 2019

Paths can be negated to stop searching through the remaining patterns in
a the glob list. All changed files tested and at least one matching file
will result in a label being added.

Fixes #9

@jalaziz
Copy link
Contributor Author

jalaziz commented Oct 3, 2019

@damccorm Just a friendly ping on this PR.

@bryanmacfarlane
Copy link
Member

Can we add unit tests?

@jalaziz
Copy link
Contributor Author

jalaziz commented Oct 3, 2019

I originally intended to, but there are no unit tests as far as I could tell. It didn't seem like this PR was the appropriate place to bootstrap unit tests for the project.

@bryanmacfarlane
Copy link
Member

Yeah, you're correct. I glanced at the test folder but it has a todo ;) . https://github.com/actions/labeler/blob/master/__tests__/main.test.ts

@jalaziz
Copy link
Contributor Author

jalaziz commented Oct 3, 2019

Yup, same thing I saw.

I could still give it a shot though, but fair warning, I'm fairly new to TS and have never used Jest.

@TimonVS
Copy link

TimonVS commented Oct 4, 2019

@jalaziz I wrote tests for a similar Action which might give you some inspiration: https://github.com/TimonVS/pr-labeler-action/pull/16/files#diff-74e406fb3eb7c98388edeb1b1c5dcf08.
I'm happy to help out wherever possible :)

@ofek
Copy link

ofek commented Dec 31, 2019

Can this be merged? 😄

@bryanmacfarlane
Copy link
Member

@dakale to have a look after the new year.

@dakale
Copy link
Contributor

dakale commented Jan 2, 2020

Im not sure the logic for short circuiting exclusion matches makes sense. With this, they are only useful if you list them first, eg, if I have simply have:

label:
- src/*
- !/src/some_file

Imo, I expect a PR that changes files in src/ but also changes src/some_file, to not match this label. I would read this as "any changes to src/ but not to src/some_file"

Previously, returning true makes sense because you only needed at least one positive match, but with exclusions, I would think you want at least one positive match AND no exclusions

I do think we can continue short circuiting when an exclusion is met, because the logic for matching a label is at least one positive AND no exclusions, so no positive matches for globs after the exclusion would ever need to be considered. The only difference would be instead of returning true as soon as a file matches a glob, we would defer returning true until after all globs for that file have been considered and no exclusions were met which matches my above example

@jalaziz
Copy link
Contributor Author

jalaziz commented Jan 2, 2020

Im not sure the logic for short circuiting exclusion matches makes sense. With this, they are only useful if you list them first, eg, if I have simply have:

label:
- src/*
- !/src/some_file

Imo, I expect a PR that changes files in src/ but also changes src/some_file, to not match this label. I would read this as "any changes to src/ but not to src/some_file"

Previously, returning true makes sense because you only needed at least one positive match, but with exclusions, I would think you want at least one positive match AND no exclusions

I do think we can continue short circuiting when an exclusion is met, because the logic for matching a label is at least one positive AND no exclusions, so no positive matches for globs after the exclusion would ever need to be considered. The only difference would be instead of returning true as soon as a file matches a glob, we would defer returning true until after all globs for that file have been considered and no exclusions were met which matches my above example

Please see the discussion in #9. I had concerns about order being significant, but @damccorm suggested it should be fine.

@dakale
Copy link
Contributor

dakale commented Jan 6, 2020

Im still not convinced that order should matter, given that no specific example was brought up in that conversation that actually requires ordering

Its not described, but I think what this follow example:

label2:
- "!example2/foo/bar"
- "example2/foo/**/*"
- "!example2/**/*"
- "**/*"

is expressing, is "None of the files in example2//*, except example2/foo//*", (which is the opposite of how I originally described the logic: any files in... except). This to me, isnt useful because its equivalent to just listing the match with no exclusions, because if you know what file you want to match, you can express that directly, instead of indirectly/inversely by subtracting all non-matches from the set of all possible files

In fact, I think the above example could be rewritten as:

label2:
- "**/*"
- "!example2/foo/bar"
- "example2/foo/**/*"
- "!example2/**/*"

without any loss of meaning (assuming the app logic is updated to not care about order), without impacting the actual logic of what is happening. You could put them in any order you want actually, it expresses the same thing, which is that there is at least one file in the PR that matches the label and isnt excluded. If you start with '**/*` and then start removing files via exclusions, you arrive at the same result (note this requires an implementation change to not short circuit at the first positive match)

Hopefully that makes sense and is actually correct. Basically what Im trying to boil it down to is a simpler system that can have its logic expressed simply: If a file matches any globs in a label, and is not excluded, the label is applied. This doesnt change any existing behavior, because "any match" is indistinguishable from "first match" when you dont have exclusions, only the implementation changes

Id be happy to merge this if its implemented as described above, or to continue this discussion

@chabou
Copy link

chabou commented Apr 26, 2020

@jalaziz do you need some help to update your PR to match @dakale specs?
I can make a new PR to address this if you don't have any free time.

@jalaziz
Copy link
Contributor Author

jalaziz commented Apr 26, 2020

@jalaziz do you need some help to update your PR to match @dakale specs?
I can make a new PR to address this if you don't have any free time.

@chabou sorry, been a bit busy, but looking at fixing this soon.

without any loss of meaning (assuming the app logic is updated to not care about order), without impacting the actual logic of what is happening. You could put them in any order you want actually, it expresses the same thing, which is that there is at least one file in the PR that matches the label and isnt excluded. If you start with '**/*` and then start removing files via exclusions, you arrive at the same result (note this requires an implementation change to not short circuit at the first positive match)

@dakale I'm trying to recall all the context around ordering, but I believe the reason why I was suggested order would matter was because of the statement from #9 (comment) of "Probably, we'd want to evaluate sequentially until we find a pattern that matches. So that way we could even do something like:"

As for the example from #9, I think precedence matters.

Also, your interpretation missed something:

label2:
- "!example2/foo/bar"
- "example2/foo/**/*"
- "!example2/**/*"
- "**/*"

Should translate to "match all files, except example2/**, but include example2/foo/** except example2/foo/bar". The reason why order matters here is that !example2/**/* would exclude the explicit match of example2/foo/**/*.

If you start with '**/*` and then start removing files via exclusions, you arrive at the same result (note this requires an implementation change to not short circuit at the first positive match)

Unfortunately, that's not true, because of the example2/foo/**/* rule followed by the !example2/**/* rule. Applying the !example2/**/ exclusion rule would override the example2/foo/**/* inclusion.

I would love if order didn't matter, but I don't see how that is accomplished with a single flat list without having to parse the patters to understand at which level the wildcard is being applied.

How do you feel about the map syntax I proposed? I think it would be more explicit and allow for additional use-cases such as #9 (comment).

The idea with supporting a map would be that we can represent the boolean expression simply as: (Rule 1 matches ^ !(Rule 1 exclusions)) | (Rule 2 matches ^ !(Rule 2 exclusions)) | (Rule 3 matches ^ !(Rule 3 exclusions)). That is, the top-level list is always "ORed" and the individual list items are always "ANDed".

Thinking about it a bit more, if we keep a single flat list and define it as (Any matches - all exclusions) as you suggested, then I believe it's still possible to accomplish the example with two sets of rules, but not sure if that's what you had in mind:

label2:
- "**/*"
- "!example2/**/*"

label2:
- "!example2/foo/bar"
- "example2/foo/**/*"

@dakale
Copy link
Contributor

dakale commented Apr 28, 2020

Hey @jalaziz I think you bring up a great point, and the map syntax you propose does seem to seem solve it, and other usecases, like:

Branch name: #54

label1:
- match: "example1/foo"
  exclude: ["example1/bar"]
  branch: "featureA"

Change type: #47

label1:
- match: "example1/foo"
  exclude: ["example1/bar"]
  on/event/action: "add/remove/edit/etc"

Having a rich object here, where the match path(s) are specified as a match key instead of assumed to be the default type of the value, allows us to implement a lot more functionality based on other properties, so Im all for it. Ive got some bandwidth currently to work on this action and would like to see it progress.

Have you thought through whether it might make sense to allow both the match/exclude properties to be string | string[] instead of match only being string and excludes only being string[]? For a particular label, the "matchers" array would still be ORed together, but it may allow more flexibility.

Regardless, I think what you have proposed in #9 (comment) looks promising as a starting point and should be extensible. Since a string exclude effectively just a length 1 array, and a string[] match is ultimately compatible with the interface, which is: match is just a boolean matches/doesnt match, regardless of if that means it matches all or matches any (AND vs OR).

Either way Im happy to help get a simple form of these changes in. Do you want to work on a new PR or iterate on this one (to change to map syntax)?

@jalaziz
Copy link
Contributor Author

jalaziz commented Apr 28, 2020

Have you thought through whether it might make sense to allow both the match/exclude properties to be string | string[] instead of match only being string and excludes only being string[]? For a particular label, the "matchers" array would still be ORed together, but it may allow more flexibility.

Funny you mention that. When I was looking at #9 again, I was thinking the same thing. I would definitely add support for a list of strings. Might as well make it as robust as could be.

Either way Im happy to help get a simple form of these changes in. Do you want to work on a new PR or iterate on this one (to change to map syntax)?

I'll iterate on this one. Will work on this today.

@jalaziz
Copy link
Contributor Author

jalaziz commented May 26, 2020

Just an update. I was working on the new richer syntax and then got distracted. However, I'm back to looking at this and trying to get it ready for review.

It turns out that supporting #9 (comment) is not quite the same as the exclusions we've been talking about. Specifically, the use case I originally brought up would match if at least one file does not match the excluded paths. However, the exclusion @fluzzi wants is more of a "reject if path matches".

Since I think there's value in both, I've been working on a change that supports both types of exclusion. I need to get some sleep, but hope to have it ready for review tomorrow.

Paths can be negated to stop searching through the remaining patterns in
a the glob list. All changed files tested and at least one matching file
will result in a label being added.

Fixes actions#9
@jalaziz
Copy link
Contributor Author

jalaziz commented May 27, 2020

@dakale Still need to update the documentation, but I think what I ended up with is far more flexible than my original idea.

Each "matcher" now accepts two possible fields: all and some. Each is an array of globs, possibly negated.

all globs must ALL match ALL changed files. some globs must ALL match SOME changed files. This effectively gives us the AND and OR semantics I original described, but gives you finer-grain control over whether the globs are matched against all or some of the files.

Exclusions can now be represented as:

label:
- some: ["/example/**/*", "!/example/foo/*"]

However, you can also reject a label if a file exists with:

label:
- some: ["/example/**/*"]
  all: ["!/example/foo"]

One thing I did not do is add support for single elements without array or list syntax. I didn't want to complicate type checking, but I can add support if you think it's necessary.

I'll work on documentation once I get some indication that this approach works for everyone.

@ofek
Copy link

ofek commented May 27, 2020

@jalaziz Great job! Hopefully I can get rid of our fork soon https://datadoghq.dev/integrations-core/meta/ci/#fork

A new "rich" matcher object can be provided instead of a normal glob.

The matcher object has two fields that accept an array of globs:
* Globs in "all" must all match every changed file.
* Globs in "some" must all match at least one changed file.

Combined with negated globs, this allows for a precise control of when
labels are applied.
@ofek
Copy link

ofek commented May 27, 2020

@dakale @pjquirk Can we please get another look at this?

@dakale
Copy link
Contributor

dakale commented May 29, 2020

This looks pretty good so far. I see that the there are still README changes that are now outdated and might be misleading. Ideally, outside of full, up to date documentation, it would be best to at least avoid checking in outdated docs

Beyond that my only nit is: How do you feel about "any" instead of "some"? To me, "some" usually means > 1 whereas "any" means > 0?

@jalaziz
Copy link
Contributor Author

jalaziz commented May 29, 2020

This looks pretty good so far. I see that the there are still README changes that are now outdated and might be misleading. Ideally, outside of full, up to date documentation, it would be best to at least avoid checking in outdated docs

Beyond that my only nit is: How do you feel about "any" instead of "some"? To me, "some" usually means > 1 whereas "any" means > 0?

Yeah, I'll be updating the docs, just wanted to make sure you're ok with the approach before adding docs and examples.

I'm definitely ok with any. I just chose some because of the JavaScript Array method name, but I much prefer any/all.

Will update the name and docs.

@jalaziz
Copy link
Contributor Author

jalaziz commented Jun 1, 2020

@dakale everything should be updated.

I can squash once everything I get the final approval.

@ericcornelissen
Copy link
Contributor

Regarding the docs, I would expect the "Basic Examples" to come first, before the "Match Object" documentation, as the latter is more advanced.

@jalaziz
Copy link
Contributor Author

jalaziz commented Jun 1, 2020

Regarding the docs, I would expect the "Basic Examples" to come first, before the "Match Object" documentation, as the latter is more advanced.

@ericcornelissen I chose to place it before "Basic Examples" because it describes the "theory" of the examples. Happy to move it after the examples though.

@dakale
Copy link
Contributor

dakale commented Jun 1, 2020

@jalaziz Im fine with merging this now unless you want to add anything else. No need to squash since I can just do squash+merge anyways.

Ultimately, I think Ill need to go back and reorganize the docs anyways so they dont need to perfect here. I dont think the README is the best place to have extensive docs beyond a few small examples so ill need to measure the right amount to include here, and add some details around which version of labeler this will even be available in. Since the version of the README containing these samples is on the master branch, and that corresponds to the version of the action that has the described functionality, it should be fine. Although it would be ideal to describe it in terms of versions too.

Ill need to create a tag and release for this (will probably be v3), until then to be able to consume these changes users will need to use labeler@master, or using the sha of this merge commit (or newer).

@ofek
Copy link

ofek commented Jun 1, 2020

Yay! 🎉

@jalaziz
Copy link
Contributor Author

jalaziz commented Jun 1, 2020

@dakale I have nothing to add if it all looks good to you.

@akutz
Copy link

akutz commented Jan 29, 2023

Mentioned in vmware-tanzu/vm-operator#50 (review) (this was not added automatically for some reason)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for exclusion paths
8 participants