Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

feat: add exclude files when using the git file generator (#468) #514

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adamjohnson01
Copy link

This adds the ability to exclude specific paths when using the git file generator.

Closes #468

Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Just one comment for now. Also, could you update the git generator docs page to show how to use the new feature?

pkg/generators/git.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! My only remaining concern is that a/**/b.json might be a valid path in path.Match, and we might be changing the behavior for any users who have that kind of pattern. I don't have time to investigate at the moment.

Are you strongly opposed to just introducing a new glob field for both files and directories? I feel like that's safer and easier for users to understand.

@adamjohnson01
Copy link
Author

adamjohnson01 commented Feb 26, 2022

a/**/b.json and a/*/b.json will be identical in path.Match. That pattern would not be used currently because only the directory generator currently uses path.Match.

I think the only potential issues would occur if there were users that had ** in their directory path excludes because path.Match would treat that a single *. If you are concerned about this then we could have a two different filtering functions. One. that includes glob for files if there is a ** and another that just uses path.Match for directories.

@adamjohnson01 adamjohnson01 force-pushed the add-git-file-exclude branch 7 times, most recently from 2c4fc4a to 3d484be Compare February 27, 2022 23:40
@adamjohnson01
Copy link
Author

adamjohnson01 commented Feb 27, 2022

@crenshaw-dev, I misunderstood what you meant about adding a new glob field. I think that would be more confusing. The only time glob will need to be used is when there is a ** and exclude is true. The ** works when generating the directory and file lists as under the covers is it just git ls-files and the path. The path.Match function does not translate ** the same way that the git ls-files does so the current setup is producing unexpected behaviour for anyone using ** in excludes. I think that adding the ** check could potentially break some existing config but the config that it 'breaks' is not working as it should anyway. Changing it will make the exclude path matching consistent with the file and directory path generation which will make troubleshooting issues with excludes a lot easier.

@crenshaw-dev
Copy link
Collaborator

@adamjohnson01 thanks for the explanation! I don't think I'll be able to give this the review it deserves until Monday. Thanks for your patience!

@crenshaw-dev
Copy link
Collaborator

After thinking some more, I'm still not comfortable inferring pattern type.

  1. We can't be sure of the intent of someone who is currently using ** in a pattern. It's possibly a typo. But if we change the behavior, it could break currently-working AppSets. (For example, a/b/** would now return a/b/c/d/ along with a/b/c/.
  2. Glob has other behavior differences. For example, a pattern containing { or } will be interpreted differently. Someone currently using path/**/{with-literal-braces} will find that /path/a/{with-literal-braces} no longer matches. It's a far-edge case, but still possible.
  3. Someone who wants to use glob's pattern-list feature ({pattern,list}) will not be able to, unless their pattern also contains **. We'd have to broaden the feature detection, possibly breaking things for existing users.
  4. The glob library hasn't been maintained in a while, and there are weird quirks which could be very confusing for a user who changes nothing but * to ** and suddenly finds that other parts of the pattern are broken.

@adamjohnson01
Copy link
Author

@crenshaw-dev, those are valid points. How about using the doublestar library instead of glob. It implements support for double star ** matches for path.Match. The patterns are the same other than ** support.

@crenshaw-dev
Copy link
Collaborator

I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.

I've been using these tests to compare behavior:

func TestGlobVsMatch(t *testing.T) {
	cases := []struct {
		name string
		testPath string
		pattern string
		expected bool
	}{
		{
			name: "double-star does not match separator",
			testPath: "/a/b/c",
			pattern: "/a/**",
			expected: true,
		},
		{
			name: "{ is a valid part of a path",
			testPath: "/a/{filename}",
			pattern: "/a/{filename}",
			expected: true,
		},
	}

	for _, testCase := range cases {
		testCaseCopy := testCase

		t.Run(testCaseCopy.name, func(t *testing.T) {
			t.Parallel()
			pathMatches, err := path.Match(testCaseCopy.pattern, testCaseCopy.testPath)
			assert.NoError(t, err)
			assert.Equal(t, testCaseCopy.expected, pathMatches, "path.Match returned %v instead of %v for pattern %v and path %v", pathMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
			doublestarMatches, err := doublestar.Match(testCaseCopy.pattern, testCaseCopy.testPath)
			assert.NoError(t, err)
			assert.Equal(t, testCaseCopy.expected, doublestarMatches, "doublestar.Match returned %v instead of %v for pattern %v and path %v", doublestarMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
		})
	}
}

@adamjohnson01
Copy link
Author

@crenshaw-dev

I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.

I think that 1 is the only point that it does not solve but this is limited to excludes and not the path generation as that is handled by git ls-files. The other points are no longer valid as we will no longer be using glob, we will be using path.Match again.

@crenshaw-dev
Copy link
Collaborator

this is limited to excludes and not the path generation as that is handled by git ls-files

Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using filepath.Walk to generate a list of files.

So the behavior of this generator would change after merging this PR:

  - git:
      repoURL: https://github.com/argoproj/applicationset.git
      revision: HEAD
      files:
      - path: "examples/git-generator-files-discovery/cluster-config/**/config.json"

Instead of matching only files one directory away from cluster-config, this would seem to now match config.json files in further-nested directories.

@adamjohnson01
Copy link
Author

@crenshaw-dev

this is limited to excludes and not the path generation as that is handled by git ls-files

Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using filepath.Walk to generate a list of files.

You are not missing anything. I am only half correct. filepath.Walk is used for directories generation and git ls-files is used for file generation.

So the behavior of this generator would change after merging this PR:

  - git:
      repoURL: https://github.com/argoproj/applicationset.git
      revision: HEAD
      files:
      - path: "examples/git-generator-files-discovery/cluster-config/**/config.json"

Instead of matching only files one directory away from cluster-config, this would seem to now match config.json files in further-nested directories.

This would not change as git ls-files supports ** already and we are only adding adding exclude functionality not changing generation functionality. The exclude is being implemented to be inline with the way generation works for files.

I think that it is probably best if we implement two different filtering functions. One for directories and one for files. That way we can leave the current directory filtering which works the same way as directory generation alone. That way we will not break any existing config.

The new exclude config for files should work the same way that the generation for files works so we will need to implement ** functionality.

What do you think?

@crenshaw-dev
Copy link
Collaborator

Ah! Thanks for the clarification. Had to look at the code for a few minutes to understand what's happening. Yep, your idea of two different filter functions makes a lot of sense.

…tor (argoproj#468)

Signed-off-by: Adam Johnson <adamjohnson01@gmail.com>
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for walking me through understanding the implementation. Took me a while to understand fully. 😄

Comment on lines 12 to 13
- path: "examples/git-generator-files-discovery/cluster-config/**/config.json"
exclude: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Looks like it would exclude everything.

@rishabh625
Copy link
Contributor

@adamjohnson01 : Can u please move this PR into argocd as applicationset is moved. If u find it laborious, i can do it on your behalf with your consent

@adamjohnson01
Copy link
Author

@rishabh625, I will give it a go

@crenshaw-dev
Copy link
Collaborator

@adamjohnson01 let us know if you'd rather us move over the PR on your behalf. :-)

@yebolenko
Copy link

any progress on this PR?
When it will be merged?

@rishabh625
Copy link
Contributor

@yebolenko it's already merged and will be available in next release argoproj/argo-cd#9150

@bitfactory-henno-schooljan

@yebolenko it's already merged and will be available in next release argoproj/argo-cd#9150

AFAICS this merge is related to excluding entire repositories using the SCM generator, how would I use this to exclude only a single file using the Git file generator when generating all Applications from a single Git repository?

@adamjohnson01
Copy link
Author

I have not had time to work on this but will try and get it moved across.

@bitfactory-henno-schooljan

Has there been any progress on this? I would like to ignore certain directory patterns but keep the rest, which is impossible using only glob patterns for inclusion as it is now.

@vinicius-deoliveira
Copy link

Hi folks,

Do we have some update about this? I would like to use exclude files when using git file generator.

@crenshaw-dev
Copy link
Collaborator

I have not seen this feature attempted on the new repo

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to exclude files when using the git file generator
7 participants