Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Include/Exclude filtering capability to Unzip Task (#5169) #6018

Merged
merged 19 commits into from Feb 6, 2021

Conversation

IvanLieckens
Copy link
Contributor

@IvanLieckens IvanLieckens commented Jan 11, 2021

Fixes #5169

Context

See #5169

Changes Made

Unzip Task now has "Include" and "Exclude" optional properties to pass a pattern to filter archive entries to be unzipped.

Testing

Added following tests:

  • CanUnzip_WithIncludeFilter
  • CanUnzip_WithExcludeFilter
  • CanUnzip_WithIncludeAndExcludeFilter

These 3 test the ability to include/exclude files from the archive unzip.

Notes

Unable to translate the resources to all languages. Can someone provide guidance/translations?

@rainersigwald
Copy link
Member

Don't worry about the failing PR builds--I believe #6019 will fix that.

Unable to translate the resources to all languages. Can someone provide guidance/translations?

Thanks, but no need to worry! A Microsoft-funded team will do the translations. Details in the loc doc if you're interested.

@rainersigwald
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -2792,6 +2792,9 @@
<data name="Unzip.DidNotUnzipBecauseOfFileMatch">
<value>Did not unzip from file "{0}" to file "{1}" because the "{2}" parameter was set to "{3}" in the project and the files' sizes and timestamps match.</value>
</data>
<data name="Unzip.DidNotUnzipBecauseOfFilter">
<value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
<value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>
<value>Did not unzip file "{0}" because it didn't match the include or because it matched the exclude filter.</value>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I have applied it in the new commits

using (TestEnvironment testEnvironment = TestEnvironment.Create())
{
TransientTestFolder source = testEnvironment.CreateFolder(createFolder: true);
TransientTestFolder destination = testEnvironment.CreateFolder(createFolder: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind modifying this test to use wildcards such that two are included, of which one is also excluded; a third is also excluded; and a fourth isn't excluded or included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all, I hope this new version is more what you were looking for?

@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
public Unzip() { }
[Microsoft.Build.Framework.RequiredAttribute]
public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
public string Exclude { get { throw null; } set { } }
public string Include { get { throw null; } set { } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stick with regular expressions for this, we should change the names for Exclude and Include as they are "default" msbuild names. ie. items are added via Include and the patterns differ between them and regular expressions. A suggested name during PR review was IncludePattern & ExcludePattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, the functionality differs indeed. I have pushed the change.

Copy link
Member

@Forgind Forgind Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benvillalobos, I thought we'd agreed globs were more MSBuild-y than regex? Confused by your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did, I posted this while we were talking about it, hence the "If we stick with regular expressions." The current implementation is still regex, is this how we process includes and excludes normally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; we use globs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benvillalobos ok, thank you for the clearing up. Is there a good example in MSBuild of how you'd like this to function I can use as a lead for the implementation? Just want to make sure that it feels native without any quirks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out Expander.cs, ExpandIntoItemsLeaveEscaped may have what you're looking for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ItemSpec.cs is also relevant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @benvillalobos , I studied Expander and ItemSpec but they were outside of reach for the Tasks assembly and I did not want to introduce dependencies so I leveraged FileMatcher to handle the normalization and verification of the globs and paths. It does not support property references due to this. I hope this matches with your expected behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSBuildGlob would have been great here, but unfortunately it's not visible by tasks. :(

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR overall looks good, pending the switch from regex to globs.

@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
public Unzip() { }
[Microsoft.Build.Framework.RequiredAttribute]
public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
public string Exclude { get { throw null; } set { } }
public string Include { get { throw null; } set { } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're in agreement 🙂

@IvanLieckens the to-do list here would be:

  • Rename ExcludePattern and IncludePattern back to Exclude and Include (sorry)
    • We wanted ExcludePattern if we were sticking with RegEx, but it turns out we don't want to use regex here
  • Change the implementation to expect * parse a glob pattern

@@ -31,6 +31,8 @@ internal class FileMatcher
private static readonly char[] s_wildcardCharacters = { '*', '?' };
private static readonly char[] s_wildcardAndSemicolonCharacters = { '*', '?', ';' };

private static readonly string[] s_propertyReferences = { "$(", "@(" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static readonly string[] s_propertyReferences = { "$(", "@(" };
private static readonly string[] s_propertyAndItemReferences = { "$(", "@(" };

Items are referred to with an @ and properties with $.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just copied from the old name of the method. I adjusted the naming now to these.

/// <summary>
/// Determines whether the given path has any property references.
/// </summary>
internal static bool HasPropertyReferences(string filespec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal static bool HasPropertyReferences(string filespec)
internal static bool HasPropertyOrItemReferences(string filespec)

Another option would be to create a HasPropertyReferences that checks for $( and a separate HasItemReferences that checks for @(. Though I don't feel too strongly about that extra suggestion since filematcher already has something like s_wildcardAndSemicolonCharacters and HasWildcardsOrSemicolon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I split up the existing method that was already there to allow more granular calling and identification. But I didn't want to introduce too many new methods if they weren't needed.

/// </summary>
internal static bool HasWildcardsOrSemicolon(string filespec)
{
return -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any significant perf difference between using Aggregate and -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you thinking Aggregate with a function like Aggregate(false, (acc, ch) => acc || s_wildcardAndSemicolonCharacters.Contains(ch))? That would almost certainly be slower than this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the difference between s_propertyReferences.Aggregate(false, (current, propertyReference) => current | filespec.Contains(propertyReference)); and -1 != filespec.LastIndexOfAny(s_propertyReferences);

If the former is more efficient, we can change HasWildcardsOrSemicolon to do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe LastIndexOf is going to be faster but I had to use the aggregate for the other because the s_wildcardAndSemicolonCharacters are char[] and the s_propertyReferences are string[]. Would need to setup a microbenchmark to validate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I didn't realize that we couldn't replicate what was already there. This looks fine to me 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgind improved it further now to use Any() :)

src/Shared/FileMatcher.cs Outdated Show resolved Hide resolved
patterns = pattern.Contains(';')
? pattern.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Select(FileMatcher.Normalize).ToArray()
: new[] { pattern };
if (patterns.Any(p => p.IndexOfAny(Path.GetInvalidPathChars()) != -1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Move this before the split?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very strange...the docs for Path.GetInvalidPathChars explicitly says that "on Windows-based desktop platforms, invalid path characters might include...less than (<), greater than (>), pipe (|),..." yet trying that out on my Windows-based desktop platform, it didn't. I submitted an issue about it: dotnet/dotnet-api-docs#5292

I'd use FileUtilities.InvalidPathChars and modify the tests to target | or a character 1-31 and make it the same across all platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very interesting, I changed to using the FileUtilities.InvalidPathChars and added a pipe in the name in the test whilst removing the platform specific flag.

src/Tasks/Unzip.cs Outdated Show resolved Hide resolved
src/Tasks/Unzip.cs Outdated Show resolved Hide resolved
@@ -212,5 +275,41 @@ private bool ShouldSkipEntry(ZipArchiveEntry zipArchiveEntry, FileInfo fileInfo)
&& zipArchiveEntry.LastWriteTime == fileInfo.LastWriteTimeUtc
&& zipArchiveEntry.Length == fileInfo.Length;
}

private bool ParseIncludeExclude()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return anything? Looking at ParsePattern below, it either throws an error or returns true. That means that this either throws an error or returns true. I was momentarily confused when I thought it was skipping everything if there were no include/exclude present, so I'd just remove that bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed the return and swapped to using Log.HasLoggedErrors to jump out preventing further execution when the Task is misconfigured.

@IvanLieckens IvanLieckens changed the title Added Include/Exclude RegEx filtering capability to Unzip Task (#5169) Added Include/Exclude filtering capability to Unzip Task (#5169) Jan 29, 2021
Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for bearing with us on this!

@@ -204,7 +204,7 @@ internal static bool HasWildcardsSemicolonItemOrPropertyReferences(string filesp
/// </summary>
internal static bool HasPropertyOrItemReferences(string filespec)
{
return s_propertyAndItemReferences.Any(ref=> filespec.Contains(ref));
return s_propertyAndItemReferences.Any(filespec.Contains);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice!

src/Tasks/Unzip.cs Outdated Show resolved Hide resolved
src/Tasks/Unzip.cs Outdated Show resolved Hide resolved
IvanLieckens and others added 2 commits January 30, 2021 00:17
Co-authored-by: Mihai Codoban <micodoba@microsoft.com>
Co-authored-by: Mihai Codoban <micodoba@microsoft.com>
@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Jan 29, 2021
@Forgind Forgind merged commit 70f6767 into dotnet:master Feb 6, 2021
@Forgind
Copy link
Member

Forgind commented Feb 6, 2021

Thanks @IvanLieckens!

@IvanLieckens IvanLieckens deleted the feature/UnzipFiltering branch February 7, 2021 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unzip Task: Enable filtering
6 participants