Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add caret negation ([^...]) in addition to exclamation mark #141

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tertsdiepraam
Copy link

@tertsdiepraam tertsdiepraam commented Mar 11, 2024

Closes #116.

This feature has some interesting history. Let's start with the source of everything. Here's POSIX:

If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, except that the character ( '!' ) shall replace the character ( '^' ) in its role in a non-matching list in the regular expression notation, it shall introduce a pattern bracket expression. A bracket expression starting with an unquoted character produces unspecified results. Otherwise, '[' shall match the character itself.

(emphasis mine)

The glob(7) man page supports this:

Now that regular expressions have bracket expressions where the negation is indicated by a ^, POSIX has declared the effect of a wildcard pattern [^...] to be undefined.

This seems to have had the effect that many implementations have opted to support both [!...] and [^...]. This leaves us in a situation where this crate is correct, but sometimes leads to unexpected behaviour for users expecting [^...] to work. This happened in uutils and could become an issue for other projects as well, such as sudo-rs (see memorysafety/sudo-rs#834). Fixing this issue with a workaround is quite tricky (see e.g. uutils/coreutils#5584).

Now for a fun round of "What do the others do?":

Python: only `[!...]` is negation
from fnmatch import fnmatch
print(fnmatch("b", "[!a]"))
print(fnmatch("a", "[^a]"))

prints

True
False
libc: both work
#include <stdio.h>
#include <fnmatch.h>

int main() {
    int res1 = fnmatch("[^a]", "b", 0);
    int res2 = fnmatch("[!a]", "b", 0);
    printf("Result: %d, %d", res1, res2);
    return 0;
}
Result: 0, 0
sudo's custom fnmatch: both work

See this relevant line in the code: https://github.com/sudo-project/sudo/blob/b6175b78ad1c4c9535cad48cb76addf53352a28f/lib/util/fnmatch.c#L174

npm `glob` package: both work

Documented here: https://www.npmjs.com/package/glob#glob-primer

ruby: both work (but `^` is emphasized in the docs)
puts File::fnmatch("[^a]", "b")
puts File::fnmatch("[!a]", "b")

prints

true
true
Java: only `[!...]` is negation
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) {
        PathMatcher x = FileSystems.getDefault().getPathMatcher("glob:[^a]");
        System.out.println(x.matches(Paths.get("b")));

        PathMatcher y = FileSystems.getDefault().getPathMatcher("glob:[!a]");
        System.out.println(y.matches(Paths.get("b")));
    }
}

prints

false
true

Downsides and alternatives

The major downside of this is that it is a breaking change and people might be relying on the current behaviour, but that is difficult to tell. It might also be that people have unknowingly written incorrect patterns using ^, because that's what they expect from regex. Maybe we can estimate that with a GitHub code search.

So, there is a case to be made that this should belong in a fork. The reason that I believe that this change belongs here is that I think it helps all the projects that re-implement existing C code into Rust and removes an edge case.

Another option is to make the parsing of patterns in this crate configurable, where the default behaviour stays the same. This could potentially also be extended to allow for escape sequences and other features. Again, though, that comes with downsides. Mostly because all combinations of options would have to be tested and maintained. Given that this crate is a cornerstone in the Rust ecosystem, that combinatorical explosion might lead to (security) issues in various projects.

Relevant GitHub Code Searches

I did a couple searches for this pattern on GitHub to estimate the impact of this change. All searches are with lang:rust:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow the use of ^ for negation in addition to !?
1 participant