
Confusing behavior of regex parameters #1783

Closed
agausmann opened this issue May 14, 2020 · 3 comments

@agausmann
Contributor

Right now, the regex set implementation used by bindgen adds anchors to user-provided expressions so that they must match the entire input string. I can't find this anywhere in the documentation, and it has caused problems before (#1755). Inspired by this, I did a bit of research.
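A minimal sketch of the implicit anchoring, for readers who have not run into it. The `wrap` and `is_match` helpers are hypothetical, and Python's `re` module stands in for the Rust regex engine bindgen uses; for plain names like these, `^`, `$` and substring search behave the same way in both. The sketch assumes the pattern is wrapped as `^` + pattern + `$` with no grouping, which matches the `^^gl$` example later in this thread.

```python
import re

def wrap(pattern: str) -> str:
    # Hypothetical helper: surround the user's pattern with ^ and $,
    # assumed to mirror bindgen's implicit anchoring (no grouping).
    return "^" + pattern + "$"

def is_match(pattern: str, name: str) -> bool:
    # re.search looks for a match anywhere in the name; only the added
    # anchors force the pattern to cover the whole name.
    return re.search(wrap(pattern), name) is not None

print(is_match("Foo", "Foo"))                  # True
print(is_match("Foo", "FooBar"))               # False: must match the entire name
print(re.search("Foo", "FooBar") is not None)  # True: the raw pattern would match
```

The last line shows the behavior a user might expect when handing over a "raw" regular expression.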

Of the ~600 crates that depend on bindgen, about half use builder functions that accept regexes, such as the whitelist and blacklist functions. Of those, around 30 crates (5% of all dependents, 4% when weighted by download count) add their own anchors to these regexes even though the anchors are redundant.
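Why such anchors are redundant rather than harmful, sketched under the same assumption that bindgen wraps each pattern as `^` + pattern + `$` (the `wrap` helper is hypothetical, and Python's `re` stands in for the Rust engine): `^` and `$` are zero-width, so doubling them to `^^gl$$` selects exactly the same names as `^gl$`.

```python
import re

def wrap(pattern: str) -> str:
    # Hypothetical: implicit anchoring assumed to be ^ + pattern + $.
    return "^" + pattern + "$"

for name in ["glClear", "gl", "my_gl"]:
    plain = re.search(wrap("gl"), name) is not None
    # User-supplied anchors end up doubled (^^gl$$); since ^ and $ match
    # positions rather than characters, repeating them changes nothing.
    doubled = re.search(wrap("^gl$"), name) is not None
    assert plain == doubled
    print(name, plain)
```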

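Another 30 crates use alternation in their regexes, which is bug-prone because of how it interacts with the implicit anchors: since alternation has the lowest precedence, foo|bar becomes ^foo|bar$, which matches any name that starts with foo or ends with bar rather than exactly foo or bar. Many of these uses aren't technically correct, even if they happen to work for the names in that particular crate.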
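A small sketch of the alternation pitfall, again assuming the pattern is wrapped as `^` + pattern + `$` without a group (the helpers are hypothetical, and Python's `re` stands in for the Rust engine). The anchors attach only to the first and last alternatives, so the pattern reads as `(^foo)|(bar$)`.

```python
import re

def wrap(pattern: str) -> str:
    # Hypothetical: anchors added around the pattern without a group.
    return "^" + pattern + "$"

pattern = wrap("foo|bar")  # "^foo|bar$", i.e. (^foo)|(bar$)

def matches(name: str) -> bool:
    return re.search(pattern, name) is not None

print(matches("foo"))        # True
print(matches("foo_extra"))  # True, though an exact "foo" was probably intended
print(matches("xbar"))       # True, though an exact "bar" was probably intended
print(matches("bar_extra"))  # False
```

Under this assumption, a user can protect against the problem by grouping the alternatives themselves, for example `(?:foo|bar)`, which wraps to `^(?:foo|bar)$`.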

I'm creating this issue to start a discussion about whether this is a problem that needs to be fixed, and if so, how we could fix it without causing breakage for existing users.

@agausmann
Contributor Author

agausmann commented May 14, 2020

I understand why the anchors are added: they make exact matches easier to specify. But I'm not happy with the effect they have when regular expressions are being used. When I give a regular expression, I generally expect it to be evaluated as-is. For example, to specify a prefix, I'd expect to be able to write a raw regular expression like ^gl, but in bindgen this won't work because it becomes ^^gl$. The right answer (currently) is gl.*, which then becomes ^gl.*$.
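The prefix example can be checked directly. This sketch uses a hypothetical `wrap` helper that reproduces the `^gl` to `^^gl$` rewrite described above, with Python's `re` standing in for the Rust engine.

```python
import re

def wrap(pattern: str) -> str:
    # Hypothetical: reproduce the ^ ... $ wrapping described in the comment.
    return "^" + pattern + "$"

def matches(pattern: str, name: str) -> bool:
    return re.search(wrap(pattern), name) is not None

# ^^gl$ only accepts the exact name "gl", so it fails as a prefix filter.
print(wrap("^gl"), matches("^gl", "glClear"))    # ^^gl$ False
# ^gl.*$ accepts any name starting with "gl".
print(wrap("gl.*"), matches("gl.*", "glClear"))  # ^gl.*$ True
```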

On the other hand, this might be acceptable behavior. grep also has a flag to match the whole line, so writing regular expressions that check for exact matches may not be as foreign as I originally thought:

       -x, --line-regexp
              Select only those matches that exactly match the whole line.
              For a regular expression pattern, this is like parenthesizing
              the pattern and then surrounding it with ^ and $.
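The grep description translates to a short sketch: parenthesize the pattern, then anchor it. The `line_regexp` helper is hypothetical and uses a non-capturing group `(?:...)`, which matches the same strings as plain parentheses. For names without newlines, the result agrees with Python's `re.fullmatch` on the raw pattern, and, unlike the ungrouped `^foo|bar$`, it keeps alternation correct.

```python
import re

def line_regexp(pattern: str) -> str:
    # Hypothetical helper: group first, then anchor, as grep -x describes.
    return "^(?:" + pattern + ")$"

for name in ["foo", "bar", "foo_extra", "xbar"]:
    anchored = re.search(line_regexp("foo|bar"), name) is not None
    whole = re.fullmatch("foo|bar", name) is not None
    assert anchored == whole
    print(name, anchored)
```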

@kulp
Member

kulp commented Jun 2, 2022

Also #2195 may be relevant, if it shows that this problem is biting users.

pvdrz added a commit to ferrous-systems/rust-bindgen that referenced this issue Nov 9, 2022
When this option is enabled, bindgen won't parenthesize and surround any
regex with ^ and $.

Fixes rust-lang#1783
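A rough sketch of what such an opt-out means for matching. The `matches` function and its `wrap` parameter are hypothetical, modeled only on the commit message (they are not bindgen's API); the wrapped form follows the commit's wording of parenthesizing and then surrounding with ^ and $. Python's `re` stands in for the Rust engine.

```python
import re

def matches(pattern: str, name: str, wrap: bool = True) -> bool:
    # Hypothetical switch modeled on the commit message: with wrap=False
    # the pattern is used exactly as written.
    if wrap:
        pattern = "^(?:" + pattern + ")$"
    return re.search(pattern, name) is not None

print(matches("^gl", "glClear"))              # False: becomes ^(?:^gl)$
print(matches("^gl", "glClear", wrap=False))  # True: raw prefix match
```

With wrapping disabled, a raw prefix pattern like `^gl` works as the original comment expected, while the default keeps exact-match behavior for existing users.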
@pvdrz
Contributor

pvdrz commented Nov 14, 2022

Solved via #2345

pvdrz closed this as completed Nov 14, 2022
4 participants