Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix ascii class bug and 'Hir::is_match_empty' bug #860

Merged
merged 2 commits into from May 18, 2022

Conversation

BurntSushi
Copy link
Member

The commit messages should explain a bit, but this picks off a couple low hanging bug fruits.

Fixes #680, Fixes #859

This fixes a bug in how ASCII class unioning was implemented. Namely, it
previously and erroneously unioned together two classes and then applied
negation/case-folding based on the most recently added class, even if
the class added previously wasn't negated. So for example, given the
regex '[[:alnum:][:^ascii:]]', this would initialize the class with
'[:alnum:]', then add all '[:^ascii:]' codepoints and then negate the
entire thing because of the negation in '[:^ascii:]'. Negating the
entire thing is clearly wrong and not the intended semantics.

We fix this by applying negation/case-folding only to the class we're
dealing with, and then we union it with whatever existing class we're
building.

Fixes #680
This was incorrectly defined for \b. Previously, I had erroneously made
it return true only for \B since \B matches '' and \b does not match
''. However, \b does match the empty string. Like \B, it only matches a
subset of empty strings, depending on what the surrounding context is.
The important bit is that it can match *an* empty string, not that it
matches *the* empty string.

We were not yet using this predicate anywhere in the regex crate, so we
just fix the implementation and update the tests.

This does present a compatibility hazard for anyone who was using this
function, but as of this time, I'm considering this a bug fix since \b
clearly matches an empty string.

Fixes #859
@BurntSushi BurntSushi merged commit 88a2a62 into master May 18, 2022
@BurntSushi BurntSushi deleted the ag/syntax-updates branch May 18, 2022 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant