syntax: make \p{cf} work
It turns out that 'cf' is also an abbreviation for the 'Case_Folding'
property. Even though we don't actually support a 'Case_Folding'
property, a quirk of our code caused 'cf' to fail since it was treated
as a normal boolean property instead of a general category. We fix it by
special-casing it.

Note that '\p{gc=cf}' worked and continues to work.

If we ever do add the 'Case_Folding' property, we'll not be able to
support its abbreviation since it is now taken by 'Format'.

Fixes #719
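
The lookup order described above can be illustrated with a minimal, self-contained sketch. It does not use the real `regex-syntax` internals: the alias tables, the `Resolved` enum, and the `normalize`, `lookup`, and `resolve` helpers are all hypothetical stand-ins, and the `special_case_cf` flag exists only so the behavior before and after this commit can be compared side by side.

```rust
/// What a bare `\p{NAME}` query resolves to in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Resolved {
    Binary(&'static str),
    GeneralCategory(&'static str),
    NotFound,
}

// Hypothetical, heavily truncated alias tables. The real crate uses
// generated tables derived from the Unicode data files.
const PROPERTY_ALIASES: &[(&str, &str)] =
    &[("alpha", "Alphabetic"), ("cf", "Case_Folding")];
const GENCAT_ALIASES: &[(&str, &str)] =
    &[("cf", "Format"), ("lu", "Uppercase_Letter")];

/// Simplified loose matching: drop '_', '-', and spaces, then lowercase.
/// (An assumption; the real normalization handles more cases.)
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| !matches!(*c, '_' | '-' | ' '))
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Look up a normalized alias and return its canonical name, if any.
fn lookup(
    table: &[(&'static str, &'static str)],
    name: &str,
) -> Option<&'static str> {
    table.iter().find(|(alias, _)| *alias == name).map(|(_, canon)| *canon)
}

/// Resolve a bare name: property aliases are tried first, then general
/// categories. With `special_case_cf`, the property table is skipped for
/// "cf", mirroring the guard added in this commit.
fn resolve(name: &str, special_case_cf: bool) -> Resolved {
    let norm = normalize(name);
    if !(special_case_cf && norm == "cf") {
        if let Some(canon) = lookup(PROPERTY_ALIASES, &norm) {
            return Resolved::Binary(canon);
        }
    }
    if let Some(canon) = lookup(GENCAT_ALIASES, &norm) {
        return Resolved::GeneralCategory(canon);
    }
    Resolved::NotFound
}
```

In this sketch, `resolve("cf", false)` lands on the unsupported `Case_Folding` property, which is the situation that surfaced as "property not found", while `resolve("cf", true)` falls through to the `Format` general category. Other aliases such as `alpha` are unaffected by the guard.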
BurntSushi committed Oct 13, 2020
1 parent fe9b5c9 commit b1489c8
Showing 3 changed files with 23 additions and 2 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
+1.4.1 (2020-10-13)
+==================
+This is a small bug fix release that makes `\p{cf}` work. Previously, it would
+report "property not found" even though `cf` is a valid abbreviation for the
+`Format` general category.
+
+* [BUG #719](https://github.com/rust-lang/regex/issues/719):
+  Fixes bug that prevented `\p{cf}` from working.
+
+
 1.4.0 (2020-10-11)
 ==================
 This releases has a few minor documentation fixes as well as some very minor
12 changes: 10 additions & 2 deletions regex-syntax/src/unicode.rs
@@ -237,8 +237,16 @@ impl<'a> ClassQuery<'a> {
     fn canonical_binary(&self, name: &str) -> Result<CanonicalClassQuery> {
         let norm = symbolic_name_normalize(name);
 
-        if let Some(canon) = canonical_prop(&norm)? {
-            return Ok(CanonicalClassQuery::Binary(canon));
+        // This is a special case where 'cf' refers to the 'Format' general
+        // category, but where the 'cf' abbreviation is also an abbreviation
+        // for the 'Case_Folding' property. But we want to treat it as
+        // a general category. (Currently, we don't even support the
+        // 'Case_Folding' property. But if we do in the future, users will be
+        // required to spell it out.)
+        if norm != "cf" {
+            if let Some(canon) = canonical_prop(&norm)? {
+                return Ok(CanonicalClassQuery::Binary(canon));
+            }
         }
         if let Some(canon) = canonical_gencat(&norm)? {
             return Ok(CanonicalClassQuery::GeneralCategory(canon));
3 changes: 3 additions & 0 deletions tests/unicode.rs
@@ -74,6 +74,9 @@ mat!(
     Some((0, 3))
 );
 mat!(uni_class_gencat_format, r"\p{Format}", "\u{E007F}", Some((0, 4)));
+// See: https://github.com/rust-lang/regex/issues/719
+mat!(uni_class_gencat_format_abbrev1, r"\p{cf}", "\u{E007F}", Some((0, 4)));
+mat!(uni_class_gencat_format_abbrev2, r"\p{gc=cf}", "\u{E007F}", Some((0, 4)));
 mat!(
     uni_class_gencat_initial_punctuation,
     r"\p{Initial_Punctuation}",
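
The new tests expect a match span of `Some((0, 4))` for a single character. A plausible reading is that the span is given in byte offsets and that U+E007F occupies four bytes in UTF-8. The sketch below checks that arithmetic with the standard library only; `first_char_span` is a hypothetical helper for illustration and is not part of the test suite or the `regex` crate.

```rust
/// Byte span `(start, end)` that a pattern matching exactly the first
/// `char` of `haystack` would cover, assuming byte-offset spans.
/// Returns `None` for an empty haystack.
fn first_char_span(haystack: &str) -> Option<(usize, usize)> {
    haystack.chars().next().map(|c| (0, c.len_utf8()))
}
```

Calling `first_char_span("\u{E007F}")` yields a four-byte span, consistent with the expectation in the `uni_class_gencat_format*` tests.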
