`tag_no_case` panics while matching certain Unicode capitalized characters. #1719

DelSkayn · 2024-01-03T13:59:01Z

Rust version : 1.75.0
nom version : 7.1.3 and also main at the time of creating the issue.
nom compilation features used: default features only

Test case

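fn main() {
    // "\u{212A}" is the Kelvin sign, which looks like an ASCII K but is encoded in three bytes.
    let _ = nom::bytes::complete::tag_no_case::<_, _, nom::error::Error<&str>>("k")("\u{212A}");
}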

The tag_no_case function can panic whenever a character in the input string lowercases to a character with a shorter UTF-8 encoding.

In the above example, the K to be matched is actually the Kelvin sign (U+212A), with the UTF-8 byte pattern 0xE2 0x84 0xAA. Its lowercase form is the ordinary single-byte k, so it matches the tag.
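The length mismatch can be checked with the standard library alone. The following is a minimal sketch; the helper names `lowercase_len_change` and `can_split_at` are made up for illustration and are not part of nom.

```rust
/// Returns the UTF-8 byte length of `c` and of its lowercase mapping.
/// Hypothetical helper for illustration only; not part of nom.
fn lowercase_len_change(c: char) -> (usize, usize) {
    let lower: String = c.to_lowercase().collect();
    (c.len_utf8(), lower.len())
}

/// Reports whether `s` can be sliced at byte offset `n` without panicking.
/// Hypothetical helper for illustration only; not part of nom.
fn can_split_at(s: &str, n: usize) -> bool {
    s.is_char_boundary(n)
}
```

For the Kelvin sign, `lowercase_len_change('\u{212A}')` reports three bytes for the character and one byte for its lowercase form, and `can_split_at("\u{212A}", 1)` is false, because byte offset 1 falls in the middle of the three-byte sequence.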

The tag_no_case implementation, after matching a string to the tag, splits the given string at the byte length of the tag:

nom/src/bytes/mod.rs, line 76 in e87c7da:

    CompareResult::Ok => Ok((i.take_from(tag_len), OM::Output::bind(|| i.take(tag_len)))),

This assumes that the byte length of the matched characters is the same as that of the tag they were matched against. But, as the above example shows, this is not always the case, so the function sometimes tries to split a string at an offset that is not a character boundary and panics.
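One way to avoid the bad split is to measure the consumed length in the input rather than in the tag. The sketch below is only an illustration of that idea on plain `&str` values: `strip_prefix_no_case` is a hypothetical standalone helper, not nom's API and not necessarily the fix nom adopted. It also assumes that comparing per-character lowercase mappings is an acceptable notion of case-insensitivity.

```rust
/// Case-insensitively matches `tag` against the start of `input`.
/// On success returns `(remaining, matched)`, split at a byte offset that is
/// counted in `input`, so it always lands on a character boundary.
/// Hypothetical helper for illustration only; not part of nom.
fn strip_prefix_no_case<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    // Lowercased characters of the tag, consumed as the input is walked.
    let mut tag_lower = tag.chars().flat_map(char::to_lowercase).peekable();
    let mut consumed = 0;

    for c in input.chars() {
        if tag_lower.peek().is_none() {
            break; // the whole tag has been matched
        }
        // Every lowercase char produced by `c` must match the next tag char.
        for lc in c.to_lowercase() {
            if tag_lower.next() != Some(lc) {
                return None;
            }
        }
        // Count the bytes of the *input* character, not of the tag.
        consumed += c.len_utf8();
    }

    if tag_lower.peek().is_some() {
        return None; // input ended before the tag was fully matched
    }
    let (matched, rest) = input.split_at(consumed);
    Some((rest, matched))
}
```

With this approach, matching the tag "k" against an input that starts with the Kelvin sign consumes three bytes instead of one, so the split never lands inside a multi-byte character.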


epage mentioned this issue Jan 3, 2024

tag(Caseless(...)) panicks while matching certain unicode capitalized characters. winnow-rs/winnow#414

Closed
