tag(Caseless(...)) panics while matching certain Unicode capitalized characters. #414

Closed
epage opened this issue Jan 3, 2024 · 1 comment · Fixed by #451
Labels
A-combinator Area: combinators C-bug Category: Things not working as expected M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Comments

@epage
Collaborator

epage commented Jan 3, 2024

Adapted from rust-bakery/nom#1719

fn main() {
    // "\u{212A}" is the KELVIN SIGN (3 bytes in UTF-8); it lowercases to ASCII 'k'.
    let _ = nom::bytes::complete::tag_no_case::<_, _, nom::error::Error<&str>>("k")("\u{212A}");
}

The tag_no_case function can panic whenever a character in the input string lowercases to a character with a shorter UTF-8 encoding.
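To see which characters can trigger this, a short standalone scan (plain `std`, no nom) can list every `char` whose lowercase mapping takes fewer UTF-8 bytes than the character itself. This is an illustrative sketch: it uses Rust's `char::to_lowercase`, which may not be exactly the case mapping nom applies internally, and `shrinking_lowercase_chars` is a hypothetical helper name.

```rust
/// Hypothetical helper: returns every `char` whose lowercase mapping
/// (per Rust's `char::to_lowercase`) encodes to fewer UTF-8 bytes
/// than the original character.
fn shrinking_lowercase_chars() -> Vec<char> {
    // Iterating a `RangeInclusive<char>` visits every scalar value and skips surrogates.
    ('\0'..=char::MAX)
        .filter(|c| {
            // A lowercase mapping can expand to several chars, so sum their lengths.
            let lower_len: usize = c.to_lowercase().map(char::len_utf8).sum();
            lower_len < c.len_utf8()
        })
        .collect()
}

fn main() {
    for c in shrinking_lowercase_chars() {
        let lower: String = c.to_lowercase().collect();
        println!("U+{:04X} {:?} ({} bytes) -> {:?} ({} bytes)",
                 c as u32, c, c.len_utf8(), lower, lower.len());
    }
}
```

The Kelvin sign (U+212A) shows up in this list alongside others such as the Ohm sign (U+2126) and the Angstrom sign (U+212B), so the bug is not limited to a single character.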

In the above example, the K to be matched is actually the Kelvin sign (U+212A), whose UTF-8 encoding is 0xE2 0x84 0xAA. Its lowercase form is the ordinary single-byte k, so it matches the tag.
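The byte-length mismatch can be checked directly. The sketch below is standalone and does not call nom; `caseless_eq` is a hypothetical char-by-char lowercase comparison standing in for whatever case-insensitive comparison the library performs.

```rust
/// Hypothetical helper: compares two strings char-by-char after lowercasing.
fn caseless_eq(a: &str, b: &str) -> bool {
    a.chars()
        .flat_map(char::to_lowercase)
        .eq(b.chars().flat_map(char::to_lowercase))
}

fn main() {
    let input = "\u{212A}"; // KELVIN SIGN
    let tag = "k";
    // The Kelvin sign is three bytes in UTF-8; the tag is a single ASCII byte.
    assert_eq!(input.as_bytes(), &[0xE2, 0x84, 0xAA]);
    assert_eq!(tag.len(), 1);
    // Its lowercase form is plain ASCII 'k', so a caseless comparison accepts it.
    assert_eq!(input.to_lowercase(), "k");
    assert!(caseless_eq(input, tag));
    println!("matched {} input bytes against a {}-byte tag", input.len(), tag.len());
}
```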

After matching the input against the tag, the tag_no_case implementation splits the input at the tag's byte length.

This assumes that the matched characters occupy the same number of bytes as the tag. As the example above shows, that is not always true, so the function can end up trying to split the string inside a multi-byte character, which panics.
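A simplified reproduction of the failing step, assuming the split behaves like `str::split_at(tag.len())` (the function below is a hypothetical stand-in, not nom's actual code). `str::split_at` panics when the index is not on a char boundary, which is the panic described above.

```rust
use std::panic;

/// Hypothetical stand-in for the buggy step: once the caseless comparison
/// has succeeded, split the input at the *tag's* byte length and return
/// `(remaining, matched)`.
fn split_after_match<'a>(input: &'a str, tag: &str) -> (&'a str, &'a str) {
    // `str::split_at` panics if the index is not on a char boundary.
    let (matched, remaining) = input.split_at(tag.len());
    (remaining, matched)
}

fn main() {
    // ASCII input: byte 1 is a char boundary, so the split works.
    assert_eq!(split_after_match("Kx", "k"), ("x", "K"));

    // Kelvin-sign input: byte 1 falls inside the 3-byte encoding of U+212A.
    let input = "\u{212A}x";
    assert!(!input.is_char_boundary(1));
    let result = panic::catch_unwind(|| split_after_match(input, "k"));
    assert!(result.is_err(), "splitting inside a char should panic");
}
```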

@epage epage added A-combinator Area: combinators M-breaking-change Meta: Implementing or merging this will introduce a breaking change. C-bug Category: Things not working as expected labels Jan 3, 2024
@epage
Collaborator Author

epage commented Jan 3, 2024

We need CompareResult::Ok to include the byte offset for where the difference occurs and to split on that.
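A minimal sketch of the idea, outside of nom and without its `CompareResult` type: the comparison reports how many *input* bytes it consumed, and the split uses that offset rather than the tag's length. The names `caseless_match_len` and `tag_no_case_fixed` are hypothetical, and the lowercase-based comparison is an assumption about how caseless matching is done.

```rust
/// Hypothetical sketch: caseless comparison that returns the byte offset in
/// `input` where the match ends, or `None` if `input` does not start with `tag`.
fn caseless_match_len(input: &str, tag: &str) -> Option<usize> {
    let mut tag_lower = tag.chars().flat_map(char::to_lowercase).peekable();
    let mut consumed = 0;
    for (idx, c) in input.char_indices() {
        if tag_lower.peek().is_none() {
            // Tag fully matched; `idx` is a char boundary in `input`.
            return Some(idx);
        }
        // Every char of this input char's lowercase form must match the tag.
        for lc in c.to_lowercase() {
            if tag_lower.next() != Some(lc) {
                return None;
            }
        }
        consumed = idx + c.len_utf8();
    }
    // Input exhausted: a match only if the tag was exhausted too.
    if tag_lower.peek().is_none() { Some(consumed) } else { None }
}

/// Splits at the reported offset, so the split always lands on a char
/// boundary. Returns `(remaining, matched)`.
fn tag_no_case_fixed<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let end = caseless_match_len(input, tag)?;
    let (matched, remaining) = input.split_at(end);
    Some((remaining, matched))
}

fn main() {
    assert_eq!(tag_no_case_fixed("\u{212A}x", "k"), Some(("x", "\u{212A}")));
    assert_eq!(tag_no_case_fixed("Kx", "k"), Some(("x", "K")));
    assert_eq!(tag_no_case_fixed("ax", "k"), None);
}
```

Because the offset comes from `char_indices` on the input, the matched slice covers all three bytes of the Kelvin sign instead of stopping after one.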
