Clarify None case in bstr::decode_utf8 #139

Open
glts opened this issue Nov 9, 2022 · 1 comment
Labels
doc Documentation should be improved.

Comments

glts commented Nov 9, 2022

Thank you for this useful library.

In bstr 1.0.1, the documentation for bstr::decode_utf8 states:

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

bstr::decode_utf8(b"\xFFabc") returns (None, 1). The byte \xFF cannot be decoded so the result is None; but the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence would be 0, as \xFF is not a valid UTF-8 prefix.

Can you confirm, or can you paraphrase the wording for me?
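
The distinction in question, between the length of the valid prefix and the number of bytes reported as consumed, can be illustrated with the standard library alone. This is only a sketch: `describe_invalid` is a hypothetical helper, not part of bstr, and it relies on `std::str::Utf8Error`, which reports the two quantities separately through `valid_up_to` and `error_len`.

```rust
use std::str;

/// Hypothetical helper (not part of bstr): reports what the standard
/// library says about `bytes`.
///
/// The first element is the length of the longest valid UTF-8 prefix.
/// The second element is the length of the invalid sequence that follows
/// that prefix; it is `None` when the input is entirely valid or when it
/// ends in the middle of an otherwise plausible sequence.
fn describe_invalid(bytes: &[u8]) -> (usize, Option<usize>) {
    match str::from_utf8(bytes) {
        Ok(s) => (s.len(), None),
        Err(e) => (e.valid_up_to(), e.error_len()),
    }
}
```

For `b"\xFFabc"` the valid prefix has length 0, while the invalid sequence that has to be skipped has length 1. The latter matches the `1` returned by `bstr::decode_utf8` in the call above.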

@BurntSushi
Owner

Ah. 1 is indeed correct. The docs need to be updated. Returning 0 wouldn't make sense, because 0 is meant to be the terminal condition of a loop. Returning 0 in any other case leads to more complex loop logic that would be easy to get wrong, which would lead to an infinite loop in practice.
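
The loop shape this refers to can be sketched as follows. To keep the example free of external dependencies, `decode_one` is a hypothetical stand-in built on `std::str::from_utf8`; it is an assumption meant to mirror the behavior described in this thread (a size of 0 only for empty input), not bstr's actual implementation. `decode_lossy` is likewise an illustrative name.

```rust
/// Hypothetical stand-in for a `decode_utf8`-style function: decodes one
/// code point from the front of `slice` and returns it together with the
/// number of bytes consumed. The size is 0 only when `slice` is empty.
fn decode_one(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        return (None, 0);
    }
    // A single UTF-8 encoded code point is at most 4 bytes long.
    let end = slice.len().min(4);
    let valid_len = match std::str::from_utf8(&slice[..end]) {
        Ok(s) => s.len(),
        Err(e) if e.valid_up_to() > 0 => e.valid_up_to(),
        Err(e) => {
            // Nothing decodes at the front. Consume the invalid sequence;
            // `error_len` is `None` when the window ends mid-sequence, in
            // which case the whole (short) window is consumed.
            return (None, e.error_len().unwrap_or(end));
        }
    };
    // At least one complete code point sits at the front of the window.
    let s = std::str::from_utf8(&slice[..valid_len]).unwrap();
    let c = s.chars().next().unwrap();
    (Some(c), c.len_utf8())
}

/// Decodes `bytes` one code point at a time, substituting U+FFFD for each
/// invalid sequence. A size of 0 is the only exit condition.
fn decode_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        let (ch, size) = decode_one(bytes);
        if size == 0 {
            break; // input exhausted
        }
        out.push(ch.unwrap_or('\u{FFFD}'));
        bytes = &bytes[size..];
    }
    out
}
```

If an invalid byte such as `\xFF` produced a size of 0, this loop would either stop early or, if the check were dropped, never advance past that byte. Reporting at least 1 for any non-empty input keeps the single `size == 0` check sufficient.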

BurntSushi added the doc label (Documentation should be improved.) on Nov 9, 2022