Treat emoji presentation sequences as fullwidth #35

Closed


Jules-Bertholet
Contributor

@Jules-Bertholet Jules-Bertholet commented Feb 10, 2024

UAX11 says:

[UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

Lookup is done with a 2-level trie.
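
To make the rule quoted above concrete, here is a minimal sketch of a two-level lookup feeding a width calculation. It is not the code from this PR: `TwoLevelSet`, `sketch_width`, `SAMPLE_BASES`, and the 7-bit leaf size are all hypothetical names and choices made for illustration, the table is built at runtime rather than generated by `scripts/unicode.py`, and the base list is a tiny sample rather than the full set from the Unicode data files. Every character outside an emoji presentation sequence is counted as width 1, which is a placeholder for the real East_Asian_Width rules.

```rust
/// Illustrative subset of characters that can start an emoji presentation
/// sequence (base + U+FE0F). NOT the full list from the Unicode data files.
const SAMPLE_BASES: &[u32] = &[0x23, 0x2A, 0xA9, 0xAE, 0x263A, 0x2764];

/// Assumption for this sketch: each leaf covers 128 code points.
const LEAF_BITS: u32 = 7;
const LEAF_MASK: u32 = (1 << LEAF_BITS) - 1;

/// Hypothetical two-level set of code points.
struct TwoLevelSet {
    /// Level 1: indexed by `cp >> LEAF_BITS`; the value is a leaf index.
    /// Index 0 is the shared empty leaf. `u8` limits this sketch to 255 leaves.
    root: Vec<u8>,
    /// Level 2: one 128-bit bitset per leaf, indexed by `cp & LEAF_MASK`.
    leaves: Vec<u128>,
}

impl TwoLevelSet {
    fn build(code_points: &[u32]) -> Self {
        let max = code_points.iter().copied().max().unwrap_or(0);
        let mut root = vec![0u8; (max >> LEAF_BITS) as usize + 1];
        let mut leaves = vec![0u128]; // leaf 0 stays all-zero
        for &cp in code_points {
            let hi = (cp >> LEAF_BITS) as usize;
            if root[hi] == 0 {
                // First code point in this block: allocate a fresh leaf.
                leaves.push(0);
                root[hi] = (leaves.len() - 1) as u8;
            }
            leaves[root[hi] as usize] |= 1u128 << (cp & LEAF_MASK);
        }
        Self { root, leaves }
    }

    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        // Code points past the end of the root table map to the empty leaf.
        let leaf = self.root.get((cp >> LEAF_BITS) as usize).copied().unwrap_or(0);
        (self.leaves[leaf as usize] >> (cp & LEAF_MASK)) & 1 == 1
    }
}

/// Width in columns under the simplified rules of this sketch.
fn sketch_width(s: &str, bases: &TwoLevelSet) -> usize {
    let mut width = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if bases.contains(c) && chars.peek() == Some(&'\u{FE0F}') {
            chars.next(); // consume the variation selector
            width += 2; // emoji presentation sequence: treated as wide
        } else if c == '\u{FE0F}' {
            // A stray variation selector takes no columns.
        } else {
            width += 1; // placeholder for the real per-character width rules
        }
    }
    width
}
```

With this sketch, `sketch_width("\u{2764}\u{FE0F}", &set)` counts the heart plus U+FE0F as 2 columns, while the bare `"\u{2764}"` counts as 1. The appeal of the two-level layout is that a lookup costs two array indexings and a bit test, and every block with no entries shares the single empty leaf.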

@Jules-Bertholet
Contributor Author

In terms of UTS 51 conformance, with this PR, this crate will give the correct widths for:

However, it may overestimate (though never underestimate) the rendered widths of:

@Jules-Bertholet Jules-Bertholet force-pushed the emoji-presentation branch 6 times, most recently from 3cc64ff to 5525e7d, on February 14, 2024 15:24
@Jules-Bertholet
Contributor Author

Jules-Bertholet commented Feb 14, 2024

I've replaced the binary search with a better data structure, and also added a section to the rustdoc documenting the complete width rules.

@Jules-Bertholet
Contributor Author

The not-yet-released Unicode 16 adds 8 new non-emoji standardized variation sequences that affect width: https://unicode.org/alloc/Pipeline.html#variation_sequences, https://www.unicode.org/L2/L2023/23212r-quotes-svs-proposal.pdf. In time, we'll need to support those as well.

@Jules-Bertholet Jules-Bertholet mentioned this pull request Apr 23, 2024
@Jules-Bertholet
Contributor Author

This PR is bloated; I'm splitting it up.
