Don’t slice a surrogate pair in half when you truncate. #34

Open
issuefiler opened this issue May 25, 2022 · 3 comments

Comments

@issuefiler

issuefiler commented May 25, 2022

Handle a possible surrogate pair (two UTF-16 code units that together encode a single code point) at the end of the string properly when you truncate the filename.

The package “truncate-utf8-bytes” does this; note, however, that it truncates a string to a specific number of bytes, not UTF-16 code units, unlike String.prototype.slice.

By the way, truncating to a specific number of bytes instead of UTF-16 code units might be more suitable for filenames, since file systems commonly limit filename length in bytes.
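To illustrate the byte-based idea, here is a minimal sketch that walks the string one code point at a time and stops before the UTF-8 byte budget is exceeded. The name `truncateToUtf8Bytes` is hypothetical, and this is not the implementation used by “truncate-utf8-bytes”; it only assumes the standard `TextEncoder` global.

```javascript
// Hypothetical helper (not the "truncate-utf8-bytes" implementation):
// keep whole code points while the UTF-8 size stays within maxBytes.
function truncateToUtf8Bytes(string, maxBytes) {
  const encoder = new TextEncoder();
  let result = "";
  let usedBytes = 0;

  // for...of iterates a string by code point, so a surrogate pair
  // always arrives as one unit and can never be cut in half here.
  for (const codePoint of string) {
    const size = encoder.encode(codePoint).length;
    if (usedBytes + size > maxBytes) {
      break;
    }
    usedBytes += size;
    result += codePoint;
  }

  return result;
}

// "He slices the " is 14 bytes and "🦄" is 4 bytes in UTF-8.
truncateToUtf8Bytes("He slices the 🦄 in half", 17); // "He slices the "
truncateToUtf8Bytes("He slices the 🦄 in half", 18); // "He slices the 🦄"
```

Building the result incrementally keeps the sketch short; it still allows a cut in the middle of a grapheme cluster, only never inside a code point.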

@issuefiler
Author

Example

// "🦄" === "\uD83E\uDD84" (one code point, two UTF-16 code units)
"He slices the 🦄 in half".slice(0, 15)
// → "He slices the \uD83E" (a lone high surrogate is left at the end)
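One possible way to avoid this while still counting UTF-16 code units is to check whether the cut falls between a high and a low surrogate and, if so, back off by one unit. The following is only a sketch; the name `truncateCodeUnits` is hypothetical and is not part of this package.

```javascript
// Hypothetical helper: truncate to at most maxUnits UTF-16 code units
// without leaving half of a surrogate pair at the end.
function truncateCodeUnits(string, maxUnits) {
  if (string.length <= maxUnits) {
    return string;
  }

  let end = maxUnits;
  const last = string.charCodeAt(end - 1); // last unit that would be kept
  const next = string.charCodeAt(end); // first unit that would be dropped

  const isHighSurrogate = last >= 0xD800 && last <= 0xDBFF;
  const isLowSurrogate = next >= 0xDC00 && next <= 0xDFFF;

  // The cut would split a pair, so drop the dangling high surrogate too.
  if (isHighSurrogate && isLowSurrogate) {
    end -= 1;
  }

  return string.slice(0, end);
}

truncateCodeUnits("He slices the 🦄 in half", 15); // "He slices the "
truncateCodeUnits("He slices the 🦄 in half", 16); // "He slices the 🦄"
```

The result may be one code unit shorter than requested, which seems preferable to producing an ill-formed string.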

@issuefiler
Author

Note that slice can also break a Unicode grapheme cluster (e.g. a combined family emoji). A split grapheme cluster is still a valid string, but breaking a surrogate pair, which represents a single code point, leaves a lone surrogate: the string is then no longer well-formed UTF-16 and cannot be encoded as valid UTF-8.
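The difference between the two kinds of breakage can be observed with a UTF-8 round trip. This is a rough sketch: `survivesUtf8RoundTrip` is a hypothetical helper, and it relies on `TextEncoder` replacing a lone surrogate with U+FFFD when encoding.

```javascript
// Hypothetical check: a string containing a lone surrogate does not
// survive encoding to UTF-8 and decoding back, because TextEncoder
// substitutes U+FFFD for the unpaired surrogate.
function survivesUtf8RoundTrip(string) {
  const bytes = new TextEncoder().encode(string);
  // ignoreBOM keeps a leading U+FEFF instead of stripping it.
  const decoder = new TextDecoder("utf-8", {ignoreBOM: true});
  return decoder.decode(bytes) === string;
}

// Family emoji: man, ZWJ, woman, ZWJ, girl (8 UTF-16 code units).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const brokenCluster = family.slice(0, 2); // "\u{1F468}", just the man
const brokenPair = family.slice(0, 1); // "\uD83D", a lone high surrogate

survivesUtf8RoundTrip(brokenCluster); // true: changed meaning, still valid
survivesUtf8RoundTrip(brokenPair); // false: no longer well-formed
```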

@sindresorhus
Owner

We can use Intl.Segmenter to solve this now that this package targets Node.js 16.
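A sketch of what an Intl.Segmenter-based approach might look like follows. The function name `truncateGraphemes` and the choice to measure the limit in UTF-16 code units are assumptions for illustration, not the package's actual code.

```javascript
// Hypothetical helper: append whole grapheme clusters while the result
// stays within maxUnits UTF-16 code units.
function truncateGraphemes(string, maxUnits) {
  const segmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});
  let result = "";

  for (const {segment} of segmenter.segment(string)) {
    // Stop before a cluster that would not fit entirely.
    if (result.length + segment.length > maxUnits) {
      break;
    }
    result += segment;
  }

  return result;
}

truncateGraphemes("He slices the 🦄 in half", 15); // "He slices the "
```

Because a surrogate pair always lies inside a single grapheme cluster, cutting only at cluster boundaries also prevents split pairs, and it keeps sequences such as ZWJ-joined emoji intact.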
