zwj codepoints, skin tones, families, and kisses #2
Comments
I honestly didn't even know about zwj codepoints until a few weeks ago. Calling @mathiasbynens (Unicode wizard). Do you happen to know if there's any way to do this? Maybe you happen to have a module for it. ;) |
Yup, GitHub are "nice" enough to replace some emojis with web components...
Same thing here: https://github.com/sindresorhus/skin-tone |
For inspiration, take a look at how lodash attempts to solve this (see its internal
This is the right question. There’s no way to detect whether the current environment renders the given set of code points as a single grapheme/glyph/emoji, which is what you really want here. As for emoji + ZWJ, you could programmatically account for the combinations listed here: http://unicode.org/emoji/charts/emoji-zwj-sequences.html But that list changes over time, doesn’t necessarily reflect the environment your code runs in, and excludes non-emoji uses of ZWJ. |
A link to the stringToArray reference. Woo!
_.size('👶') // 1
_.size('👶🏽') // 1
_.size('👩‍👩‍👦‍👦') // 1
_.size('👨‍❤️‍💋‍👨') // 1 |
@jdalton I think @mathiasbynens hit the nail on the head. Observe: I'm ok with closing this issue with a doc patch, but it seems like the only way to get the right answer programmatically is an option to specify whether zwj chars should be respected, or perhaps even an option to specify which zwj combinations should be treated as combined. |
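One way to picture the "specify which zwj combinations should be treated as combined" idea is the sketch below. It is only an illustration: `widthWithKnownSequences` and its `knownSequences` parameter are hypothetical names, not part of this module's API, and the fallback of one column per code point is an assumption that mirrors the counts reported in this issue.

```javascript
// Hypothetical helper: the caller passes the ZWJ sequences they believe
// their target environment renders as a single glyph.
function widthWithKnownSequences(text, knownSequences = []) {
  // Try longer sequences first so a long sequence is not shadowed by a shorter prefix.
  const sorted = [...knownSequences]
    .filter((seq) => seq.length > 0)
    .sort((a, b) => b.length - a.length);

  let width = 0;
  let rest = text;
  while (rest.length > 0) {
    const match = sorted.find((seq) => rest.startsWith(seq));
    if (match) {
      // A caller-approved sequence counts as one column.
      width += 1;
      rest = rest.slice(match.length);
    } else {
      // Assumed fallback: one column per code point (joiners included).
      const first = String.fromCodePoint(rest.codePointAt(0));
      width += 1;
      rest = rest.slice(first.length);
    }
  }
  return width;
}

const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}';
console.log(widthWithKnownSequences(family));           // 7
console.log(widthWithKnownSequences(family, [family])); // 1
```

The list still has to come from somewhere and kept up to date, which is the maintenance problem raised earlier in the thread.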
To me if the user specifies something with a zwj it's their intent to consider it as part of the joined whole regardless of how the device renders the emoji. Our methods like |
@jdalton I don’t disagree; for all scenarios in which you’d want to count grapheme clusters you’d want 👩‍👩‍👦‍👦 to count as a single unit. But note that this project is about getting the visual width of a string, which is inherently dependent on the environment it runs in. |
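To make the distinction concrete, here is a minimal sketch of counting grapheme clusters with the standard `Intl.Segmenter` API, assuming a runtime that ships it (recent Node.js and browsers). `countGraphemes` is a name made up for this example. The result says how many user-perceived characters the string holds, not how many columns a given terminal will draw.

```javascript
// Counts extended grapheme clusters (user-perceived characters), not columns.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].length;
}

// woman ZWJ woman ZWJ boy ZWJ boy
const family = '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}';

console.log([...family].length);     // code points: 7
console.log(countGraphemes(family)); // grapheme clusters: 1
```

A terminal without a glyph for the sequence may still draw four separate emoji, so the cluster count is at best an upper bound on how well things can go.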
Ya, no worries. I just popped in, after being mentioned, to give a woo & a π for fancy emojis. |
I know this issue is pretty old, but are there any solutions (or workable hacks) for this? |
@eamodio I'm on it, if everything goes well... |
@Offirmo Thanks! FYI, I'm not sure it helps with this, but I hacked together a solution for my use-case here: https://github.com/eamodio/vscode-gitlens/blob/99d6da9c9032e244a3dcaeb6f86ca65eeebfbd8c/src/system/string.ts#L130-L188 |
@eamodio thanks! I had a look at your implementation, and I believe my pending one will be more generic (e.g., emojis don't always take 2 columns). But interesting read! |
@Offirmo thanks. Definitely looking forwards to a more robust generic solution! |
Hi there? I know it's a bit old... but I think |
I had switched to power-assert-util-string-width, which depends on eastasianwidth. |
so, done ? |
I stopped working on it, sorry: 1) it ended up being super complicated, and 2) it needed a refactor of this lib, and @sindresorhus wasn't keen on changing the API. |
This package now depends on that too. |
Consider these various glyphs: 👶 👶🏽 👩‍👩‍👦‍👦 👨‍❤️‍💋‍👨
The first is a generic "simpsons-colored" baby. This module correctly interprets it as a single column. (One might argue it really ought to be considered full-width, or 2 columns, since most terminals render emoji as extra wide, but one would be wrong to make that argument, because most terminals also "incorrectly" overlap the next character on top of the emoji, so it actually only "takes up" one column.)
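The second is a baby with a specific skin tone: the base emoji followed by a skin-tone modifier code point (no joiner is involved here). This module counts the modifier as its own column, so it reads as 2 columns.
The third is a "woman [zwj] woman [zwj] boy [zwj] boy", glued together with zero-width joiners (or "zwj", pronounced "zwidge"), which this module doesn't handle properly. It's a full 25 bytes of familial goodness, and this module treats it as 7 columns.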
The fourth is "man [zwj] heart [zwj] kiss [zwj] man", and comes in at 8 columns.
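The column counts above line up with the number of code points in each glyph, which a short sketch can confirm. The skin tone used here (U+1F3FD) is an assumption, since any of the five modifiers gives the same counts; the heart is written with its emoji variation selector (U+FE0F), which is what brings the fourth glyph to 8.

```javascript
// The four glyphs, written as escapes so the joiners are visible.
const ZWJ = '\u200D';
const glyphs = {
  baby: '\u{1F476}',
  babyWithTone: '\u{1F476}\u{1F3FD}', // base + skin-tone modifier, no ZWJ (tone is assumed)
  family: ['\u{1F469}', '\u{1F469}', '\u{1F466}', '\u{1F466}'].join(ZWJ),
  kiss: ['\u{1F468}', '\u2764\uFE0F', '\u{1F48B}', '\u{1F468}'].join(ZWJ),
};

// String iteration walks code points, not UTF-16 code units.
const codePoints = (s) => [...s].length;
// Size of the string when encoded as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

for (const [name, s] of Object.entries(glyphs)) {
  console.log(name, codePoints(s), utf8Bytes(s));
}
// baby 1 4
// babyWithTone 2 8
// family 7 25
// kiss 8 27
```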
Is this problem even solvable? Conceivably, something like "fireman [zwj] cat" could be turned into "fire cat" by Apple or Google or Microsoft tomorrow, and a current 2 column set of code points could become 1.
If not, it seems like maybe it should be called out in the readme as just an impossible thing we can never hope to account for? Another way would be to optimistically treat anything with zero width joiners as single chars, but that might be too optimistic?
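For what the optimistic option might look like, here is a rough sketch. `optimisticWidth` is a hypothetical helper, not this module's implementation. It assumes one column per code point (as in the counts above) and folds anything after a zero-width joiner, plus variation selectors and skin-tone modifiers, into the preceding column. It makes no attempt to check what the terminal actually renders, and it ignores full-width characters and ANSI escapes entirely.

```javascript
const ZWJ = 0x200d;
const VS16 = 0xfe0f; // variation selector-16 (emoji presentation)
const isSkinToneModifier = (cp) => cp >= 0x1f3fb && cp <= 0x1f3ff;

// Hypothetical helper: optimistic column count that trusts every joiner.
function optimisticWidth(text) {
  let width = 0;
  let joinNext = false; // true right after a ZWJ that has something to join to
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp === ZWJ) {
      // The joiner itself is zero-width; it only joins if something precedes it.
      joinNext = width > 0;
      continue;
    }
    if (cp === VS16 || isSkinToneModifier(cp)) {
      // Simplification: always treated as part of the previous glyph.
      continue;
    }
    if (joinNext) {
      // This code point is glued onto the previous column.
      joinNext = false;
      continue;
    }
    width += 1;
  }
  return width;
}

const kiss = '\u{1F468}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F468}';
console.log(optimisticWidth(kiss));             // 1
console.log(optimisticWidth('a' + kiss + 'b')); // 3
```

Whether that is too optimistic depends on the environment: a terminal that draws the sequence as four separate emoji would make this undercount.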