Hangul / Multi grapheme input is broken #936

andrewzah · 2017-12-06T00:38:05Z

Korean characters are made up of at least two graphemes and at most three or four. Normal input waits for a space or the maximum number of graphemes before moving on to the next character.

Intended behavior: Typing ㄲ ㅗ ㄱ should produce 꼭, ㄷ ㅏ ㄹ ㄱ should produce 닭, etc.

Alacritty behavior: Typing ㄲ ㅗ ㄱ leaves the individual graphemes as they are. It doesn't appear to enter character composition mode.

System: OSX
Tested with iTerm2, no issues there on my setup.

andrewzah · 2017-12-06T00:41:33Z

I asked on #rust and was pointed to these libraries:

Edit: it looks like these are good for displaying the characters once you have them, but not so much for the actual input.

jwilm · 2017-12-06T03:45:22Z

Thanks for the issue! This may be an issue in winit. When you are composing a character, do the individual characters normally show up until the grapheme is finished and then disappear?

Regarding input, unicode-segmentation is relevant there, and so is unicode-width. harfbuzz is certainly a render-time tool, though. And actually, we don't properly support multi-codepoint grapheme clusters yet; cells are limited to single code points for now.

andrewzah · 2017-12-06T04:05:45Z

Here is a gif:

Essentially, it builds in place until you press space/tab/enter/any non-Hangul character, or reach the Hangul limit (3 or 4, depending). That's the case for Korean, at least. I can't speak for Mandarin or Arabic, which probably work differently.

jwilm · 2017-12-06T04:08:10Z

And it works similarly in iTerm?

andrewzah · 2017-12-06T04:10:26Z

Yep! iTerm2 and Terminal.app work the same as that gif, and have no problem switching input sources.

jwilm · 2017-12-06T04:42:30Z

In that case, this sounds like a duplicate of #306. Generally speaking, full unicode support is a bit tricky since it requires changes to how cell data is stored in the grid. Our current implementation only supports having a single code point per cell (a Rust char type) and not arbitrary UTF-8 sequences. Supporting arbitrary UTF-8 requires running unicode-segmentation on the input we get, storing this efficiently for each cell, and then using harfbuzz to shape text when rendering.

What's really needed to get this moving forward is someone who is familiar with these problems to propose an implementation for Alacritty. The really hard part is doing all of this and not leaving performance on the table.
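To make the single-code-point constraint described above concrete, here is a minimal Rust sketch. The `Cell` struct and the `cell_from_str` helper are hypothetical names used for illustration only, not Alacritty's actual grid types. The point is simply that a Rust `char` holds exactly one code point, so a precomposed Hangul syllable fits in such a cell while the equivalent decomposed jamo sequence does not.

```rust
/// Hypothetical, simplified terminal cell (NOT Alacritty's real struct):
/// it can store exactly one Unicode code point.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cell {
    c: char,
}

/// Returns a cell only when `text` consists of exactly one code point.
/// Empty input and multi-code-point sequences are rejected.
fn cell_from_str(text: &str) -> Option<Cell> {
    let mut chars = text.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(Cell { c }),
        _ => None,
    }
}
```

Under these assumptions, `cell_from_str("\u{AF2D}")` (the precomposed syllable 꼭) yields a cell, whereas `cell_from_str("\u{1101}\u{1169}\u{11A8}")` (the same syllable written as three conjoining jamo) yields `None`, because the sequence is three `char` values long.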

mbrubeck · 2017-12-06T18:32:09Z

If I understand correctly, aside from certain archaic glyphs, Korean text may use precomposed Hangul Syllables code points, which use one code point per grapheme. Therefore, multi-code-point grapheme clusters are not required to display Korean text, though they might be used during input.

Decomposed (multi-code-point) syllables can be converted to their precomposed (single-code-point) equivalents using the nfc method in the unicode-normalization crate. Perhaps this normalization can be applied at some point within alacritty? Or maybe there is a way to let the operating system's input method handle this; I'm not sure.
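As a rough illustration of what that normalization does for Hangul, the sketch below applies the arithmetic Hangul syllable composition from the Unicode Standard using only the Rust standard library. It is not the unicode-normalization crate's API or implementation, and `compose_syllable` is a hypothetical helper name. It also assumes its inputs are conjoining jamo (U+1100 block); the standalone jamo written in this issue, such as ㄲ, come from the Hangul Compatibility Jamo block and would need to be mapped to conjoining jamo first, which is omitted here.

```rust
// Constants from the Unicode Standard's Hangul syllable composition.
const S_BASE: u32 = 0xAC00; // first precomposed syllable (가)
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28; // includes the "no trailing consonant" slot

/// Hypothetical helper: composes conjoining jamo into one precomposed
/// syllable. Returns `None` if any input is outside its jamo range.
fn compose_syllable(lead: char, vowel: char, trail: Option<char>) -> Option<char> {
    let l = (lead as u32).checked_sub(L_BASE).filter(|&i| i < L_COUNT)?;
    let v = (vowel as u32).checked_sub(V_BASE).filter(|&i| i < V_COUNT)?;
    let t = match trail {
        // Index 0 means "no trailing consonant", so a real trailing
        // jamo must map to an index in 1..T_COUNT.
        Some(c) => (c as u32)
            .checked_sub(T_BASE)
            .filter(|&i| i > 0 && i < T_COUNT)?,
        None => 0,
    };
    char::from_u32(S_BASE + (l * V_COUNT + v) * T_COUNT + t)
}
```

For example, `compose_syllable('\u{1101}', '\u{1169}', Some('\u{11A8}'))` combines the conjoining forms of ㄲ, ㅗ, and ㄱ into the single code point U+AF2D (꼭), which fits in one `char`.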

jwilm · 2017-12-06T19:33:54Z

@mbrubeck that's really helpful! Given that it composes into single codepoints per grapheme, we may be able to support this without major architectural changes.

On Linux, this is handled out-of-band with IMEs, but I don't think there's any equivalent for macOS.

andrewzah changed the title ~~Hangul / Multi glyph input is broken~~ Hangul / Multi grapheme input is broken Dec 6, 2017

jwilm added the H - macos label Dec 6, 2017

jwilm added S - unicode and removed H - macos labels Dec 6, 2017

jwilm closed this as completed Dec 6, 2017

jwilm added the F - duplicate label Dec 6, 2017

ycy3723 mentioned this issue Sep 11, 2023

Character separation issue when typing in Korean neovide/neovide#2014

Open

fee1-dead mentioned this issue Dec 19, 2023

Support for Font Ligatures using harfbuzz #5696

Draft
