Hangul / Multi grapheme input is broken #936

Closed
andrewzah opened this issue Dec 6, 2017 · 8 comments

Comments

@andrewzah

andrewzah commented Dec 6, 2017

Korean characters are made up of at least two graphemes and at most three or four. Normal input waits for a space or the maximum number of graphemes before moving on to the next character.

Intended behavior: Typing ㄲ ㅗ ㄱ should produce 꼭, ㄷ ㅏ ㄹ ㄱ should produce 닭, etc.

Alacritty behavior: Typing ㄲ ㅗ ㄱ stays as the individual graphemes. It doesn't appear to enter the mode of character composition.

System: OSX
Tested with iTerm2, no issues there on my setup.

@andrewzah andrewzah changed the title from "Hangul / Multi glyph input is broken" to "Hangul / Multi grapheme input is broken" Dec 6, 2017
@andrewzah
Author

andrewzah commented Dec 6, 2017

I asked on #rust and was pointed to these libraries:

Edit: it looks like these are good for displaying the characters once you have them, not so much for the actual input.

@jwilm
Contributor

jwilm commented Dec 6, 2017

Thanks for the issue! This may be an issue in winit. When you are composing a character, do the individual characters normally show up until the grapheme is finished and then disappear?

Regarding input, unicode-segmentation is relevant there, and so is unicode-width. harfbuzz is certainly a render-time tool, though. And actually, we don't properly support multi-codepoint grapheme clusters yet; cells are limited to single code points for now.

@jwilm jwilm added the H - macos label Dec 6, 2017
@andrewzah
Author

andrewzah commented Dec 6, 2017

Here is a gif: Gif

Essentially, it builds in place until you press space/tab/enter/any non-Hangul character, or reach the Hangul limit (3 or 4 depending).
At least, that is how it works for Korean. I can't speak for Mandarin or Arabic, which probably work differently.

@jwilm
Contributor

jwilm commented Dec 6, 2017

And it works similarly in iTerm?

@andrewzah
Author

Yep! iTerm2 and Terminal.app work the same as that gif, and have no problem switching input sources.

@jwilm
Contributor

jwilm commented Dec 6, 2017

In that case, this sounds like a duplicate of #306. Generally speaking, full unicode support is a bit tricky since it requires changes to how cell data is stored in the grid. Our current implementation only supports having a single code point per cell (a Rust char type) and not arbitrary UTF-8 sequences. Supporting arbitrary UTF-8 requires running unicode-segmentation on the input we get, storing this efficiently for each cell, and then using harfbuzz to shape text when rendering.

What's really needed to get this moving forward is someone who is familiar with these problems to propose an implementation for Alacritty. The really hard part is doing all of this and not leaving performance on the table.
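
To illustrate the single-code-point limit described above, here is a minimal sketch. The `Cell` struct and the helper names are hypothetical stand-ins, not Alacritty's actual grid types; the point is only that a Rust `char` holds exactly one code point, so a precomposed Hangul syllable fits in such a cell while the same syllable spelled as a sequence of conjoining jamo does not.

```rust
/// Hypothetical stand-in for a grid cell; Alacritty's real cell type differs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cell {
    /// Exactly one Unicode scalar value.
    c: char,
}

/// Returns a cell only if `text` consists of a single code point.
fn cell_from_str(text: &str) -> Option<Cell> {
    let mut chars = text.chars();
    let first = chars.next()?;
    match chars.next() {
        // Nothing after the first code point: it fits in one cell.
        None => Some(Cell { c: first }),
        // More than one code point: a `char`-based cell cannot hold it.
        Some(_) => None,
    }
}

/// Number of code points (not bytes, not grapheme clusters) in `text`.
fn code_points(text: &str) -> usize {
    text.chars().count()
}
```

With these helpers, the precomposed syllable `"\u{AF2D}"` (꼭) is one code point and fits in a cell, whereas the decomposed spelling `"\u{1101}\u{1169}\u{11A8}"` is three code points and is rejected.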

@mbrubeck

mbrubeck commented Dec 6, 2017

If I understand correctly, aside from certain archaic glyphs, Korean text may use precomposed Hangul Syllables code points, which use one code point per grapheme. Therefore, multi-code-point grapheme clusters are not required to display Korean text, though they might be used during input.

Decomposed (multi-code-point) syllables can be converted to their precomposed (single-code-point) equivalents using the nfc method in the unicode-normalization crate. Perhaps this normalization can be applied at some point within alacritty? Or maybe there is a way to let the operating system's input method handle this; I'm not sure.
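
As a rough sketch of what that normalization does for Hangul, the snippet below applies the syllable composition arithmetic from the Unicode Standard using only the standard library. The function name `compose_hangul` is made up for illustration, and real code would more likely call `nfc` from unicode-normalization instead. It assumes the input is conjoining jamo (U+1100 block); compatibility jamo in the U+3130 block are not handled here.

```rust
// Constants from the Unicode Standard's Hangul syllable composition algorithm.
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one below the first trailing consonant (index 0 = none)
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;

/// Composes conjoining jamo into one precomposed syllable code point.
/// Returns `None` if any input is outside the expected jamo range.
/// (Hypothetical helper, written for illustration only.)
fn compose_hangul(lead: char, vowel: char, trail: Option<char>) -> Option<char> {
    let l = (lead as u32).checked_sub(L_BASE).filter(|&i| i < L_COUNT)?;
    let v = (vowel as u32).checked_sub(V_BASE).filter(|&i| i < V_COUNT)?;
    let t = match trail {
        None => 0,
        Some(c) => (c as u32)
            .checked_sub(T_BASE)
            .filter(|&i| i > 0 && i < T_COUNT)?,
    };
    // S = S_BASE + (L * V_COUNT + V) * T_COUNT + T
    char::from_u32(S_BASE + (l * V_COUNT + v) * T_COUNT + t)
}
```

For example, `compose_hangul('\u{1101}', '\u{1169}', Some('\u{11A8}'))` yields U+AF2D (꼭), a single code point that fits in a `char`.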

@jwilm
Contributor

jwilm commented Dec 6, 2017

@mbrubeck that's really helpful! Given that it composes into single codepoints per grapheme, we may be able to support this without major architectural changes.

On Linux, this is handled out-of-band with IMEs, but I don't think there's any equivalent for macOS.


3 participants