Doesn't respect Unicode character width; all characters take up one cell #265

rspeer · 2017-01-09T21:21:51Z

In Unicode, "monospaced" characters are meant to take up 0, 1, or 2 character cells based on what kind of character they are. Combining characters take up 0 cells because they stack on top of the previous character. CJK characters (except for the ones designated as "halfwidth") take up 2 cells, because that's how the original CJK terminal displays were designed.

In alacritty, every character takes up 1 cell, including the ones that should take 0 or 2. This causes display problems, such as incorrect scrolling and wrapping in tmux.

To demonstrate the problem, I defined this string using python3:

text = 'u\N{COMBINING DIAERESIS}' * 200

Without Python, you could just try pasting in this string, I believe (but don't use the pre-combined character ü):

üüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüü

My terminal is 240 characters wide, so when I print(text), the result should fit on one line for me. Instead, not only does it wrap, but the unexpected wrapping causes tmux to glitch, so the entire window gets filled with columns of un-combined u and ¨ characters.

The Japanese word ありがとう should take up 10 character cells. Instead, it takes up 5 character cells, with the characters overlapping each other.
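For reference, here is a rough sketch of the expected cell counts for both examples, using only Python's standard `unicodedata` module. The helper names `cell_width` and `text_width` are made up for illustration, and this is a simplification rather than a full wcwidth implementation: it ignores control characters and counts "ambiguous" width characters as 1 cell.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate cell width (0, 1, or 2) of a single code point.

    Hypothetical helper: a simplification, not a complete wcwidth.
    """
    # Combining marks stack on the previous character and take no cell.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to take one cell.
    return 1


def text_width(text: str) -> int:
    """Sum the per-character cell widths of a string."""
    return sum(cell_width(ch) for ch in text)


combining_text = "u\N{COMBINING DIAERESIS}" * 200
print(text_width(combining_text))  # 200 cells, fits in a 240-column terminal
print(text_width("ありがとう"))     # 10 cells
```

Under these assumptions the combining string is 200 cells wide even though it contains 400 code points, which is why it should not wrap in a 240-column terminal.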

This is not a font issue, it's an issue with the actual behavior of the terminal. The lack of fallback fonts just makes the issue harder to see.

C code uses the wcwidth(3) function to determine how wide a character is. This function seems to have at one point been in the Rust standard library, then moved out. http://unicode-rs.github.io/unicode-width/unicode_width/index.html is a crate that seems to provide it.

jwilm · 2017-01-09T21:46:20Z

Thank you for the thorough bug report! This is an essential feature that needs to be supported.

medwards · 2017-01-11T13:18:09Z

Is this related to trying to use certain fonts too? For example I'm trying to replicate my gnome-terminal settings and setting the font to "Ubuntu Mono Regular" but all characters take up two cells for some reason.

jwilm · 2017-01-11T16:56:44Z

@medwards that sounds like it's an issue with how we use font metrics. The glyphs likely don't occupy two cells, it's just that single cells are calculated to be far too wide.

medwards · 2017-01-11T17:02:07Z

OK, let me know if that is in another issue or if I should open a new bug for it. Thanks for the fast reply :)

jwilm · 2017-01-11T17:06:43Z

I think #83 covers it.

joshuarubin · 2017-01-11T22:02:00Z

When you do implement character width, please use the unicode9 width tables (not wcwidth or another library). I have a simple header-only file (https://github.com/joshuarubin/wcwidth9) that can be used for the character width calculations.

You may want to optionally enable pre-unicode9 widths as a configuration option (or vice versa; iTerm.app nightlies have this option), but the repo I listed will give correct unicode widths for all characters, including East Asian and emoji.

Many terminal apps are unicode9 ready including tmux, vim and neovim.

jwilm · 2017-01-11T22:05:28Z

Thanks @joshuarubin.

It looks like Rust's unicode-width crate is up-to-date with Unicode 9, so we are set on that front.

> You may want to optionally enable pre-unicode9 widths as a configuration option

Is there a lot of demand for this?

joshuarubin · 2017-01-11T22:11:53Z

Based on the size of the tables in Rust's unicode-width, it definitely seems like there are things missing. Also, an issue there states that it doesn't calculate emoji width (and doesn't seem to think it should). There are additional issues of how to handle east asian "ambiguous" width characters (iTerm defaults to 1, but has a config option, discouraged, to change to 2) and "private use area" characters.

My library (for C, don't know much about rust [...yet]) lets the user decide on a char by char basis how to handle things.

Its return values:

1 or 2: the width of the character
-1: non-printable, combining or unassigned character
-2: ambiguous width character
-3: private-use character
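As a sketch of what "char by char" handling could look like on the caller's side, the snippet below maps the return codes listed above to a cell count under a configurable policy. It is written in Python for illustration only. The constant names, the `resolve_width` function, and its defaults are hypothetical and not part of wcwidth9, and treating every -1 result as zero cells is an assumption of this sketch.

```python
# Return codes as listed in the comment above. The constant names are
# hypothetical and only mirror that list; they are not from wcwidth9.
NONPRINTABLE = -1  # non-printable, combining, or unassigned character
AMBIGUOUS = -2     # ambiguous width character
PRIVATE_USE = -3   # private-use character


def resolve_width(code: int, ambiguous_width: int = 1,
                  private_use_width: int = 1) -> int:
    """Turn a wcwidth9-style return code into a cell count.

    Hypothetical helper: the policy defaults are assumptions.
    """
    if code in (1, 2):
        # The library already decided the width.
        return code
    if code == AMBIGUOUS:
        # Caller's choice, e.g. 1 by default with an option for 2.
        return ambiguous_width
    if code == PRIVATE_USE:
        return private_use_width
    if code == NONPRINTABLE:
        # Assumption: draw nothing and advance no cells.
        return 0
    raise ValueError(f"unexpected width code: {code}")


print(resolve_width(2))                              # wide character
print(resolve_width(AMBIGUOUS))                      # default policy
print(resolve_width(AMBIGUOUS, ambiguous_width=2))   # opt-in wide policy
```

Keeping the policy in the caller means a terminal can expose the ambiguous-width choice as a config option without changing the width tables.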

joshuarubin · 2017-01-11T22:15:27Z

> Is there a lot of demand for [pre-unicode9 widths]?

Dunno. I've been fighting for unicode9 widths for months across a bunch of projects. Any change causes users to complain. As this is a new project, maybe just include unicode9 and see how it works for people?

jwilm · 2017-07-01T23:43:57Z

We handle double-width characters now. There is still a lot of work left for full Unicode support.

Closing in favor of #306. Please subscribe there if you want to follow development on this subject.

According to alacritty/alacritty#265 (comment), unicode in monospace can only have width 0, 1, or 2, so just accounting for width 2 (in addition to what we already do) should be enough.

rspeer changed the title ~~Doesn't respect unicode::char::width; all characters take up one cell~~ Doesn't respect Unicode character width; all characters take up one cell Jan 9, 2017

jwilm added this to the Version 1.0 milestone Jan 9, 2017

jwilm added the B - bug label Jan 9, 2017

rspeer mentioned this issue Jan 10, 2017

Thai text displayed incorrectly #108

Closed

kazimuth mentioned this issue Jan 13, 2017

Full unicode support #306

Closed

schachmat mentioned this issue Mar 6, 2017

Incorrect table alignment under some terminal emulators due to double-width characters schachmat/wego#111

Closed

jwilm closed this as completed Jul 1, 2017

unphased mentioned this issue Jan 8, 2024

"󰈙" character is an example of one that my tmux built from source gets garbled rendering tmux/tmux#3799

Closed
