Doesn't respect Unicode character width; all characters take up one cell #265

Closed
rspeer opened this issue Jan 9, 2017 · 10 comments

@rspeer

rspeer commented Jan 9, 2017

In Unicode, "monospaced" characters are meant to take up 0, 1, or 2 character cells based on what kind of character they are. Combining characters take up 0 cells because they stack on top of the previous character. CJK characters (except for the ones designated as "halfwidth") take up 2 cells, because that's how the original CJK terminal displays were designed.
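
A minimal Python sketch of that 0/1/2 rule, using only the standard `unicodedata` module. The helper name `cell_width` is made up for illustration, and the rule is an approximation (it ignores control characters, emoji sequences, and ambiguous-width characters), not what any terminal actually implements:

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate number of terminal cells for a single code point."""
    # Nonspacing and enclosing marks stack on the previous character.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including halfwidth CJK forms, takes one cell.
    return 1

print(cell_width("u"))                        # 1
print(cell_width("\N{COMBINING DIAERESIS}"))  # 0
print(cell_width("\u3042"))                   # 2 (HIRAGANA LETTER A)
print(cell_width("\uff71"))                   # 1 (HALFWIDTH KATAKANA LETTER A)
```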

In alacritty, every character takes up 1 cell, including the ones that should take 0 or 2. This causes display problems, such as incorrect scrolling and wrapping in tmux.

To demonstrate the problem, I defined this string using python3:

text = 'u\N{COMBINING DIAERESIS}' * 200

Without Python, you could just try pasting in this string, I believe (but don't use the pre-combined character ü):

üüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüüü

My terminal is 240 characters wide, so when I print(text), the result should fit on one line for me. Instead, not only does it wrap, but the unexpected wrapping causes tmux to glitch, so the entire window gets filled with columns of un-combined u and ¨ characters.
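
The arithmetic behind that can be checked with a short sketch. It assumes a terminal that simply wraps after `columns` cells and compares one cell per code point (the behavior described in this report) with zero cells for combining marks:

```python
import math
import unicodedata

text = "u\N{COMBINING DIAERESIS}" * 200
columns = 240  # terminal width from the report

# One cell per code point: the behavior described in this report.
naive_cells = len(text)

# Combining marks (categories Mn/Me) should add no cells of their own.
expected_cells = sum(
    0 if unicodedata.category(ch) in ("Mn", "Me") else 1 for ch in text
)

naive_rows = math.ceil(naive_cells / columns)
expected_rows = math.ceil(expected_cells / columns)

print(naive_cells, naive_rows)        # 400 2
print(expected_cells, expected_rows)  # 200 1
```

So the string only fits on one 240-column line if the combining marks are counted as zero cells.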

The Japanese word ありがとう should take up 10 character cells. Instead, it takes up 5 character cells, with the characters overlapping each other.
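
To see why the glyphs overlap, here is a rough sketch that computes the starting column of each character under both models. It assumes the word is made of five precomposed hiragana code points (written as escapes below) and uses `unicodedata.east_asian_width` as the width source:

```python
import unicodedata
from itertools import accumulate

# "ありがとう" as five precomposed hiragana code points (an assumption
# about how the word was typed in the report).
word = "\u3042\u308a\u304c\u3068\u3046"

widths = [
    2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in word
]

# Starting column of each character = sum of the widths before it.
expected_starts = [0] + list(accumulate(widths))[:-1]
naive_starts = list(range(len(word)))  # cursor advances one cell per character

print(sum(widths))       # 10
print(expected_starts)   # [0, 2, 4, 6, 8]
print(naive_starts)      # [0, 1, 2, 3, 4]
```

With the naive cursor advance, each two-cell glyph starts one cell after the previous one, so neighbouring glyphs are drawn on top of each other.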

This is not a font issue; it's an issue with the actual behavior of the terminal. The lack of fallback fonts just makes the issue harder to see.

C code uses the wcwidth(3) function to determine how wide a character is. An equivalent seems to have been in the Rust standard library at one point and was later moved out. The unicode-width crate (http://unicode-rs.github.io/unicode-width/unicode_width/index.html) seems to provide it.

@rspeer rspeer changed the title from "Doesn't respect unicode::char::width; all characters take up one cell" to "Doesn't respect Unicode character width; all characters take up one cell" on Jan 9, 2017
@jwilm
Contributor

jwilm commented Jan 9, 2017

Thank you for the thorough bug report! This is an essential feature that needs to be supported.

@jwilm jwilm added this to the Version 1.0 milestone Jan 9, 2017
@jwilm jwilm added the B - bug label Jan 9, 2017
@medwards

Is this related to trying to use certain fonts too? For example I'm trying to replicate my gnome-terminal settings and setting the font to "Ubuntu Mono Regular" but all characters take up two cells for some reason.

@jwilm
Contributor

jwilm commented Jan 11, 2017

@medwards that sounds like an issue with how we use font metrics. The glyphs likely don't occupy two cells; it's just that single cells are calculated to be far too wide.

@medwards

OK, let me know if that is in another issue or if I should open a new bug for it. Thanks for the fast reply :)

@jwilm
Contributor

jwilm commented Jan 11, 2017

I think #83 covers it.

@joshuarubin

When you do implement character width, please use the Unicode 9 width tables (not wcwidth or another library). I have a simple header-only file (https://github.com/joshuarubin/wcwidth9) that can be used for the character width calculations.

You may want to optionally enable pre-Unicode 9 widths as a configuration option (or vice versa; iTerm.app nightlies have this option), but the repo I listed will give correct Unicode widths for all characters, including East Asian and emoji.

Many terminal apps are Unicode 9 ready, including tmux, vim and neovim.

@jwilm
Contributor

jwilm commented Jan 11, 2017

Thanks @joshuarubin.

It looks like Rust's unicode-width crate is up-to-date with Unicode 9, so we are set on that front.

> You may want to optionally enable pre-unicode9 widths as a configuration option

Is there a lot of demand for this?

@joshuarubin

Based on the size of the tables in Rust's unicode-width, it definitely seems like there are things missing. Also, an issue there states that it doesn't calculate emoji width (and doesn't seem to think it should). There are additional questions about how to handle East Asian "ambiguous" width characters (iTerm defaults to 1, but has a discouraged config option to change it to 2) and "private use area" characters.

My library (for C, don't know much about Rust [...yet]) lets the user decide on a char-by-char basis how to handle things.

Its return values:

  • 1 or 2: the width of the character
  • -1: non-printable, combining or unassigned character
  • -2: ambiguous width character
  • -3: private-use character
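
As a rough illustration of how a caller might act on codes like these, here is a Python sketch. It is not the wcwidth9 C API: `classify` and `resolve` are hypothetical names, the classification comes from Python's bundled Unicode tables (whatever Unicode version your Python ships), and mapping -1 to zero cells is an assumed policy.

```python
import unicodedata

# Hypothetical constants mirroring the return-code convention listed above.
NON_PRINTABLE, AMBIGUOUS, PRIVATE_USE = -1, -2, -3

def classify(ch: str) -> int:
    """Return 1 or 2 for a definite width, or one of the negative codes."""
    cat = unicodedata.category(ch)
    if cat == "Co":                        # private-use code points
        return PRIVATE_USE
    if cat in ("Mn", "Me", "Cc", "Cn"):    # combining, control, unassigned
        return NON_PRINTABLE
    eaw = unicodedata.east_asian_width(ch)
    if eaw == "A":                         # East Asian Ambiguous
        return AMBIGUOUS
    return 2 if eaw in ("W", "F") else 1

def resolve(code: int, ambiguous_width: int = 1, private_use_width: int = 1) -> int:
    """Turn a code into a cell count according to the caller's policy."""
    if code == AMBIGUOUS:
        return ambiguous_width
    if code == PRIVATE_USE:
        return private_use_width
    if code == NON_PRINTABLE:
        return 0  # assumed policy: draw nothing, advance nothing
    return code

print(classify("\u00b1"))                               # -2 (PLUS-MINUS SIGN)
print(resolve(classify("\u00b1")))                      # 1
print(resolve(classify("\u00b1"), ambiguous_width=2))   # 2
```

Keeping classification separate from policy is what lets the ambiguous-width default (1 here, as in iTerm) be a configuration option instead of a property of the table.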

@joshuarubin

joshuarubin commented Jan 11, 2017

> Is there a lot of demand for [pre-unicode9 widths]?

Dunno. I've been fighting for unicode9 widths for months across a bunch of projects. Any change causes users to complain. As this is a new project, maybe just include unicode9 and see how it works for people?

@jwilm
Contributor

jwilm commented Jul 1, 2017

We handle double-width characters now. There is still a lot of work left for full Unicode support.

Closing in favor of #306. Please subscribe there if you want to follow development on this subject.

@jwilm jwilm closed this as completed Jul 1, 2017
aaruni96 added a commit to aaruni96/AbstractAlgebra.jl that referenced this issue Feb 8, 2024
According to alacritty/alacritty#265 (comment), Unicode characters in a monospace font can only have width 0, 1, or 2, so just accounting for width 2 (in addition to what we already do) should be enough.