Full unicode support #306

Closed
kazimuth opened this issue Jan 13, 2017 · 34 comments

Comments

@kazimuth

kazimuth commented Jan 13, 2017

TL;DR: Supporting unicode is hard, and it might be a good idea to use a library that knows how to do it well. Maybe look into using harfbuzz as well as freetype for font rendering. Unicode-width is good but doesn't do everything.

The problem of translating sequences of unicode codepoints to actual you-can-draw-this-on-screen glyphs, supporting things like character width (#265), ligatures (#50), bidirectional text (like in Arabic), and text reordering (!), is called complex text layout. It is, appropriately, complex, and most terminals don't actually do it very well.

Windows and Mac both have systems that perform layout, integrated with their font rasterizers:

  • On Windows, there are several supported libraries from different periods of Windows history: DirectWrite, GDI, and Uniscribe.
  • On OS X, there's Core Text.
  • On Linux, there's a stack: freetype for rasterizing, harfbuzz for "text shaping", and pango for full layout. All of these libraries are actually cross-platform, though. See this article for more on the difference between pango and harfbuzz.

There's also ICU, which is cross-platform and supported by IBM (I think?)

Of the available options:

  • OS X Terminal / iTerm2 use Core Text
  • Terminator (and everything else that uses GTK) uses Pango
  • Windows CMD and Powershell use black magic and baby tears
  • Chrome uses harfbuzz
  • Firefox uses pango
  • ...

It seems to me like harfbuzz is the best option in terms of cross-platform support and level of control. You basically hand it a line of text and it tells you all the glyphs to draw in that line. Keep in mind I'm not actually a font rendering person, though, and it's possible I'm missing important details here.

This would probably have a performance cost, but I'm not sure how much of a performance cost. With a clever implementation it might not be too bad, and would be a huge boon to international users.

Other relevant links:

@kazimuth
Author

(The challenge, of course, is that a lot of these features may interfere with the way the terminal is expected to work; and also it may be difficult to implement them in a performant way. If you have to call freetype to reraster things every frame you lose the benefits of GPU rendering. So it might be reasonable to put this off for a while.)

@khaledhosny

You might want to check mlterm, which has supported bidirectional text for a long time, and its latest versions seem to have added more support for complex text using HarfBuzz as well.

@bjesus

bjesus commented Jan 26, 2017

I just want to confirm that it's not a font issue - I'm running Alacritty on Arch Linux and all Hebrew text just shows up as boxes. I'm not talking about it being displayed right-to-left - as said before, many terminals just keep everything left to right - but no matter what font I choose I can't even see the letters. Is this normal or am I missing something?

screenshot from 2017-01-26 19-16-03

@jwilm
Contributor

jwilm commented Jan 26, 2017

@bjesus Not sure why you say this isn't a font issue. That's definitely a font issue. The font alacritty is looking in for those glyphs doesn't have them.

@bjesus

bjesus commented Jan 26, 2017

Hi! Thanks for Alacritty! I never said it wasn't a font issue, I was only asking... and the reason I thought so is that I tried many different fonts that worked just fine on other terminals. Any suggestion for a font that I should try? I've tried monospace, "Droid Sans Mono", "Consolas" and even "Arial". None of them showed the Hebrew characters.

@dywedir
Contributor

dywedir commented Jan 26, 2017

@bjesus try Cousine (from ttf-croscore)

@jwilm
Contributor

jwilm commented Jan 26, 2017

@bjesus the reason those other fonts work is not because they have the glyphs, but because the application you're using supports fallback fonts--something we plan to add to Alacritty.
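
A rough sketch of the fallback idea described above, not Alacritty's implementation: try each configured font in order and use the first one that covers the character. The font names, the coverage table, and `pick_font` are all hypothetical; a real terminal would query the font files (for example through fontconfig or Core Text) instead of hard-coding ranges.

```python
from typing import Dict, List, Optional, Set

# Hypothetical coverage table: font name -> set of code points it has glyphs for.
FONT_COVERAGE: Dict[str, Set[int]] = {
    "Primary Mono": set(range(0x20, 0x7F)),         # printable ASCII only
    "Hebrew Fallback": set(range(0x0590, 0x0600)),  # Unicode Hebrew block
}

# Hypothetical lookup order: the configured font first, then fallbacks.
FONT_ORDER: List[str] = ["Primary Mono", "Hebrew Fallback"]


def pick_font(ch: str, order: List[str], coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font in `order` that covers `ch`, or None if none does."""
    for name in order:
        if ord(ch) in coverage.get(name, set()):
            return name
    return None  # no font has the glyph; a terminal would draw a box here


print(pick_font("a", FONT_ORDER, FONT_COVERAGE))       # Primary Mono
print(pick_font("\u05d0", FONT_ORDER, FONT_COVERAGE))  # Hebrew Fallback
print(pick_font("\u4e2d", FONT_ORDER, FONT_COVERAGE))  # None
```

Without the fallback list, every character outside the primary font's coverage hits the `None` case, which matches the boxes reported earlier in the thread.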

@bjesus

bjesus commented Jan 26, 2017

Thanks @N-006 and @jwilm, it works just fine now. Sorry about the misunderstanding, I didn't think about fallback fonts.

@voxadam

voxadam commented Dec 20, 2017

@yoshuawuyts

yoshuawuyts commented Jan 7, 2018

For people (like me) looking for how to enable unicode on Alacritty: there is now an RFC in #957 for font configuration, including fallback fonts, which would solve most unicode issues.

Took me a few clicks to find out about it, figured linking back here would be useful! ✨

@horta

horta commented Jan 30, 2019

printf '\xF0\x9D\x9C\x99' is supposed to print 𝜙, but on Alacritty it prints an empty space. Is this due to the limitations that this thread refers to? Or is it a bug?
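
For reference, the four bytes in that printf can be decoded to check which character is being requested. A small Python sketch using only the standard library (the variable names are just for illustration):

```python
import unicodedata

raw = b"\xF0\x9D\x9C\x99"   # the bytes passed to printf above
text = raw.decode("utf-8")   # valid UTF-8, decodes to a single character
codepoint = ord(text)

print(len(text))                # 1
print(f"U+{codepoint:04X}")     # U+1D719
print(unicodedata.name(text))   # the character's official Unicode name
print(codepoint > 0xFFFF)       # True: outside the Basic Multilingual Plane
```

So the byte sequence itself is well formed; whether a glyph appears then depends on some font available to the terminal actually containing that code point.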

@chrisduerr
Member

tmp

@horta Most likely a font issue.

@horta

horta commented Jan 30, 2019

I just removed the alacritty configuration file, restarted it and tried again. Still prints a space. Interestingly, some unicode characters work:

screenshot 2019-01-30 at 23 18 13

@horta

horta commented Jan 30, 2019

I'm using macOS, so it defaults to Menlo. However, using the same Menlo font in Terminal.app it prints correctly.

@xarthurx

image

I'm also experiencing issues which I believe are due to unicode width.

I use a "lambda" symbol in my prompt. Two issues noticed:

  1. the cursor is too far away from the real location (in the pic, the cursor is actually right after the letters, no space in between);

  2. when using tmux with two panes, this issue also causes the separator char to be misaligned.
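
To illustrate how this kind of cursor drift can happen, here is a simplified width calculation based on the East Asian Width property from Python's `unicodedata`. It is only a sketch: `cell_width` and its `ambiguous_wide` flag are made up for this example, it is not the algorithm used by Alacritty, tmux, or the unicode-width crate, and it assumes the prompt symbol is the Greek letter λ (U+03BB).

```python
import unicodedata


def cell_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate number of terminal cells occupied by one character."""
    if unicodedata.combining(ch):
        return 0                      # combining marks stack on the previous cell
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2                      # wide and fullwidth characters
    if eaw == "A" and ambiguous_wide:
        return 2                      # "ambiguous" characters, treated as wide
    return 1


print(cell_width("a"))                         # 1
print(cell_width("\u4e2d"))                    # 2 (CJK ideograph)
print(cell_width("\u0300"))                    # 0 (combining grave accent)
print(cell_width("\u03bb"))                    # 1
print(cell_width("\u03bb", ambiguous_wide=True))  # 2
```

If the shell, tmux, and the terminal disagree about a character like this (one counts it as one cell, another as two), the cursor and pane separators end up offset by a column, which is consistent with the symptoms described above.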

@chrisduerr
Member

@xarthurx See #1295 for tracking that issue.

@smeikx

smeikx commented May 1, 2019

On macOS 10.14.4 using a German keyboard just typing an umlaut (like ä) produces �ä. All following umlauts look fine: �öüä.
The real trouble starts when typing sharp s (ß), that produces: ��[7m<009f>, every time. Similar problem with other characters like ellipsis (…), ��[7m<0080>, or em dash (—), ��[7m<0080><0094>.
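
One possible reading of the output above, assuming the shell was treating its input as single-byte text: the bracketed values match trailing bytes of each character's UTF-8 encoding. A quick Python check of those encodings (the helper name `utf8_hex` is made up for this example):

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of `ch` as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in ch.encode("utf-8"))


for ch in ("\u00e4", "\u00df", "\u2026", "\u2014"):  # ä, ß, ellipsis, em dash
    print(f"U+{ord(ch):04X}", utf8_hex(ch))

# U+00E4 c3 a4
# U+00DF c3 9f
# U+2026 e2 80 a6
# U+2014 e2 80 94
```

The `9f` after ß and the `80 94` after the em dash line up with the `<009f>` and `<0080><0094>` shown in the report.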

@chrisduerr
Member

Do you have the correct locale set?

@smeikx

smeikx commented May 1, 2019

No, I didn’t know about that! iTerm seems to infer it from the system settings, as echo $LANG showed; in Alacritty that didn’t print anything.
I added export LANG="de_AT.UTF-8" to my .zshrc to resolve the problem.
Thank you and sorry for polluting the thread.
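
For anyone checking their own setup, the sketch below mirrors the usual POSIX precedence for the character-type locale (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and reports whether the result names a UTF-8 codeset. The function names are hypothetical and the parsing is deliberately simple; it is a sanity check, not how any particular shell or terminal decides.

```python
import os
from typing import Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Return the locale name that governs character handling."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:                     # unset or empty variables are skipped
            return value
    return "C"                        # default when nothing is set


def is_utf8_locale(env: Mapping[str, str]) -> bool:
    """True if the effective locale's codeset is UTF-8 (e.g. de_AT.UTF-8)."""
    locale_name = effective_ctype(env)
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


print(is_utf8_locale({}))                       # False: nothing set
print(is_utf8_locale({"LANG": "de_AT.UTF-8"}))  # True
print(is_utf8_locale(os.environ))               # result for the current session
```

With no locale variables set, as in the report above, the effective locale falls back to `C`, which is not a UTF-8 locale.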

@tyru

tyru commented Aug 11, 2019

I missed this issue. related?
#1101 (comment)
#1606 (comment)

@jackroi

jackroi commented Aug 3, 2020

I think I have a related problem: 2-byte-long characters are not properly deleted with the backspace key; only one byte is deleted, which leaves invalid unicode characters.

@chrisduerr
Member

@jackroi Two byte characters, or two grapheme characters? It's most likely a problem with your shell.

@chrisduerr chrisduerr reopened this Aug 3, 2020
@jackroi

jackroi commented Aug 3, 2020

@chrisduerr I'm not sure. Here is an example of the characters in question: à, è, ì, ò, ù, (I found out now that this is 3-byte long).
I'm on Ubuntu 19.10 and I've also tried with the default terminal (Gnome terminal), which doesn't have this problem.

@chrisduerr
Member

That should work just fine. It's likely a problem with your shell.

@jackroi

jackroi commented Aug 4, 2020

But if it were a problem with my shell, the other terminals would also have the same problem, right?

If someone wants to try to reproduce the problem, an easy way is as follows:

cat > TEST
helloò<BACKSPACE>
helloò<BACKSPACE><BACKSPACE>      # the terminal renders this as "hell", as it should be

Then running hexdump -C TEST prints this:

00000000  68 65 6c 6c 6f c3 0a 68  65 6c 6c 6f 0a           |hello..hello.|
0000000d

Note the c3, that's the first byte of ò and that the second "hello" has the final o, even though I pressed <BACKSPACE> two times.
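
The stray c3 can be reproduced outside the terminal. The sketch below shows that removing a single byte from "helloò" leaves invalid UTF-8, and contrasts it with a UTF-8-aware erase that drops the whole character. `erase_last_char` is a hypothetical helper written for this example; it illustrates the idea and is not the code of any terminal, shell, or tty driver.

```python
full = "hello\u00f2".encode("utf-8")    # b'hello\xc3\xb2'
byte_erased = full[:-1]                 # what a byte-wise backspace leaves behind

try:
    byte_erased.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False                       # the lone 0xc3 lead byte is invalid


def erase_last_char(data: bytes) -> bytes:
    """Remove the last UTF-8 encoded character from `data`."""
    end = len(data)
    # Skip continuation bytes, which all have the bit pattern 10xxxxxx.
    while end > 0 and (data[end - 1] & 0xC0) == 0x80:
        end -= 1
    # Then drop the lead (or ASCII) byte itself.
    return data[:max(end - 1, 0)]


print(byte_erased)               # b'hello\xc3'
print(valid)                     # False
print(erase_last_char(full))     # b'hello'
```

The first result matches the `68 65 6c 6c 6f c3` bytes in the hexdump above.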

@kchibisov
Member

can't repro on my system.

@sersorrel

there is certainly some weirdness happening if you use the U+0300 combining grave rather than the precomposed U+00F2:

$  cat >TEST  # typing "helloo", then inserting a U+0300
hello
hell
^C
$  hexdump -C TEST
00000000  68 65 6c 6c 6f 6f 0a 68  65 6c 6c 6f 0a           |helloo.hello.|
0000000d
$  cat >TEST  # typing "hello", then inserting a U+00F2
hello
hell
^C
$  hexdump -C TEST
00000000  68 65 6c 6c 6f 0a 68 65  6c 6c 0a                 |hello.hell.|
0000000b

note that they render the same (I press backspace once on the first line, twice on the second line), but produce different files.

@kchibisov
Member

what's the output of stty?

@sersorrel

$  stty
speed 38400 baud; line = 0;
-brkint -imaxbel iutf8

@lo48576
Contributor

lo48576 commented Aug 4, 2020

I think this issue (306) is a sort of "meta" issue, so I think the problem is better to be tracked as a separate (new) issue.

Anyway, I can't reproduce the issue (on zsh-5.8 on alacritty-0.5.0).

@kchibisov
Member

just fyi, we don't perform unicode normalization, so it's expected? you should write a unicode-normalized string to a file if you want it to be that way. From what I can see everything is working as expected with your file test.
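
To make the normalization point concrete, here is a small Python sketch comparing the precomposed U+00F2 with "o" followed by the combining U+0300. It uses the standard `unicodedata` module; the variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00f2"    # LATIN SMALL LETTER O WITH GRAVE
decomposed = "o\u0300"    # "o" followed by COMBINING GRAVE ACCENT

# They usually render identically but are different strings and different bytes.
print(precomposed == decomposed)        # False
print(precomposed.encode("utf-8"))      # b'\xc3\xb2'
print(decomposed.encode("utf-8"))       # b'o\xcc\x80'

# Normalizing both to the same form makes them compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed)               # True
print(nfd == decomposed)                # True
```

A program that wants the two inputs to end up as the same bytes on disk would need to apply a step like `normalize("NFC", ...)` itself before writing.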

@kchibisov
Member

I think this issue (306) is a sort of "meta" issue, so I think the problem is better to be tracked as a separate (new) issue.

Anyway, I can't reproduce the issue (on zsh-5.8 on alacritty-0.5.0).

I agree, I'd rather not bump this issue.

@jackroi

jackroi commented Aug 4, 2020

I have updated alacritty to version 0.6-dev (from 0.4.3) and the problem no longer occurs. Sorry, I should have updated before asking. Thank you all.

@chrisduerr
Member

I'm going to close this issue since it is rather unspecific. Many of the issues mentioned here already have dedicated issues like bidi (#663) or ligatures (#50).

Though generally I don't see Alacritty implementing many of these features in the future, since most of them simply come with too big of a performance impact on all users to justify adding them for the few users that actually need them.
