Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Some non-ASCII characters get stuck in the preview pane #1366

Open
mhdna opened this issue Aug 2, 2023 · 11 comments
Open

[BUG] Some non-ASCII characters get stuck in the preview pane #1366

mhdna opened this issue Aug 2, 2023 · 11 comments

Comments

@mhdna
Copy link

mhdna commented Aug 2, 2023

Hello,

Thanks for this amazing project. I use LF every day!

I think this bug is related to #807, in which some symbols get stuck in the preview pane when I open a file containing non-ASCII characters or even when previewing it sometimes.

lf-bug.webm
@noornee
Copy link

noornee commented Aug 5, 2023

hmm, i cant reproduce this bug.

what version of lf are you currently on ?
i currently use version 30 and my terminal emulator is Alacritty.

@mhdna
Copy link
Author

mhdna commented Aug 5, 2023

@noornee I'm on the latest r30 too, and I use Alacritty as well, though, this doesn't matter as I can reproduce it in other terminal emulators.

You can reproduce it by putting these few lines into a file, and making sure to move this file into an empty directory for the bug to be apparent

🤭 face with hand over mouth
🫢 face with open eyes and hand over mouth
🫣 face with peeking eye
🫡 saluting face
🤐 zipper-mouth face
🫥 dotted line face

Then run lf --config - to use the default barebone lf config, and navigate to that file, open it in vim for example then exit it.

Weirdly enough, this bug doesn't take place with all emoji characters for example, but with some specific emojis and other non-
ASCII characters, like the ones I provided above.

@noornee
Copy link

noornee commented Aug 5, 2023

You can reproduce it by putting these few lines into a file, and making sure to move this file into an empty directory for the bug to be apparent

whoaa, i just followed the steps you provided and it reproduced the bug!

Weirdly enough, this bug doesn't take place with all emoji characters for example, but with some specific emojis and other non-
ASCII characters, like the ones I provided above.

ah, true.
that was why i wasn't able to reproduce this because the bug didn't occur with the emoji characters i originally provided.

@joelim-work
Copy link
Collaborator

It looks like the issue is coming from the go-runewidth library, which is used to determine whether a given character is single or double width. Normally you would expect that each of those emojis in the example given above are double width, but it turns out that runewidth.RuneWidth returns a width of 1 for some emojis and 2 for others.

I wrote a very quick program to read in text from standard input and print the widths of each rune:

package main

import (
	"bufio"
	"fmt"
	"os"

	"github.com/mattn/go-runewidth"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		for _, r := range []rune(scanner.Text()) {
			fmt.Printf("%s\t%x\t%d\n", string(r), r, runewidth.RuneWidth(r))
		}
	}
}

I only have basic knowledge of Unicode, but if anyone is able to play around with this, and also cross-reference the results with the table for double width characters (which I believe are scraped from https://unicode.org), that would be much appreciated.

@joelim-work
Copy link
Collaborator

joelim-work commented Aug 6, 2023

So I looked into this a bit more and at this point I think the issue is not to do with the lf code, but rather the tcell TUI library, which also depends on go-runewidth.

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

package main

import (
	"os"
	"time"

	"github.com/gdamore/tcell"
	"github.com/mattn/go-runewidth"
)

func setText(screen tcell.Screen) {
	text := "hello world"
	if len(os.Args) > 1 {
		text = os.Args[1]
	}

	x := 0
	for _, c := range text {
		screen.SetContent(x, 0, c, nil, tcell.StyleDefault)
		x += runewidth.RuneWidth(c)
	}
}

func main() {
	screen, _ := tcell.NewScreen()
	screen.Init()

	screen.Clear()
	setText(screen)
	screen.Show()
	time.Sleep(time.Second)

	screen.Clear()
	screen.Show()
	time.Sleep(time.Second)

	screen.Fini()
}

By the way, the documentation for Screen.SetContent might be relevant:

// SetContent sets the contents of the given cell location. If
// the coordinates are out of range, then the operation is ignored.
//
// The first rune is the primary non-zero width rune. The array
// that follows is a possible list of combining characters to append,
// and will usually be nil (no combining characters.)
//
// The results are not displayed until Show() or Sync() is called.
//
// Note that wide (East Asian full width and emoji) runes occupy two cells,
// and attempts to place character at next cell to the right will have
// undefined effects. Wide runes that are printed in the
// last column will be replaced with a single width space on output.

@noornee
Copy link

noornee commented Aug 6, 2023

So I looked into this a bit more and at this point I think the issue is not to do with the lf code, but rather the tcell TUI library, which also depends on go-runewidth.

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

wow, this is interesting!!
great stuff you did to narrow the issue down to tcell. ^^

i suppose we are gonna have to open an issue at tcell repo for this to be hopefully resolved

@joelim-work
Copy link
Collaborator

I would appreciate it if other people confirm this issue as well - just in case it turns out to be an issue specific to my machine.

@mhdna
Copy link
Author

mhdna commented Aug 6, 2023

I was able to reproduce this kind of issue by writing a simple program that displays user-provided text before clearing it, and inputs like 🤭x clear properly but inputs like 🫢x don't.

I can't claim to have experience with Go, if any at all, but after grabbing the dependencies and running your code with these specific inputs I can confirm that the behavior is exactly as you have described.

@joelim-work
Copy link
Collaborator

OK I figured it out, actually tcell also uses go-runewidth for calculating character widths, so the issue is really with the latter.

It turns out that the tables are generated using https://github.com/mattn/go-runewidth/blob/master/script/generate.go and uses https://unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt as the source (W means wide character). Unfortunately the spec is out of date as new characters get introduced, which results in incorrect character widths.

Someone has created a PR to fix this though by updating the version to Unicode 14.0.0: mattn/go-runewidth#56. Although the latest version is now 15.0.0, version 14.0.0 should still fix the issue, at least for these particular set of emojis.

Anyway, until that PR gets merged, there's not much more I can do here, since the issue isn't actually to do with the lf code.

@mhdna
Copy link
Author

mhdna commented Aug 7, 2023

It turns out that the tables are generated using https://github.com/mattn/go-runewidth/blob/master/script/generate.go and uses https://unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt

It's really interesting the way you were able to narrow down this issue piece by piece until you found the real reason behind it.

Someone has created a PR to fix this though by updating the version to Unicode 14.0.0: mattn/go-runewidth#56.

Though, it seems that the PR you mentioned has been there forever, and the guys behind go-runewidth for whatever reason aren't willing to merge it yet.

Hopefully, they take some action to upgrade their Unicode source soon, for other repos such as this to get resolved.

Thank you brother Joe, you've done an amazing job!

@noornee
Copy link

noornee commented Aug 7, 2023

It's really interesting the way you were able to narrow down this issue piece by piece until you found the real reason behind it.

tbh. it's amazing.

Thank you brother Joe, you've done an amazing job!

ditto!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants