Define width? #43

Open
ghostsquad opened this issue Jun 29, 2020 · 7 comments

@ghostsquad

How are you defining "width"? I'm mostly looking for a solution that gives me the character width in monospaced fonts. For the examples in #39 and #36, the "width" would still be 2: although a flag is considered 1 character by modern renderers, it still takes up the space of 2 normal characters.

@dolmen

dolmen commented Jul 8, 2020

@ghostsquad rune has a clear definition in the Go specification: an integer value identifying a Unicode code point.

The doc for RuneWidth gives another hint: it points to https://www.unicode.org/reports/tr11/ which talks about cells.

Flag emojis, by contrast, are made of 2 runes/code points.

So this package is more about East Asian characters, not emojis.
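To make the rune/code point distinction concrete, here is a minimal sketch using only the standard library. The helper name `codePoints` is made up for illustration; the flag is written with escapes so the two regional indicator code points are visible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints is a hypothetical helper: it returns the code points of s
// formatted as U+XXXX strings, one per rune.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	// The Belarus flag: REGIONAL INDICATOR SYMBOL LETTER B + LETTER Y.
	flag := "\U0001F1E7\U0001F1FE"

	fmt.Println(utf8.RuneCountInString(flag)) // 2 runes
	fmt.Println(len(flag))                    // 8 bytes in UTF-8
	fmt.Println(codePoints(flag))             // [U+1F1E7 U+1F1FE]
}
```

A per-rune width lookup sees two separate code points here, not one flag.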

@dolmen

dolmen commented Jul 11, 2020

@ghostsquad uniseg.GraphemeClusterCount might interest you: it will tell you how multiple runes combine into a single grapheme. But that's not a complete solution to your problem (which I suppose is rendering in a terminal emulator): it will not tell you how much space is used to render that grapheme in a monospace font (especially as "monospace font" and "modern renders" are fuzzy).

@mattn
Owner

mattn commented Jul 11, 2020

@dolmen There is already a plan to use it.

See #29

@ghostsquad
Author

@dolmen yep I already looked at uniseg, and it doesn't provide the right information

@ghostsquad
Author

You can kinda see some of the problems I'm trying to solve... it seems not all monospaced fonts are created equal. In the GitHub code view, you can see that the right padding misaligns the text. But in the screenshot (of my terminal, using Fira Mono for Powerline), the right padding is needed.

❯ ./test
     rune width: 2
     rune count: 1
            len: 4
    grapheme ct: 1
   req left pad: 3
  req right pad: 0
[  🔄 AAA]
     rune width: 2
     rune count: 2
            len: 8
    grapheme ct: 1
   req left pad: 4
  req right pad: 1
[  🇧🇾  BBB]
     rune width: 2
     rune count: 2
            len: 6
    grapheme ct: 1
   req left pad: 4
  req right pad: 1
[  ℹ️  CCC]
     rune width: 1
     rune count: 1
            len: 3
    grapheme ct: 1
   req left pad: 4
  req right pad: 0
[   • DDD]

[  🔄 AAA]
[  🇧🇾  BBB]
[  ℹ️  CCC]
[   • DDD]

image

package main

import (
	"fmt"
	"unicode/utf8"

	"github.com/mattn/go-runewidth"
	"github.com/rivo/uniseg"
)

func main() {
	fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("🔄"))
	fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("🔄"))
	fmt.Printf("%15s: %d\n", "len", len("🔄"))
	fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("🔄"))
	fmt.Printf("%15s: %d\n", "req left pad", 3)
	fmt.Printf("%15s: %d\n", "req right pad", 0)
	fmt.Printf("[%*s", 3, "🔄")
	fmt.Printf(" AAA]\n")

	fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("🇧🇾"))
	fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("🇧🇾"))
	fmt.Printf("%15s: %d\n", "len", len("🇧🇾"))
	fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("🇧🇾"))
	fmt.Printf("%15s: %d\n", "req left pad", 4)
	fmt.Printf("%15s: %d\n", "req right pad", 1)
	fmt.Printf("[%*s", 4, "🇧🇾")
	fmt.Printf("  BBB]\n")

	fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("ℹ️"))
	fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("ℹ️"))
	fmt.Printf("%15s: %d\n", "len", len("ℹ️"))
	fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("ℹ️"))
	fmt.Printf("%15s: %d\n", "req left pad", 4)
	fmt.Printf("%15s: %d\n", "req right pad", 1)
	fmt.Printf("[%*s", 4, "ℹ️")
	fmt.Printf("  CCC]\n")

	fmt.Printf("%15s: %d\n", "rune width", runewidth.StringWidth("•"))
	fmt.Printf("%15s: %d\n", "rune count", utf8.RuneCountInString("•"))
	fmt.Printf("%15s: %d\n", "len", len("•"))
	fmt.Printf("%15s: %d\n", "grapheme ct", uniseg.GraphemeClusterCount("•"))
	fmt.Printf("%15s: %d\n", "req left pad", 4)
	fmt.Printf("%15s: %d\n", "req right pad", 0)
	fmt.Printf("[%*s", 4, "•")
	fmt.Printf(" DDD]\n")

	fmt.Println()

	fmt.Printf("[%*s AAA]\n", 3, "🔄")
	fmt.Printf("[%*s  BBB]\n", 4, "🇧🇾")
	fmt.Printf("[%*s  CCC]\n", 4, "ℹ️")
	fmt.Printf("[%*s DDD]\n", 4, "•")
}
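One reason the "req left pad" values above differ per string is that fmt measures width in runes, not terminal cells, so `%*s` adds fewer spaces to a two-rune flag than to a one-rune bullet. The sketch below shows this with the standard library only. `spacesAdded` and `padLeftCells` are hypothetical helpers, and the display width passed to `padLeftCells` is an assumption supplied by the caller (for example from runewidth.StringWidth), not something this code computes.

```go
package main

import (
	"fmt"
	"strings"
)

// spacesAdded reports how many spaces fmt inserts when right-aligning s
// in a field of the given width. fmt counts runes, not terminal cells.
func spacesAdded(width int, s string) int {
	return len(fmt.Sprintf("%*s", width, s)) - len(s)
}

// padLeftCells right-aligns s in a field of `cells` terminal cells.
// displayWidth is the caller's estimate of how many cells s occupies;
// this function does not try to measure it.
func padLeftCells(s string, displayWidth, cells int) string {
	if displayWidth >= cells {
		return s
	}
	return strings.Repeat(" ", cells-displayWidth) + s
}

func main() {
	flag := "\U0001F1E7\U0001F1FE" // two regional indicator runes
	bullet := "\u2022"             // one rune

	fmt.Println(spacesAdded(4, flag))   // 2: the flag counts as 2 runes
	fmt.Println(spacesAdded(4, bullet)) // 3: the bullet counts as 1 rune

	// Padding by an assumed cell width instead of by rune count.
	fmt.Printf("[%s]\n", padLeftCells(flag, 2, 4))
	fmt.Printf("[%s]\n", padLeftCells(bullet, 1, 4))
}
```

Padding by cells only lines up if the assumed display width matches what the terminal and font actually render, which is the open question in this thread.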

@ghostsquad
Author

well, I might have landed on something interesting...

package main

import (
	"fmt"
	"strings"
	// "unicode/utf8"

	"github.com/mattn/go-runewidth"
)

// aligns to 5 characters
func valuePaddingPredictor(val string) string {
	runeWidth := runewidth.StringWidth(val)
	// runeCount := utf8.RuneCountInString(val)
	stringLen := len(val)

	leftPad := 3
	rightPad := 1
	if runeWidth == 1 {
		leftPad++
	}

	if stringLen > 4 {
		leftPad++
		rightPad++
	}

	return fmt.Sprintf("[%*s%sAAA]", leftPad, val, strings.Repeat(" ", rightPad))
}

func main() {
	characters := []string{
		"🔄",
		"🇧🇾",
		"ℹ️",
		"💩",
		"x",
		"😀",
		"💚",
		"☁️",
		"•",
		"⨯",
		"✔️",
		"✓",
		"؏",
		"├",
		"⻨",
	}

	for _, c := range characters {
		fmt.Println(valuePaddingPredictor(c))
	}
}

[  🔄 AAA]
[  🇧🇾  AAA]
[  ℹ️  AAA]
[  💩 AAA]
[   x AAA]
[  😀 AAA]
[  💚 AAA]
[  ☁️  AAA]
[   • AAA]
[   ⨯ AAA]
[  ✔️  AAA]
[   ✓ AAA]
[   ؏ AAA]
[   ├ AAA]
[  ⻨ AAA]

image

this is probably good enough for what I need.
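In the list above, the only strings longer than 4 bytes appear to be the flag (two regional indicator code points, 8 bytes) and the symbols followed by U+FE0F (variation selector 16, 6 bytes), which may be why the `stringLen > 4` check happens to work. A minimal sketch of testing for those two cases directly, with hypothetical helper names and no claim that it matches any particular terminal:

```go
package main

import "fmt"

// variationSelector16 requests emoji presentation for the preceding character.
const variationSelector16 = '\uFE0F'

// isRegionalIndicator reports whether r is one of the 26 regional
// indicator symbols (U+1F1E6..U+1F1FF) that pair up to form flags.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// isMultiRuneEmoji is a hypothetical stand-in for the byte-length check:
// it reports whether s contains VS-16 or a regional indicator.
func isMultiRuneEmoji(s string) bool {
	for _, r := range s {
		if r == variationSelector16 || isRegionalIndicator(r) {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		"\U0001F504",           // single-rune emoji, 4 bytes
		"\U0001F1E7\U0001F1FE", // flag, 8 bytes
		"\u2139\uFE0F",         // information source + VS-16, 6 bytes
		"\u2022",               // bullet, 3 bytes
	}
	for _, s := range samples {
		fmt.Printf("%s len=%d multi=%t\n", s, len(s), isMultiRuneEmoji(s))
	}
}
```

Unlike the byte-length check, this would not misfire on a longer plain string, but the extra padding it implies is still specific to the terminal and font in the screenshot.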

@jquast

jquast commented Dec 17, 2023

Hello,

I maintain the Python wcwidth library, and I recently wrote a specification that is of interest to this specific issue. I also wrote an automatic testing tool to assess any individual terminal emulator's compliance with the specification for Wide, Zero, ZWJ, and Emoji VS-16 character sequences.

I wrote an overview here https://www.jeffquast.com/post/ucs-detect-test-results/

I especially want to point out the automatic test results for 20+ terminals: you will indeed find varying levels of Unicode version and feature support across terminals, so it is important to keep that in mind when trying to validate.
