Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded rivo/uniseg to latest version, switched StringWidth/Truncate to speedier version #63

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ jobs:
strategy:
matrix:
os: [windows-latest, macos-latest, ubuntu-latest]
go: ["1.15", "1.16"]
go: ["1.18.x", "1.19.x"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
Expand Down
6 changes: 2 additions & 4 deletions benchmark_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -113,21 +113,19 @@ func benchString1Width(b *testing.B, eastAsianWidth bool, start, stop rune, want
return n
}
func BenchmarkString1WidthAll(b *testing.B) {
benchSink = benchString1Width(b, false, 0, utf8.MaxRune+1, 1295990)
benchSink = benchString1Width(b, false, 0, utf8.MaxRune+1, 1298422)
}
func BenchmarkString1Width768(b *testing.B) {
benchSink = benchString1Width(b, false, 0, 0x300, 702)
}
func BenchmarkString1WidthAllEastAsian(b *testing.B) {
benchSink = benchString1Width(b, true, 0, utf8.MaxRune+1, 1436664)
benchSink = benchString1Width(b, true, 0, utf8.MaxRune+1, 1439014)
}
func BenchmarkString1Width768EastAsian(b *testing.B) {
benchSink = benchString1Width(b, true, 0, 0x300, 794)
}

//
// tables
//
func benchTable(b *testing.B, tbl table) int {
n := 0
for i := 0; i < b.N; i++ {
Expand Down
4 changes: 2 additions & 2 deletions go.mod
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
module github.com/mattn/go-runewidth

go 1.9
go 1.18
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to keep go1.16 but do you have something problem?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, rivo/uniseg uses generics and the new build tag syntax, both of which were introduced with Go 1.18.

I could probably downgrade it in go-runewidth to a previous version but that old version was much slower than the new version. If I do, Go 1.16 will work.

Let me know how you'd like to proceed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of uniseg is it possible to build with go-runewidth with 1.16?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v0.3.4

But if you use v0.3.4, you also need to make adjustments to your code.

Here's my suggestion: I prepare a second PR with the same output as this one, but based on uniseg v.0.3.4 and the older Go version. We can leave this PR (#63) open and you can merge it once you're ready to switch to Go 1.18.

What do you think of this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to separate files for runewidth_go118.go and runewidth_go117.go ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just submitted a commit that does this but it gives me merge conflicts. It looks like you made/accepted other changes in the meantime. I'm not able to resolve these conflicts, only you can.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was able to resolve the merge conflict. Please have a look.


require github.com/rivo/uniseg v0.2.0
require github.com/rivo/uniseg v0.4.2
6 changes: 4 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
github.com/rivo/uniseg v0.2.0 h1:S1pD9weZBuJdFmowNwbpi7BJ8TNftyUImj/0WQi72jY=
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
github.com/rivo/uniseg v0.3.1 h1:SDPP7SHNl1L7KrEFCSJslJ/DM9DT02Nq2C61XrfHMmk=
github.com/rivo/uniseg v0.3.1/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/rivo/uniseg v0.4.2 h1:YwD0ulJSJytLpiaWua0sBDusfsCZohxjxzVTYjwxfV8=
github.com/rivo/uniseg v0.4.2/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
39 changes: 23 additions & 16 deletions runewidth.go
Original file line number Diff line number Diff line change
Expand Up @@ -174,14 +174,17 @@ func (c *Condition) CreateLUT() {

// StringWidth return width as you can see
func (c *Condition) StringWidth(s string) (width int) {
g := uniseg.NewGraphemes(s)
for g.Next() {
state := -1
var cl string
for len(s) > 0 {
var chWidth int
for _, r := range g.Runes() {
chWidth = c.RuneWidth(r)
if chWidth > 0 {
break // Our best guess at this point is to use the width of the first non-zero-width rune.
cl, s, _, state = uniseg.FirstGraphemeClusterInString(s, state)
for index, r := range cl {
if index == 0 && inTable(r, emoji) {
chWidth = 2 // Not the optimal solution but it will work in most cases.
break
}
chWidth += c.RuneWidth(r)
}
width += chWidth
}
Expand All @@ -194,22 +197,26 @@ func (c *Condition) Truncate(s string, w int, tail string) string {
return s
}
w -= c.StringWidth(tail)
var width int
pos := len(s)
g := uniseg.NewGraphemes(s)
for g.Next() {
var chWidth int
for _, r := range g.Runes() {
chWidth = c.RuneWidth(r)
if chWidth > 0 {
break // See StringWidth() for details.
var width, pos int
substr, state := s, -1
for len(substr) > 0 {
var (
ch string
chWidth int
)
ch, substr, _, state = uniseg.FirstGraphemeClusterInString(substr, state)
for index, r := range ch {
if index == 0 && inTable(r, emoji) {
chWidth = 2 // Not the optimal solution but it will work in most cases.
break
}
chWidth += c.RuneWidth(r)
}
if width+chWidth > w {
pos, _ = g.Positions()
break
}
width += chWidth
pos += len(ch)
}
return s[:pos] + tail
}
Expand Down
2 changes: 1 addition & 1 deletion runewidth_table.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 11 additions & 2 deletions runewidth_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ var tables = []tableInfo{
{combining, "combining", 465, "3cce13deb5e23f9f7327f2b1ef162328285a7dcf277a98302a8f7cdd43971268"},
{doublewidth, "doublewidth", 182440, "3d16eda8650dc2c92d6318d32f0b4a74fda5a278db2d4544b1dd65863394823c"},
{ambiguous, "ambiguous", 138739, "d05e339a10f296de6547ff3d6c5aee32f627f6555477afebd4a3b7e3cf74c9e3"},
{emoji, "emoji", 3535, "9ec17351601d49c535658de8d129c1d0ccda2e620669fc39a2faaee7dedcef6d"},
{emoji, "emoji", 3561, "3f1ae3b21ead544a59ab68fe541e8b098adc4fe1ffe3389a523a01dbfd8c1d75"},
{narrow, "narrow", 111, "fa897699c5e3cd9141c638d539331b0bdd508b874e22996c5e929767d455fc5a"},
{neutral, "neutral", 27333, "5455f5e75c307f70b4e9b2384dc5a8bcd91a4c5e2b24b2b185dfad4d860ee5c2"},
}
Expand Down Expand Up @@ -380,6 +380,15 @@ func TestTruncateNoNeeded(t *testing.T) {
}
}

func TestTruncateEmoji(t *testing.T) {
s := "😂😡"
expected := "😂"

if out := Truncate(s, 2, ""); out != expected {
t.Errorf("Truncate(%q) = %q, want %q", s, out, expected)
}
}

var isneutralwidthtests = []struct {
in rune
out bool
Expand Down Expand Up @@ -462,7 +471,7 @@ func TestZeroWidthJoiner(t *testing.T) {
{"\u200d🍳", 2},
{"👨\u200d👨", 2},
{"👨\u200d👨\u200d👧", 2},
{"🏳️\u200d🌈", 1},
{"🏳️\u200d🌈", 2},
{"あ👩\u200d🍳い", 6},
{"あ\u200d🍳い", 6},
{"あ\u200dい", 4},
Expand Down
11 changes: 11 additions & 0 deletions script/generate.go
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
//go:build ignore
// +build ignore

// Generate runewidth_table.go from data at https://unicode.org/
Expand Down Expand Up @@ -166,6 +167,16 @@ func emoji(out io.Writer, in io.Reader) error {
})
}

// We also want regional indicator symbols (flags) to be part of the Emoji
// table. They are U+1F1E6..U+1F1FF.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add link to documentation of the specification?

Copy link
Author

@rivo rivo Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://www.unicode.org/reports/tr51/#def_emoji_flag_sequence documents how flags are constructed using Regional Indicator code points.

Regional Indicator code points are not classified as Extended_Pictographic so they don't show up in your emoji table. But for the sake of calculating the width, they behave the same as other emojis. So the simplest solution is to add them to your emoji table. Alternatively, you could add the detection of Regional Indicators to all other parts of your code. That would be overkill, in my opinion, but it's up to you.

In any case, you'll want StringLength("🇯🇵") == 2. That's what this is for.

You'll find them in the same file (look for "Regional Indicator"):

https://unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

for index, r := range arr {
if r.lo >= 0x1f1ff {
arr = append(arr[:index+1], arr[index:]...)
arr[index] = rrange{lo: 0x1f1e6, hi: 0x1f1ff}
break
}
}

shapeup(&arr)
generate(out, "emoji", arr)

Expand Down