Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded rivo/uniseg to latest version, switched StringWidth/Truncate to speedier version #63

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

rivo
Copy link

@rivo rivo commented Jul 27, 2022

The rivo/uniseg package has received a major update which also includes methods for grapheme cluster parsing that are much faster than the previously used Graphemes class.

I've upgraded your package accordingly and updated the relevant code to use these faster methods. It would be great if you could merge these changes.

Thank you!

ps. I noticed that some automatic checks did not complete successfully because they are still running on Go 1.15. Would you like me to look into upgrading them to the current version (1.18)?

@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2022

Codecov Report

Merging #63 (0a21090) into master (f9d5553) will increase coverage by 0.25%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   93.56%   93.82%   +0.25%     
==========================================
  Files           3        3              
  Lines         171      178       +7     
==========================================
+ Hits          160      167       +7     
  Misses          6        6              
  Partials        5        5              
Impacted Files Coverage Δ
runewidth.go 95.10% <100.00%> (+0.25%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@rivo
Copy link
Author

rivo commented Sep 11, 2022

I upgraded again to the latest uniseg version. Also implemented the "Quick Fix" solution outlined in #59 (comment).

Also updated CI to use current Go versions. Everything passes now.

@@ -166,6 +167,16 @@ func emoji(out io.Writer, in io.Reader) error {
})
}

// We also want regional indicator symbols (flags) to be part of the Emoji
// table. They are U+1F1E6..U+1F1FF.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add link to documentation of the specification?

Copy link
Author

@rivo rivo Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://www.unicode.org/reports/tr51/#def_emoji_flag_sequence documents how flags are constructed using Regional Indicator code points.

Regional Indicator code points are not classified as Extended_Pictographic so they don't show up in your emoji table. But for the sake of calculating the width, they behave the same as other emojis. So the simplest solution is to add them to your emoji table. Alternatively, you could add the detection of Regional Indicators to all other parts of your code. That would be overkill, in my opinion, but it's up to you.

In any case, you'll want StringLength("🇯🇵") == 2. That's what this is for.

You'll find them in the same file (look for "Regional Indicator"):

https://unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

go.mod Outdated
@@ -1,5 +1,5 @@
module github.com/mattn/go-runewidth

go 1.9
go 1.18
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to keep go1.16 but do you have something problem?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, rivo/uniseg uses generics and the new build tag syntax, both of which were introduced with Go 1.18.

I could probably downgrade it in go-runewidth to a previous version but that old version was much slower than the new version. If I do, Go 1.16 will work.

Let me know how you'd like to proceed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of uniseg is it possible to build with go-runewidth with 1.16?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v0.3.4

But if you use v0.3.4, you also need to make adjustments to your code.

Here's my suggestion: I prepare a second PR with the same output as this one, but based on uniseg v.0.3.4 and the older Go version. We can leave this PR (#63) open and you can merge it once you're ready to switch to Go 1.18.

What do you think of this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to separate files for runewidth_go118.go and runewidth_go117.go ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just submitted a commit that does this but it gives me merge conflicts. It looks like you made/accepted other changes in the meantime. I'm not able to resolve these conflicts, only you can.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was able to resolve the merge conflict. Please have a look.

@rivo
Copy link
Author

rivo commented Sep 18, 2022

Do you still need information or any action from me here?

@derekperkins
Copy link

👍 to getting this merged

@derekperkins
Copy link

@mattn is there anything blocking this from merging and cutting a new release?

@junegunn
Copy link

junegunn commented Jan 4, 2024

@rivo Thanks for the patch. I can confirm that this fixes a display issue of my program.

cl, s, _, state = uniseg.FirstGraphemeClusterInString(s, state)
for index, r := range cl {
if index == 0 && inTable(r, emoji) {
chWidth = 2 // Not the optimal solution but it will work in most cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this incorrectly reports the width of some characters, such as (0x25b6), which is in both the emoji and ambiguous tables.

Related: junegunn/fzf#3588

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it turns out that emojis can have multiple representations. There is a default for each emoji and your example or, say, ☺, defaults to "text presentation". And there are code points that can be added to force a specific presentation. To make things worse, many systems don't respect the Unicode specification and then there is the question of whether fonts will support the different representations.

go-runewidth does not go into this detail. This is also why the comment here says "in most cases". It's a simple approximation. If you want this emoji presentation flag to be considered, you can also use uniseg directly which has had string width calculation for about two years now. uniseg.StringWidth("\u25b6") will report as "1".

From what I can see, uniseg will be much more accurate than go-runewidth. However, go-runewidth is mostly in line with most terminals which also tend to use the simple wcwidth functionality. Depending on your application, there may be some benefit to "making the same mistakes" as the environment it is run in. For example, iTerm2 on macOS which is a very popular terminal application, does not render the rainbow flag correctly:

image

This flag obviously has a width of 2 but iTerm2 assigns a width of 1. If you need your application to be in line with iTerm2, it may be better to make that same mistake. go-runewidth is probably your best bet in that case. (Although I don't know what it reports for this specific example, it might not have the same issue as iTerm2.) But if you need something that is accurate, no matter what, I would suggest using uniseg to calculate rune/string widths.

As a final note, VS Code renders the flag correctly:

image

There is no official spec for these widths. Therefore, everyone rolls their own implementation. It's a mess.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comment. Really appreciated. I'll see if I can use uniseg.StringWidth instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants