Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getopts should use grapheme clusters for text alignment #7

Closed
steveklabnik opened this issue Jan 21, 2015 · 9 comments
Closed

getopts should use grapheme clusters for text alignment #7

steveklabnik opened this issue Jan 21, 2015 · 9 comments

Comments

@steveklabnik
Copy link
Member

Issue by Kimundi
Saturday Mar 23, 2013 at 13:52 GMT

For earlier discussion, see rust-lang/rust#5516

This issue was labelled with: A-libs, E-easy in the Rust repository


It should at minimum align to number of codepoints (characters), (and now that PR #8710 has landed, it should be doing that minimum), but the only correct thing would be to align to grapheme clusters.

test case

extern mod extra;
use extra::getopts::groups;

fn main() {
    let optgroups = ~[
        groups::optflag("a", "apple",        "apple description"),
        groups::optflag("b", "banana\u00AB", "banana description"),
        groups::optflag("c", "br\xfbl\xe9e", "br\xfbl\xe9e quite long description"),
        groups::optflag("k", "kiwi\u20AC",   "kiwi description"),
        groups::optflag("o", "orange\u2039", "orange description"),
        groups::optflag("r",
                        "raspberry-but-making-this-option-way-too-long",
                        "raspberry description is also quite long indeed longer than every other piece of text we might encounter here and thus will be automatically broken up"),
    ];
    println(usage("Usage: fruits", optgroups));
}

current output:

% rustc /tmp/issue5516.rs && /tmp/issue5516 --help
warning: no debug symbols in executable (-arch x86_64)
Usage: fruits

Options:
    -a --apple          apple description
    -b --banana«        banana description
    -c --brûlée         brûlée quite long descripti
    -k --kiwi€          kiwi description
    -o --orange‹        orange description
    -r --raspberry-but-making-this-option-way-too-long 
                        raspberry description is also quite long indeed longer
                        than every other piece of text we might encounter here
                        and thus will be automatically broken up


Note that in addition to things failing to line up, it is also cutting off the end of the text for the brûlée case. That issue should also be fixed.

@SimonSapin
Copy link
Contributor

Perhaps StrExt::width is more relevant here than grapheme clusters.

@tromey
Copy link
Contributor

tromey commented Jun 18, 2015

The original example here looks correct. However I was able to see the bug by using a different character in one of the options, in my case \u{3042}.

Also:

Note that in addition to things failing to line up, it is also cutting off the end of the text for the brûlée case. That issue should also be fixed.

This doesn't seem to happen any more.

@prataprc
Copy link
Contributor

This is interesting issue. Never thought about grapheme length in rendering context. Is it possible to solve this issue in a clean way for terminal emulators ?

@SimonSapin
Copy link
Contributor

Perhaps StrExt::width is more relevant here than grapheme clusters.

This is now https://crates.io/crates/unicode-width

@prataprc
Copy link
Contributor

@SimonSapin Looks like there are two versions to compute the display width. width() and width_cjk() how to know which context to use ? Is this related to localisation ?

@SimonSapin
Copy link
Contributor

Many CJK characters are typically two columns wide in monospace fonts. The crate above will tell you, based on the Unicode character database. But some characters have an "ambiguous width": they make take 2 or 1 column depending on whether you’re in a "CJK context", which is not all that well defined. In the Windows console, it appears to depend on the Windows locale. The difference between these two methods is how ambiguous characters are counted. https://www.unicode.org/reports/tr11/ has all of the background.

But the actual rendering will depend on a bunch of stuff including which fonts are installed. Still, just using width() and ignoring the locale is already a better approximation than .chars().count().

@prataprc
Copy link
Contributor

One additional note, In the PR #71 I am using crate attribute: #![feature(rustc_private)] Got some strange error

use of unstable library feature 'rustc_private': this crate is being loaded from the sysroot, an unstable location; did you mean to load this crate from crates.io via `Cargo.toml` instead? (see issue #27812)

What does this error message mean ? Should we use the rustc_private feature ?

Thanks,

@SimonSapin
Copy link
Contributor

No, you should probably not use unstable features in this crate since that would prevent using it with Rust Stable, whereas previous versions can.

KodrAus added a commit that referenced this issue Jun 28, 2018
use unicode-width instead of len() or grapheme cluster #7.
@KodrAus
Copy link
Contributor

KodrAus commented Oct 15, 2018

This was implemented in #71 and released as 0.2.18.

@KodrAus KodrAus closed this as completed Oct 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants