Why are control characters treated as zero-width for strings? #6

Open
typesanitizer opened this issue Oct 11, 2018 · 1 comment

typesanitizer commented Oct 11, 2018

From the docs:

fn width<'a>(&'a self) -> usize
Returns the string's displayed width in columns.
Control characters are treated as having zero width.

(Ignore '\0' for the points below as it has special treatment.)

This seems inconsistent with the behaviour for individual chars, where None is returned for a control character. For consistency, I would expect exactly one of the following: (A) for a string, if any character has a width of None, the whole string has width None; or (B) control characters always have width Some(0).

IIUC, option (B) wasn't chosen in order to stay consistent with wcwidth, which returns -1 for control characters. However, not choosing option (A) can lead to unintuitive behaviour that goes unnoticed.
E.g. if the input contains LF/TAB/DEL, then you can get an answer that doesn't make much sense.

Moreover, this violates an embedding law that one might expect to hold: width(format!("{}", c)) == width(c). As it stands, the law doesn't even type-check, because the string width is a usize while the char width is an Option<usize>.
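
To make the difference concrete, here is a minimal sketch of the documented string behaviour next to option (A), using only the standard library. The names `char_width`, `str_width_lenient`, `str_width_strict` and the two `law_holds_*` helpers are hypothetical and not part of this crate, and `char_width` assumes every non-control character is one column wide, which is not true for wide or combining characters.

```rust
/// Hypothetical stand-in for a per-character width lookup.
/// Assumption: every non-control character occupies one column.
fn char_width(c: char) -> Option<usize> {
    if c == '\0' {
        Some(0) // '\0' gets special treatment, as noted above
    } else if c.is_control() {
        None // C0, DEL and C1 control characters have no defined width
    } else {
        Some(1)
    }
}

/// Documented string behaviour: control characters count as zero columns.
fn str_width_lenient(s: &str) -> usize {
    s.chars().map(|c| char_width(c).unwrap_or(0)).sum()
}

/// Option (A): one character of unknown width makes the whole result None.
fn str_width_strict(s: &str) -> Option<usize> {
    // Summing an iterator of Option<usize> yields None as soon as any item is None.
    s.chars().map(char_width).sum()
}

/// Embedding law under the documented behaviour. The string side has to be
/// lifted into Option before the comparison even type-checks.
fn law_holds_lenient(c: char) -> bool {
    Some(str_width_lenient(&c.to_string())) == char_width(c)
}

/// Embedding law under option (A): both sides are already Option<usize>.
fn law_holds_strict(c: char) -> bool {
    str_width_strict(&c.to_string()) == char_width(c)
}
```

Under these assumptions, `str_width_lenient("a\tb\n")` is 2 while `str_width_strict("a\tb\n")` is None, and the law fails for '\t' in the lenient version but holds in the strict one.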

What is the reasoning behind the current behaviour?

P.S. I'm not asking for the library's behaviour to be changed. I'm writing a Haskell implementation and ran into this while looking at the test cases. My library follows (A) because it seemed like the right choice, so I wanted to know why you didn't pick (A).


jquast commented Dec 17, 2023

What is the reasoning behind the current behaviour?

I may be able to answer this for you. From jquast/wcwidth#54 (comment)

I just want to also add that this cannot be fixed in the wcwidth() and wcswidth() functions, as they are intended to match the function signatures and behavior of the POSIX functions exactly.

The reason that C0 and C1 control characters return -1 is that the intended application, a terminal emulator especially, should handle these characters in the stream and remove them from the string before passing it on to wcswidth. This applies especially to characters like \n, \b, and \t. Their effect is complicated: it depends on the current cursor position and on terminal settings. For example, \b can wrap to the previous row if the cursor is at column 0, and the number of columns advanced by '\t' depends on the tab stop setting and the current cursor position. Control characters like '\x1b' (ESC, which is in the C0 range) may begin a terminal escape sequence, and that too should be processed before sending the string to wcswidth.
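
As a rough illustration of why these characters have no fixed width, the sketch below tracks a cursor column while consuming a string. The function `advance_column` and its parameters are hypothetical, every printable character is assumed to be one column wide, backspace is clamped at column 0 instead of wrapping to the previous row, and escape sequences are not parsed at all; a real terminal emulator would have to handle all of those.

```rust
/// Hypothetical helper: returns the cursor column after writing `s`,
/// starting at `start_col`, with tab stops every `tab_stop` columns.
/// Assumptions: printable characters are one column wide, '\x08' never
/// wraps to the previous row, and escape sequences are not interpreted.
fn advance_column(start_col: usize, s: &str, tab_stop: usize) -> usize {
    assert!(tab_stop > 0, "tab_stop must be positive");
    let mut col = start_col;
    for c in s.chars() {
        match c {
            // A tab jumps to the next tab stop, so its effect depends on `col`.
            '\t' => col = (col / tab_stop + 1) * tab_stop,
            // Backspace moves left by one column, clamped at column 0.
            '\x08' => col = col.saturating_sub(1),
            // Line feed and carriage return are both treated as returning
            // to column 0 (a simplification).
            '\n' | '\r' => col = 0,
            // Any other control character is ignored in this sketch.
            c if c.is_control() => {}
            // Everything else is assumed to be one column wide.
            _ => col += 1,
        }
    }
    col
}
```

With a tab stop of 8, a tab written at column 0 advances eight columns but a tab written at column 3 advances only five, which is why a single context-free width for '\t' cannot be correct.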
