Process character-width specifiers in text #512

Open
samliddicott opened this issue Jun 13, 2023 · 1 comment

Comments

@samliddicott

Control of character width may be specifiable with CSI escape sequences, and I note that these sequences are terminated by NL among others. See #511

This is mentioned in https://www.cl.cam.ac.uk/~mgk25/ucs/scw-proposal.html

Set Character Width proposal (version 3)
by Markus Kuhn

This proposal adds a new control sequence to those defined in the ISO 6429 (= ECMA-48) standard, to allow applications to specify exactly, which ISO 10646 character sequences shall be displayed as non-spacing, single-width or double-width characters or ligatures (as needed for ideographic languages).

@mgeisler
Owner

Thanks for linking to this proposal, which I had not heard about.

Right now, Textwrap simply ignores CSI sequences, such as the ones used for colored text. More precisely, if it finds the two characters defined by

/// The CSI or “Control Sequence Introducer” introduces an ANSI escape
/// sequence. This is typically used for colored text and will be
/// ignored when computing the text width.
const CSI: (char, char) = ('\x1b', '[');

then it ignores all following characters up to and including the first character in this range:

/// The final bytes of an ANSI escape sequence must be in this range.
const ANSI_FINAL_BYTE: std::ops::RangeInclusive<char> = '\x40'..='\x7e';

I might have gotten those ranges from Wikipedia, but I'm no longer sure.
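To make the skipping rule concrete, here is a minimal sketch of it. It reuses the two constants quoted above; the function name `visible_width` is hypothetical, and counting one column per remaining character is a simplifying assumption (it is not a claim about how Textwrap itself measures Unicode text).

```rust
/// The two characters that introduce a CSI sequence (as quoted above).
const CSI: (char, char) = ('\x1b', '[');
/// A CSI sequence ends at the first character in this range (as quoted above).
const ANSI_FINAL_BYTE: std::ops::RangeInclusive<char> = '\x40'..='\x7e';

/// Hypothetical helper: count the characters of `text` that are not part
/// of a CSI sequence. Assumes every visible character is one column wide.
fn visible_width(text: &str) -> usize {
    let mut chars = text.chars().peekable();
    let mut width = 0;
    while let Some(ch) = chars.next() {
        if ch == CSI.0 && chars.peek() == Some(&CSI.1) {
            // Consume '[' first: it lies inside the final-byte range itself.
            chars.next();
            // Skip everything up to and including the final character.
            for c in chars.by_ref() {
                if ANSI_FINAL_BYTE.contains(&c) {
                    break;
                }
            }
            continue;
        }
        // A lone ESC that is not followed by '[' is counted like any other character.
        width += 1;
    }
    width
}
```

With this rule, a string such as "\x1b[31mred\x1b[0m" is measured as the three characters of "red". A width-setting sequence that ends in a final byte from the same range would be skipped in the same way, so it would not change the computed width.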

Handling the proposed control sequences is possible today by providing your own implementation of the Fragment trait. This trait is used for all wrapping computations: it tells the library the size of a single unbreakable block of text. In particular, you can use the wrap_optimal_fit algorithm with your own custom fragment and get beautifully wrapped lines of text.
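A custom fragment could look roughly like the sketch below. To keep it self-contained, it declares a local stand-in trait with the three methods that `textwrap::core::Fragment` has in recent releases as far as I recall (`width`, `whitespace_width`, `penalty_width`, all returning `f64`); please check the signatures against the version you use. The type `WidthTaggedWord`, its fields, and the helper `line_width` are hypothetical. The sketch also assumes the application has already decoded the width-setting sequence into a per-character column count; it does not parse the proposal's syntax.

```rust
use std::fmt::Debug;

/// Local stand-in with the assumed shape of `textwrap::core::Fragment`.
/// With the real crate you would import that trait instead.
trait Fragment: Debug {
    fn width(&self) -> f64;
    fn whitespace_width(&self) -> f64;
    fn penalty_width(&self) -> f64;
}

/// Hypothetical fragment: a word whose display width was specified
/// by the application rather than derived from the characters.
#[derive(Debug)]
struct WidthTaggedWord<'a> {
    text: &'a str,
    /// Columns per character, as decoded by the caller (assumption).
    columns_per_char: f64,
    /// Whether a single space follows this word.
    trailing_space: bool,
}

impl Fragment for WidthTaggedWord<'_> {
    fn width(&self) -> f64 {
        self.text.chars().count() as f64 * self.columns_per_char
    }
    fn whitespace_width(&self) -> f64 {
        if self.trailing_space { 1.0 } else { 0.0 }
    }
    fn penalty_width(&self) -> f64 {
        0.0 // no hyphen or other penalty text is inserted at a break
    }
}

/// Hypothetical helper: width of one line holding all `fragments`.
/// Whitespace after the last fragment is not counted.
fn line_width<T: Fragment>(fragments: &[T]) -> f64 {
    let mut total = 0.0;
    for (i, fragment) in fragments.iter().enumerate() {
        total += fragment.width();
        if i + 1 < fragments.len() {
            total += fragment.whitespace_width();
        }
    }
    total
}
```

A slice of such fragments is what you would hand to wrap_optimal_fit after swapping the stand-in trait for the real one. The wrapping algorithm only sees the numbers returned by the three methods, which is why the width-decoding logic can live entirely in your own type.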

Does that help with your use-case?
