Matrix Display output alignment is not Unicode-aware #1354

nicholas-miklaucic · 2024-01-18T04:49:05Z

Hi,

Thanks for all the wonderful work on this library. I'm coming back after a Rust hiatus and I can't believe the const generics work as well as they do.

I'm using a custom scalar type that uses a combining overline character instead of a minus sign, to match the style of this reference work:

Somewhat ironically, a typesetting convention designed to make alignment easier in print breaks alignment for Matrix output using this custom type:

  ┌             ┐
  │  0 ̅1  0  0 │
  │  0 ̅1  0  0 │
  │  ⅔  0  1  0 │
  │  0  0  0  1 │
  └             ┘

The Display implementation here uses .char().count() instead of grapheme counting, which is what causes the misalignment seen here. I imagine this would also affect less frivolous custom Display impls, like Chinese characters.

unicode-segmentation is a large dependency to make a tiny edge case look nicer, so I'm not exactly saying this should be implemented differently, but I'd be interested if anyone else has had this issue or if there's a solution that doesn't require increasing the code size too massively.

The text was updated successfully, but these errors were encountered:

Ralith · 2024-01-18T06:54:53Z

I imagine this would also affect less frivolous custom Display impls, like Chinese characters

Is there a reason to put Chinese characters in an nalgebra::Matrix?

nicholas-miklaucic · 2024-01-18T15:20:59Z

Some people may prefer to output numbers in Chinese numerals instead of the Arabic numeral system. A cursory glance at crates.io doesn't reveal any crates that have nalgebra and a crate for that formatting as a dependency, so I think that's not common enough to have made it on crates.io. I'd be interested to hear if anyone else has tried to use Unicode characters that break character-level counting before besides myself!

tpdickso · 2024-02-10T20:00:54Z

A potential solution would be to allow trait overloading for computing the length of a scalar in characters, at the level of the scalar. This would not help you if you were printing a f32 using fullwidth numerals to correctly align arabic numerals with hanzi, but would permit a custom scalar type to align itself using unicode-segmentation.

@Ralith I don't believe that Hanzi numerals are popular in mathematical work, but fullwidth numerals can be useful for alignment purposes; e.g. in terminals well-formatted CJK characters tend to take up two horizontal cells rather than one, so using fullwidth numerals can be useful.

However I think those will actually work fine, as it's a char-count and not a byte-count, so the fullwidth forms should be aligned correctly relative to one another, as long as they aren't being constructed using CJK combining characters (which are sadly not that well-supported anywhere, in my experience.) unicode-segmentation wouldn't actually help with aligning CJK with non-CJK characters either since CJK taking up 2 cells is a property of terminals, not unicode...

nicholas-miklaucic · 2024-02-15T16:35:45Z

Another use case, beyond mine, is computer algebra systems or other libraries that want to implement symbolic expressions. Perhaps that's more common than different languages.

You make an excellent point—allowing the custom types to control their display width eliminates my main concern with a fix for this, the large size of unicode-segmentation. It also allows correct behavior even when the alignment doesn't follow graphemes. I would be happy to implement this solution if people are interested.

The val_width function could be moved into a trait that inherits from Display with a default implementation as it's written. The API would have an additional trait, which is unfortunate if it's not widely used, but the source code would change very little. (I don't think it makes sense to have this apply to anything besides Display, and having a ton of separate traits for the other formatting traits makes little sense, so it would require some refactoring of the macro to cover the extra case.)

An important note is that fixing the issue on my end is more challenging than I originally thought. One solution I had considered was to add zero-width characters to the Display impl, so both positive and negative numbers had the additional spaces, but that only lets you pad more, not less. The result is pretty bad alignment, and it's quite brittle when you consider extensions like fractions or colored output.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matrix Display output alignment is not Unicode-aware #1354

Matrix Display output alignment is not Unicode-aware #1354

nicholas-miklaucic commented Jan 18, 2024 •

edited

Ralith commented Jan 18, 2024

nicholas-miklaucic commented Jan 18, 2024

tpdickso commented Feb 10, 2024

nicholas-miklaucic commented Feb 15, 2024

Matrix Display output alignment is not Unicode-aware #1354

Matrix Display output alignment is not Unicode-aware #1354

Comments

nicholas-miklaucic commented Jan 18, 2024 • edited

Ralith commented Jan 18, 2024

nicholas-miklaucic commented Jan 18, 2024

tpdickso commented Feb 10, 2024

nicholas-miklaucic commented Feb 15, 2024

nicholas-miklaucic commented Jan 18, 2024 •

edited