Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code duplication and maintainability ambiguities in font and metric generation #3702

Open
2 tasks done
IvoWingelaar opened this issue Sep 2, 2022 · 3 comments · May be fixed by #3715
Open
2 tasks done

Code duplication and maintainability ambiguities in font and metric generation #3702

IvoWingelaar opened this issue Sep 2, 2022 · 3 comments · May be fixed by #3715
Labels

Comments

@IvoWingelaar
Copy link

IvoWingelaar commented Sep 2, 2022

Before reporting a bug

I've been rewriting the code responsible for font, and the associated metrics, generation from the Perl scripts to a (IMHO) more readable (and therefore maintainable) collection of Python scripts. The font generation scripts are finished, but I've ran into some situations that point to possible issues when rewriting the metric generation.

  • The mapping from TeX font character codes to Unicode scalar values is defined twice, once in the src/fonts/makeFF Perl script, and a second time in the src/metrics/mapping.pl Perl script. These two mappings are very slightly different. Is this an intentional design decision? The overwhelming majority of mapping pairs are the same. It seems a chore and risky with regards to possible bugs to maintain two almost equal datasets of >1000 lines.
  • Some of the mappings overlap symbols, as for example in makeFF:
    cmr10 0x2C -> Main-Regular 0x2C, while the following mapping also exists: cmmi10 0x3B -> Main-Regular 0x2C.
    The makeFF script sorts certain keys of the mappings, so I believe the produced fonts stays deterministic in which glyphs survive these competing mappings, but this seems confusing.
    Other duplicate mappings are:
    cmr10 0x2E -> Main-Regular 0x2E and cmmi10 0x3A -> Main-Regular 0x2E,
    cmbx10 0x2E -> Main-Bold 0x2E and cmmib10 0x3A -> Main-Bold 0x2E,
    cmbx10 0x2C -> Main-Bold 0x2C and cmmib10 0x3B -> Main-Bold 0x2C.
  • A pseudo-duplicate mapping also exists in the Typewriter mapping for 0x7E, but there the intent is clear as the second mapping is between the same glyphs as the first one, but the second one provides an additional y shift.

I've tried so search the archived katex-fonts repository for more information about these design choices, but I couldn't find it, so are these decisions intentional?

Environment (please complete the following information):

  • KaTeX Version: latest commit (ef49e2be91628e96e4fcb962b28b443930627d84)
@edemaine edemaine added fonts and removed bug labels Sep 2, 2022
@edemaine
Copy link
Member

edemaine commented Sep 2, 2022

I believe the messyness in the code (and those duplicate tables) are inherited from MathJax code that this was originally based on. But it predates my time so I'm not sure.

I agree that it would be best to remove this risky duplicated code (in particular resolving the discrepancies), and to avoid duplicates (and maybe detect duplicates to avoid this in the future?). Or to otherwise rewrite this code in a better way.

@IvoWingelaar
Copy link
Author

The issue is not as bad as I thought. I re-used the font generation mapping to also generate the metrics, and get the following diff on my newly generated fontMetricsData.js:

  • Adds all Fraktur-Bold and Caligraphic-Bold metrics: they are absent in the current version apparently.
  • Adds 2 missing metrics to Main-Italic and 5 to Typewriter-Regular.
  • Removes 3 metrics (corresponding to glyphs that are currently unused) from Typewriter-Regular.
  • And only changes the metrics on 1 glyph in Typewriter-Regular: (0u7E - Combining Tilde) has changed height/depth.

I'm confident my work's in a state that can be reviewed, so how granular do you want my PR to be? A single PR for both the font-generation and the metrics generation changes? Or two separate ones?

Afterwards, it's not that difficult to add a sanity check to detect duplicate mappings. My next goal though is to add the pair-kerning tables to the fonts.

@IvoWingelaar
Copy link
Author

I've taken the initiative to prepare to split it in multiple PR's, as the changes can be semantically grouped, with the goal to make it easier to review. A huge delta of multiple KLOC's added/deleted seems like bad etiquette, hopefully a chain of PR's is less bad. This chain is #3713, #3714, #3715. When this chain is approved and merged I can start working on implementing pair-kerning, but this is feature where a separate issue to track progress seems more appropriate.

@IvoWingelaar IvoWingelaar linked a pull request Sep 17, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants