LaTeX: support for Greek and Cyrillic #5645

Merged
jfbu merged 5 commits into sphinx-doc:master from latex_greek_cyrillic_letters on Nov 21, 2018

@jfbu jfbu commented Nov 16, 2018

  1. do not escape Unicode Greek letters via LaTeX math mark-up: pass them
    through un-modified to LaTeX document,

  2. if "fontenc" receives extra option LGR, then pdflatex will support
    Unicode Greek letters (not in math), and with extra option T2A it
    will support (most) Unicode Cyrillic letters.

  3. for pdflatex with LGR, this will use "textalpha" LaTeX package and
    "substitutefont" package to set up some automatic font substitution
    to work around the unavailability of Greek with "times"
    package (which is default font package chosen by Sphinx for
    pdflatex), same with T2A and "substitutefont" for Cyrillic.

  4. for xelatex/lualatex, set up Computer Modern Unicode as default font,
    as it supports Cyrillic and Greek scripts,

  5. for platex, don't do anything special as the engine already has
    its default font supporting Cyrillic and Greek (even in math mode!)

edit

The final version differs from the description above:

  • for xelatex/lualatex use FreeFont (because on Ubuntu xenial, CMU requires the big package texlive-fonts-extra, and FreeFont has its own separate fonts-freefont-otf)

  • X2 encoding is also handled for pdflatex, and the fonts are the base ones in LGR, X2 or T2A encoding, provided on Ubuntu xenial by the cm-super package (together with texlive-lang-greek and texlive-lang-cyrillic). These extra dependencies are optional: they are needed only for the new support of Greek and Cyrillic in European-language documents.

TODO: indexing with xindy has merge-rules for Greek but not for Cyrillic, because the Cyrillic rules are loaded only for Cyrillic documents. Indexing Cyrillic words in a non-Cyrillic document will therefore work, but all such words will end up in the "non-alphabetical" group.

Closes: #5251
Fixes: #5248
Fixes: #5247
Closes: #1682

@jfbu jfbu added this to the 2.0.0 milestone Nov 16, 2018
CHANGES Outdated
the use of the ``LGR`` (Greek) and/or ``T2A`` (Cyrillic) font encoding. Even
then, the last four are font packages arising in the default value for
:confval:`latex_elements`.\ ``'fontpkg'``, and may be replaced by other font
packages providing ``LGR`` and/or ``T2A`` support.

These explanations may move later on to builders/index.rst from CHANGES, after 2.0 release

CHANGES Outdated
will use the text font not the math font. If (and only if) the document
contains such Greek Unicode letters *and* the :confval:`latex_engine` is
``'pdflatex'`` then the :confval:`latex_elements`.\ ``'fontenc'`` key
**must** be used to declare usage of the ``LGR`` font encoding.

Initially I added LGR and the textalpha and substitutefont packages to the default pdflatex setting, so that no document would need any change for the transition to Sphinx 2.0. Unfortunately, on Ubuntu xenial textalpha is available only via texlive-lang-greek, which was not previously a dependency.

So after some hesitation I decided instead not to change the default 'fontenc' at all. This means that if a document has a Greek Unicode letter in text, which previously was escaped to math mark-up, the project author must now explicitly use the latex_elements 'fontenc' key to add the LGR encoding. The TeX installation might also need additional LaTeX packages (listed above).
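
For illustration, the opt-in described above could look like the following conf.py fragment. It is only a sketch: the 'fontenc' value is the one this PR sets in doc/conf.py, and whether LGR, T2A or both are needed depends on the project.

```python
# conf.py (sketch): opt in to Unicode Greek and Cyrillic in text with pdflatex.
latex_engine = 'pdflatex'

latex_elements = {
    # LGR enables Greek letters in text, T2A enables Cyrillic letters.
    # T1 is listed last so that it remains the default font encoding.
    'fontenc': r'\usepackage[LGR,T2A,T1]{fontenc}',
}
```

A project that only needs Greek would drop T2A from the option list.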

@@ -58,10 +97,18 @@ __ https://github.com/sphinx-contrib/sphinx-pretty-searchresults
* #4018: htmlhelp: Add :confval:`htmlhelp_file_suffix` and
:confval:`htmlhelp_link_suffix`
* #5559: text: Support complex tables (colspan and rowspan)
* LaTeX: support rendering (not in math, yet) of Greek and Cyrillic Unicode

It is possible, via the alphabeta package, to support Greek Unicode literals in the math directive with pdflatex. The problem is that an upright Unicode alpha will be mapped to \alpha and rendered by the italic math font, contrary to MathJax rendering, which keeps upright and italic separate. Anyway there is no issue about this on the tracker yet, and besides, only xelatex/lualatex are reasonable for supporting Unicode input in math; it is not worthwhile to go to extreme lengths to support it with pdflatex.

I was surprised to discover that platex (Japanese) has everything set up: individual Greek and Cyrillic letters work fine both in text mode and in math mode...

* #5247: LaTeX: PDF does not build with default font config for Russian
language and ``'xelatex'`` or ``'lualatex'`` as :confval:`latex_engine`
(refs: #5251)
* #5248: LaTeX: Greek letters in section titles disappear from PDF bookmarks

#5249 still remains, but see the previous comment: the user should choose xelatex and load the unicode-math package for this.
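
A possible conf.py fragment for that route is sketched below. It is an untested assumption that loading unicode-math from the 'preamble' key is sufficient for a given project; a math font may also have to be selected there.

```python
# conf.py (sketch): Unicode input in math via a Unicode engine.
latex_engine = 'xelatex'

latex_elements = {
    # Assumption: unicode-math is installed and can be loaded from the
    # user preamble, which Sphinx inserts after its own package loading.
    'preamble': r'\usepackage{unicode-math}',
}
```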

\substitutefont{LGR}{\ttdefault}{cmtt}
\substitutefont{T2A}{\rmdefault}{fcm}
\substitutefont{T2A}{\sfdefault}{fcs}
\substitutefont{T2A}{\ttdefault}{fct}

the T2A fonts are from the cm-lgc LaTeX font package

I think I should replace the last three lines with \substitutefont{X2}{\rmdefault}{cmr} etc., because x2cmr.fd exists in texlive-lang-cyrillic and covers more Cyrillic than the T2A-encoded font files.

' \\sphinxDUC{2502}{\\sphinxunichar{2502}}\n'
' \\sphinxDUC{2514}{\\sphinxunichar{2514}}\n'
' \\sphinxDUC{251C}{\\sphinxunichar{251C}}\n'
' \\sphinxDUC{2572}{\\textbackslash}\n'

this is only refactoring, for better readability of what the code does

- 'fontpkg': '',
+ 'fontpkg': ('\\setmainfont{CMU Serif}\n'
+             '\\setsansfont{CMU Sans Serif}\n'
+             '\\setmonofont{CMU Typewriter Text}'),

These fonts are visually close to the previous default with xelatex, which was LaTeX's own choice, i.e. Latin Modern OpenType, itself not far from Knuth's original Computer Modern. CMU Serif is the Unicode version of Computer Modern; the family exists in serif, sans serif and monospace, and supports both Greek and Cyrillic.

An alternative is Libertinus, which is more Times-like (derived from Linux Libertine), but it is a more recent font, still under development.

if ('T2A' in self.elements['fontenc'] and
        not self.babel.uses_cyrillic()):
    self.elements['substitutefont'] = '\\usepackage{substitutefont}'
    self.elements['sphinxpkgoptions'] += ',cyrnocyr'

I preferred to move the fancy LaTeX into sphinx.sty rather than keep it in the template.

As said above I think now the fancy latex should go into some sphinxcyrillic.sty.

    self.elements['substitutefont'] = '\\usepackage{substitutefont}'
    self.elements['sphinxpkgoptions'] += ',cyrnocyr'
if 'LGR' in self.elements['fontenc']:
    self.elements['substitutefont'] = '\\usepackage{substitutefont}'

Sphinx does not support Greek as main language I think, so it makes sense not to worry about LGR being last font encoding in 'fontenc' key.

if 'LGR' in self.elements['fontenc']:
    self.elements['substitutefont'] = '\\usepackage{substitutefont}'
else:
    self.elements['textgreek'] = ''

This way no dependency is added to existing Sphinx projects only needing T1 encoding (i.e. all (non-Cyrillic) projects which did not have Unicode Greek letters in their text, previously rendered via TeX math font)
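
The selection logic in the hunks above can be restated as a standalone function, which makes the "no new dependency for T1-only projects" property easy to check. This is a sketch, not the code in sphinx/writers/latex.py: the function name and the uses_cyrillic parameter are hypothetical, and the textalpha default for 'textgreek' is an assumption based on the PR description.

```python
def select_font_elements(fontenc: str, uses_cyrillic: bool) -> dict:
    """Return the template elements implied by a 'fontenc' declaration.

    Hypothetical helper mirroring the review hunks; `uses_cyrillic`
    stands in for self.babel.uses_cyrillic().
    """
    elements = {
        'substitutefont': '',
        'textgreek': '\\usepackage{textalpha}',  # assumed default value
        'sphinxpkgoptions': '',
    }
    # Cyrillic letters in a non-Cyrillic document need font substitution.
    if 'T2A' in fontenc and not uses_cyrillic:
        elements['substitutefont'] = '\\usepackage{substitutefont}'
        elements['sphinxpkgoptions'] += ',cyrnocyr'
    # Greek support is loaded only when LGR was declared.
    if 'LGR' in fontenc:
        elements['substitutefont'] = '\\usepackage{substitutefont}'
    else:
        elements['textgreek'] = ''
    return elements
```

With a plain T1 declaration every value comes back empty, so the generated preamble loads nothing new.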

codecov bot commented Nov 16, 2018

Codecov Report

Merging #5645 into master will decrease coverage by <.01%.
The diff coverage is 33.33%.

@@            Coverage Diff            @@
##           master   #5645      +/-   ##
=========================================
- Coverage    83.2%   83.2%   -0.01%     
=========================================
  Files         294     294              
  Lines       39195   39219      +24     
  Branches     5864    5865       +1     
=========================================
+ Hits        32614   32632      +18     
- Misses       5220    5224       +4     
- Partials     1361    1363       +2
| Impacted Files | Coverage Δ |
|---|---|
| tests/test_markup.py | 96.73% <ø> (ø) ⬆️ |
| sphinx/util/texescape.py | 100% <ø> (ø) ⬆️ |
| sphinx/writers/latex.py | 83.53% <33.33%> (-0.39%) ⬇️ |
| sphinx/config.py | 82.74% <0%> (-0.79%) ⬇️ |
| sphinx/testing/util.py | 93.51% <0%> (-0.12%) ⬇️ |
| sphinx/writers/text.py | 93.52% <0%> (-0.03%) ⬇️ |
| tests/test_util.py | 100% <0%> (ø) ⬆️ |
| setup.py | 0% <0%> (ø) ⬆️ |
| sphinx/writers/texinfo.py | 88.08% <0%> (ø) ⬆️ |
| ... and 5 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jfbu commented Nov 17, 2018

@tk0miya sorry for flake8, but for some time now I have not been able to run the tests locally: I use the Anaconda Python distribution, which has ceased supporting my version of Mac OS X, and flake8 simply refuses to work.

As per circleci, the build succeeds with lualatex (showing CMU Serif is available) and fails with xelatex, which cannot find the font. I will try to modify it to call the font by filename, but this is more cumbersome. I wonder if testing is done on Mac OS?

I had forgotten that xetex can't find TeXLive fonts by font name and needs filenames, because locally I have symlinks to work around that problem.

@jfbu jfbu force-pushed the latex_greek_cyrillic_letters branch from 53bf824 to 2b88140 Compare November 17, 2018 00:42
@jfbu jfbu force-pushed the latex_greek_cyrillic_letters branch from 2b88140 to 6283324 Compare November 17, 2018 00:46
@jfbu jfbu left a comment

While working on this PR, I did not pay attention to the dependencies on, say, Ubuntu Xenial. Indeed I mainly work on my own Mac OS X with a (quasi-full) TeXLive installation.

I feel I need to think a bit more about minimizing the dependencies resulting from the choice of fonts. For most Sphinx users wishing to use some Greek or Cyrillic exceptionally in a non-Cyrillic document, what matters is that the Unicode letter shows in the PDF, not that the font is well matched with the font for Latin text.

CHANGES Outdated
- Greek letters (in text, not math)
* - cm-lgc
- texlive-fonts-extra
- Cyrillic letters (in non-Cyrillic documents)

There are alternatives which may be better: texlive-fonts-extra appears to be a rather big Ubuntu/Debian package, which installs a great many TeX fonts. By contrast, texlive-lang-cyrillic is much smaller and provides some Computer Modern fonts in the "cyrillic" core LaTeX bundle, which is not cm-lgc. They are available in X2 encoding, which covers even more Cyrillic glyphs. I must check whether they are rendered as scalable fonts in the PDF (I expect so).

It would be logical to tell people: for support of occasional Greek you will need texlive-lang-greek, and for support of occasional Cyrillic texlive-lang-cyrillic, without requiring the big texlive-fonts-extra.

For xelatex/lualatex this is different: I have no qualms about requiring a full up-to-date installation if necessary. But for the vast majority of projects, which use pdflatex, it makes sense to try not to add big dependencies.

The X2-encoded Computer Modern fonts seem to require not only texlive-lang-cyrillic but also cm-super-minimal (because the PDF needs subsetting, in particular of font file sfrm1000.pfb, judging from my local pdflatex log). Investigating if cm-super-minimal is enough or cm-super is needed.

Investigating if cm-super-minimal is enough or cm-super is needed.

Yes, cm-super-minimal is enough, but \usepackage[10pt]{type1ec} must be used before \usepackage[...]{fontenc}. From the README.debian of the cm-super-minimal package,

If you DON'T have cm-super (the full package) installed you have to use
\usepackage[10pt]{type1ec}
which uses only those cm-super fonts available in cm-super-minimal and
scale those fonts for other design sizes.

and I confirmed this by looking at the contents of the type1ec.sty file to see what it does.
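
In conf.py terms, the ordering constraint could be expressed as below. This is a sketch under the assumption that the 'fontenc' value is copied verbatim into the preamble, so that a two-line value is acceptable; the particular encoding list is only an example.

```python
# conf.py (sketch): cm-super-minimal only, so type1ec must come first.
latex_elements = {
    'fontenc': '\n'.join([
        # Scales the few fonts shipped in cm-super-minimal to all sizes.
        r'\usepackage[10pt]{type1ec}',
        # Must be loaded after type1ec (see README.debian quoted above).
        r'\usepackage[LGR,X2,T1]{fontenc}',
    ]),
}
```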

doc/conf.py Outdated
@@ -56,10 +56,17 @@
'Georg Brandl', 'manual', 1)]
latex_logo = '_static/sphinx.png'
latex_elements = {
'fontenc': r'\usepackage[LGR,T2A,T1]{fontenc}',

I will probably switch to X2 in place of T2A to cover even more of Cyrillic Unicode block.

@@ -164,8 +164,11 @@ The builder's "name" must be given to the **-b** command-line option of
* ``texlive-latex-recommended``
* ``texlive-fonts-recommended``
* ``texlive-latex-extra``
* ``texlive-fonts-extra``, ``texlive-lang-greek`` (if needed to

It would be much better to say texlive-lang-cyrillic here in place of texlive-fonts-extra.

both Cyrillic and Greek scripts (contrarily to the
default font configured by LaTeX for ``xelatex/lualatex``
if ``'fontpkg'`` is left to empty string, as was the case
prior to 2.0).

TODO: the \setmainfont{CMU Serif} syntax works only with lualatex; xelatex in fact needs the font filename, as I did in latex.py. Update the docs.

}%
\DeclareTextSymbolDefault{\CYRpalochka}{T2A}%
\fi


I will probably replace it with more extensive code covering X2, which is a superset of T2A for Cyrillic. A tex.sx answer has already done the work, in a more agreeable syntax using Unicode letters. I think I should externalize this to a sphinxcyrillic.sty file, to keep sphinx.sty simpler. Besides, with a separate sphinxcyrillic.sty file, its loading can be done from the template rather than by passing an option to sphinx.

(merge-rule "\IeC {\textChi }" "Χ" :string)
(merge-rule "\IeC {\textPsi }" "Ψ" :string)
(merge-rule "\IeC {\textOmega }" "Ω" :string)
(merge-rule "\IeC {\textohm }" "Ω" :string)

None of these merge rules does any harm with xelatex/lualatex, where Unicode letters remain identical to themselves when written to file. They matter only with pdflatex, because the .idx file will contain those macros rather than the Unicode letters, so we map the macros back to the Unicode letters.
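
To make the shape of those rules concrete, here is a small generator for lines of that form. It is a sketch only: the helper name and the three-entry table are illustrative, and it is not claimed that the rules in this PR were produced this way.

```python
# Illustrative subset: LaTeX macro name -> Unicode letter it stands for.
GREEK_MACROS = {
    'textChi': 'Χ',
    'textPsi': 'Ψ',
    'textOmega': 'Ω',
}


def merge_rule(macro: str, letter: str) -> str:
    """Build one xindy merge-rule mapping an \\IeC-wrapped macro to a letter."""
    return '(merge-rule "\\IeC {\\%s }" "%s" :string)' % (macro, letter)


# One rule per macro, in the table's insertion order.
rules = [merge_rule(macro, letter) for macro, letter in GREEK_MACROS.items()]
```

Each generated string has the same layout as the lines in the hunk above, including the space before the closing brace that appears in the .idx file written by pdflatex.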

% T2A was declared as font encoding
\substitutefont{T2A}{\rmdefault}{fcm}
\substitutefont{T2A}{\sfdefault}{fcs}
\substitutefont{T2A}{\ttdefault}{fct}

I will change that presumably to X2 with cmr, so that only texlive-lang-cyrillic is needed not the much bigger texlive-fonts-extra.

'\\setmonofont{cmuntt.otf}[\n'
' BoldFont = cmuntb.otf,\n'
' ItalicFont = cmunit.otf,\n'
' BoldItalicFont = cmuntx.otf]'),

Well, what a pain that is. On my Mac OS the problem did not arise because I have symlinks from the font directory ~/Library/Fonts to the TeXLive location, so XeTeX can find fonts by font name, not only by filename. That is why in the initial commit I had handled xelatex the same way as lualatex.
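
The engine-dependent set-up discussed here can be sketched as follows, for the monospace font only. The function name is hypothetical; the font name and filenames are the ones shown in the hunks of this review, which reflect the CMU-based intermediate state rather than the FreeFont choice of the final version.

```python
def monofont_setup(engine: str) -> str:
    """Return a \\setmonofont declaration suited to the given engine."""
    if engine == 'lualatex':
        # LuaLaTeX finds fonts shipped with TeXLive by font name.
        return '\\setmonofont{CMU Typewriter Text}'
    if engine == 'xelatex':
        # XeLaTeX needs filenames unless the fonts are known to the system.
        return ('\\setmonofont{cmuntt.otf}[\n'
                '  BoldFont = cmuntb.otf,\n'
                '  ItalicFont = cmunit.otf,\n'
                '  BoldItalicFont = cmuntx.otf]')
    raise ValueError('not a Unicode engine: %r' % engine)
```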

Unfortunately this seems to require texlive-fonts-extra. But one may expect Sphinx xetex/lualatex users to have complete, up-to-date TeX installations, so perhaps this is acceptable. The Libertinus font is more recent still.

jfbu commented Nov 21, 2018

I am not sure the testing includes building our document to PDF: if it does, the cm-super package is now required.

In my last commit I have

  • tried to remove the clutter and be less verbose in explanations,

  • decided for FreeFont as default choice of fonts for the Unicode engines,

  • added support to pdflatex for X2-encoded Cyrillic which covers a lot of Cyrillic glyphs

  • reduced extra requirements on Ubuntu xenial to a minimum: texlive-fonts-extra is never required, only individual packages such as fonts-freefont-otf or cm-super(-minimal). And these requirements are optional: they serve only the new, previously non-existent support of Greek/Cyrillic in English or European-language documents.

jfbu commented Nov 21, 2018

Testing completed. I will push one last commit to avoid using same option prefix for sphinx.sty and (new) sphinxcyrillic.sty and then I will merge.

Forgot to add it to previous commit :(
@jfbu jfbu merged commit 8412bdf into sphinx-doc:master Nov 21, 2018
jfbu commented Nov 21, 2018

@tk0miya merged! thanks for reviewing, I since externalized to new sphinxcyrillic.sty, fixed up font choice for xelatex/lualatex, clarified the Ubuntu package dependencies and improved the documentation.

jfbu commented Nov 22, 2018

@tk0miya It seems our LaTeX testing does not include building our PDF documentation sphinx.pdf (I agree this takes time, less now thanks to your work on speeding up the gathering of references phase). MEMO: to build sphinx.pdf this PR adds the following requirements: texlive-lang-greek, texlive-lang-cyrillic, and cm-super (because Greek and Cyrillic are used for demonstration in the document). texlive-fonts-extra is not required.

@jfbu jfbu deleted the latex_greek_cyrillic_letters branch November 22, 2018 08:47
tk0miya commented Nov 22, 2018

At present, we only do testing to build testroot document to PDF, not our document.

On the other hand, PDF is always built in readthedocs.org on every commit. I know it's not testing, but might be enough.

jfbu commented Nov 22, 2018

Ah yes, and the RTD build succeeded. As for the requirements of our continuous testing, I see they are changed, e.g. at circleci, via a docker file sphinxdoc/docker-ci? I was worried I could not modify it. What is needed for LaTeX PDF builds that test the new features of this PR is texlive-lang-greek, texlive-lang-cyrillic and cm-super. For xelatex/lualatex, fonts-freefont-otf. The (big) Ubuntu package texlive-fonts-extra is not needed.

BTW i need to do s/fonts-latex-extra/texlive-fonts-extra/g everywhere... (I got mixed up with Ubuntu package names, and I am not on Ubuntu at my locale...)

edit: actually I mixed up only in my comments, not in this PR
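
The package requirements stated in this thread can be summarized in a small lookup. This is a sketch: the function name and its parameters are hypothetical, and the package names are the Ubuntu ones quoted in the comments above.

```python
def extra_packages(engine: str, greek: bool = False,
                   cyrillic: bool = False) -> set:
    """Ubuntu packages needed on top of the usual TeXLive set (sketch)."""
    if engine in ('xelatex', 'lualatex'):
        # FreeFont is the default font for the Unicode engines on master.
        return {'fonts-freefont-otf'}
    if engine != 'pdflatex':
        # e.g. platex, whose default fonts already cover both scripts.
        return set()
    packages = set()
    if greek:
        packages.add('texlive-lang-greek')
    if cyrillic:
        packages.add('texlive-lang-cyrillic')
    if packages:
        # Scalable fonts for the LGR, T2A and X2 encodings.
        packages.add('cm-super')
    return packages
```

In no case does the result contain texlive-fonts-extra, which is the point made above.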

tk0miya commented Nov 22, 2018

Yes, Circle CI uses sphinxdoc/docker-ci image (https://hub.docker.com/r/sphinxdoc/docker-ci/).
And it comes from https://github.com/sphinx-doc/docker-ci .
Circle CI always fetches our image on each test run. So it would be nice if we could make it slim :-)

jfbu commented Nov 22, 2018

@tk0miya afaict, texlive-fonts-extra (a big dependency, listed in https://github.com/sphinx-doc/docker-ci) could be replaced by fonts-freefont-otf (to support xelatex and lualatex builds on the master branch) and fonts-lmodern (to support them on the 1.8 branch). (I think no platex builds are done.)

edit: I see lmodern (which includes fonts-lmodern) is already listed as a dependency (the dependencies are in alphabetical order, but the name of this one does not start with texlive-, so I missed it). After the 2.0 release it will become superfluous, I think, and fonts-freefont-otf will be enough.

If we add some Greek or Cyrillic letters to the testroot (still with language English), we would then need to add texlive-lang-greek, texlive-lang-cyrillic and cm-super for the PDFLaTeX build of the testroot to succeed. But this is a bit redundant, because the RTD build of our own documentation tests this now.

tk0miya commented Nov 24, 2018

We should test for Greek and Cyrillic at least once. And I think we've already done it on rtd. So it's okay to keep as is.

jfbu added a commit that referenced this pull request Dec 18, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 20, 2021