LaTeX: support for Greek and Cyrillic #5645
0. do not escape Unicode Greek letters via LaTeX math mark-up: pass them through unmodified to the LaTeX document,
1. if "fontenc" receives the extra option LGR, then pdflatex will support Unicode Greek letters (not in math), and with the extra option T2A it will support (most) Unicode Cyrillic letters,
2. for pdflatex with LGR, this will use the "textalpha" LaTeX package and the "substitutefont" package to set up some automatic font substitution to work around the unavailability of Greek with the "times" package (which is the default font package chosen by Sphinx for pdflatex); same with T2A and "substitutefont" for Cyrillic,
3. for xelatex/lualatex, set up Computer Modern Unicode as the default font, as it supports Cyrillic and Greek scripts,
4. for platex, don't do anything special, as the engine's default font already supports Cyrillic and Greek (even in math mode!)

Closes: sphinx-doc#5251
Fixes: sphinx-doc#5248
Fixes: sphinx-doc#5247
CHANGES
Outdated
the use of the ``LGR`` (Greek) and/or ``T2A`` (Cyrillic) font encoding. Even
then, the last four are font packages arising in the default value for
:confval:`latex_elements`.\ ``'fontpkg'``, and may be replaced by other font
packages providing ``LGR`` and/or ``T2A`` support.
These explanations may later move from CHANGES to builders/index.rst, after the 2.0 release.
CHANGES
Outdated
will use the text font not the math font. If (and only if) the document
contains such Greek Unicode letters *and* the :confval:`latex_engine` is
``'pdflatex'`` then the :confval:`latex_elements`.\ ``'fontenc'`` key
**must** be used to declare usage of the ``LGR`` font encoding.
Initially I added LGR and the `textalpha` and `substitutefont` packages to the default pdflatex setting so that no document would need any change for the transition to Sphinx 2.0. Unfortunately, on Ubuntu xenial `textalpha` is provided by `texlive-lang-greek`, which was not previously a dependency.
So after some hesitation I decided, on the contrary, not to change the default `'fontenc'` at all. This means that if a document has a Greek Unicode letter in text, which previously was escaped to math mark-up, the project author must now explicitly use the `latex_elements` `'fontenc'` key to add the `LGR` encoding. The TeX installation might also need additional LaTeX packages (listed above).
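For illustration, a minimal `conf.py` sketch of that opt-in might look as follows. It is modelled on the `'fontenc'` value used in this PR's own `doc/conf.py`; listing `T1` last (so that it stays the default encoding) is an assumption about what a typical non-Greek project would want.

```python
# conf.py fragment (sketch): opt in to LGR so that pdflatex can typeset
# Unicode Greek letters appearing in ordinary text.
latex_engine = 'pdflatex'

latex_elements = {
    # fontenc makes the last listed encoding the document default,
    # so T1 is kept last here (assumption: Greek is only occasional).
    'fontenc': r'\usepackage[LGR,T1]{fontenc}',
}
```

As noted in the comment above, the TeX installation may then need the extra Greek packages.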
@@ -58,10 +97,18 @@ __ https://github.com/sphinx-contrib/sphinx-pretty-searchresults
* #4018: htmlhelp: Add :confval:`htmlhelp_file_suffix` and
:confval:`htmlhelp_link_suffix`
* #5559: text: Support complex tables (colspan and rowspan)
* LaTeX: support rendering (not in math, yet) of Greek and Cyrillic Unicode
It is possible via the `alphabeta` package to support Greek Unicode literals in the math directive with `pdflatex`, but the problem is that an upright Unicode alpha will be mapped to `\alpha` and rendered by the italic math font, contrary to MathJax rendering, which keeps the upright and the italic separate. Anyway, there is no issue about this on your tracker yet, and besides, only xelatex/lualatex are reasonable for supporting Unicode input in math; it is not worthwhile to go to extreme lengths to try to support it with pdflatex.
I was surprised to discover that platex (Japanese) has everything set up, and individual Greek and Cyrillic letters work fine both in text mode and in math mode...
* #5247: LaTeX: PDF does not build with default font config for Russian
language and ``'xelatex'`` or ``'lualatex'`` as :confval:`latex_engine`
(refs: #5251)
* #5248: LaTeX: Greek letters in section titles disappear from PDF bookmarks
#5249 still remains, but see the previous comment: the user should choose xelatex and add the `unicode-math` package for this.
doc/usage/configuration.rst
Outdated
\substitutefont{LGR}{\ttdefault}{cmtt}
\substitutefont{T2A}{\rmdefault}{fcm}
\substitutefont{T2A}{\sfdefault}{fcs}
\substitutefont{T2A}{\ttdefault}{fct}
The T2A fonts are from the `cm-lgc` LaTeX font package.
I think I should replace the last three lines with `\substitutefont{X2}{\rmdefault}{cmr}` etc., because `x2cmr.fd` exists in `texlive-lang-cyrillic` and covers more Cyrillic than the T2A-encoded font files.
' \\sphinxDUC{2502}{\\sphinxunichar{2502}}\n'
' \\sphinxDUC{2514}{\\sphinxunichar{2514}}\n'
' \\sphinxDUC{251C}{\\sphinxunichar{251C}}\n'
' \\sphinxDUC{2572}{\\textbackslash}\n'
This is only refactoring, for better readability of what the code does.
sphinx/writers/latex.py
Outdated
'fontpkg': '',
'fontpkg': ('\\setmainfont{CMU Serif}\n'
'\\setsansfont{CMU Sans Serif}\n'
'\\setmonofont{CMU Typewriter Text}'),
These fonts are visually close to the previous default with `xelatex`: the previous default was LaTeX's own choice, i.e. Latin Modern OpenType, which is not far from the original Knuth Computer Modern. CMU Serif is the Unicode version of Computer Modern; it exists in serif, sans serif and monospace, and supports both Greek and Cyrillic.
An alternative is Libertinus, which is more Times-like (derived from Linux Libertine), but it is a more recent font, still under development.
sphinx/writers/latex.py
Outdated
if ('T2A' in self.elements['fontenc'] and
not self.babel.uses_cyrillic()):
self.elements['substitutefont'] = '\\usepackage{substitutefont}'
self.elements['sphinxpkgoptions'] += ',cyrnocyr'
I preferred to move the fancy LaTeX to `sphinx.sty` rather than have it in the template.
As said above, I now think the fancy LaTeX should go into some `sphinxcyrillic.sty`.
self.elements['substitutefont'] = '\\usepackage{substitutefont}'
self.elements['sphinxpkgoptions'] += ',cyrnocyr'
if 'LGR' in self.elements['fontenc']:
self.elements['substitutefont'] = '\\usepackage{substitutefont}'
Sphinx does not support Greek as main language, I think, so it makes sense not to worry about LGR being the last font encoding in the `'fontenc'` key.
if 'LGR' in self.elements['fontenc']:
self.elements['substitutefont'] = '\\usepackage{substitutefont}'
else:
self.elements['textgreek'] = ''
This way no dependency is added to existing Sphinx projects needing only the T1 encoding (i.e. all non-Cyrillic projects which did not have Unicode Greek letters in their text, previously rendered via the TeX math font).
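The branching in these (outdated) hunks can be sketched as a standalone function. The helper name `font_support_elements` and its dictionary return value are hypothetical; the code in the diff mutates `self.elements` on the LaTeX writer instead, and the merged version may differ.

```python
def font_support_elements(fontenc, babel_uses_cyrillic):
    """Return the extra template elements implied by a 'fontenc' value.

    Hypothetical helper mirroring the branches shown in the review hunks.
    """
    elements = {'substitutefont': '', 'sphinxpkgoptions': ''}
    # Cyrillic letters in a non-Cyrillic document: enable font substitution
    # and pass the 'cyrnocyr' option on to sphinx.sty.
    if 'T2A' in fontenc and not babel_uses_cyrillic:
        elements['substitutefont'] = '\\usepackage{substitutefont}'
        elements['sphinxpkgoptions'] += ',cyrnocyr'
    if 'LGR' in fontenc:
        elements['substitutefont'] = '\\usepackage{substitutefont}'
        # 'textgreek' is left at its default here (not shown in the hunk).
    else:
        # No LGR declared: emit nothing, so T1-only projects gain no dependency.
        elements['textgreek'] = ''
    return elements


t1_only = font_support_elements(r'\usepackage[T1]{fontenc}', False)
greek_cyr = font_support_elements(r'\usepackage[LGR,T2A,T1]{fontenc}', False)
```

With a T1-only `'fontenc'`, every returned value is the empty string, which is the "no new dependency" property described in the comment.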
Codecov Report
@@ Coverage Diff @@
## master #5645 +/- ##
=========================================
- Coverage 83.2% 83.2% -0.01%
=========================================
Files 294 294
Lines 39195 39219 +24
Branches 5864 5865 +1
=========================================
+ Hits 32614 32632 +18
- Misses 5220 5224 +4
- Partials 1361 1363 +2
@tk0miya sorry for flake8, but for some time now I have been unable to run the tests locally: I use the Anaconda Python distribution, which has ceased supporting my version of Mac OS X, and flake8 simply refuses to work. As for circleci, the build succeeds with lualatex (showing CMU Serif is available) and fails with xelatex, which cannot find the font. I will try to modify it to call the font by filename, but this is more cumbersome. I had forgotten that xetex can't find TeXLive fonts by font name and needs filenames, because locally I have symlinks to work around that problem.
While working on this PR, I did not pay attention to the dependencies on, say, Ubuntu Xenial. Indeed, I mainly work on my own Mac OS X machine with a (quasi-full) TeXLive installation.
I feel I need to think a bit more about minimizing the dependencies resulting from the choice of fonts. For most Sphinx users wishing to use some Greek or Cyrillic exceptionally in a non-Cyrillic document, what matters is that the Unicode letter shows in the PDF, not that the font is well matched with the font for Latin text.
CHANGES
Outdated
- Greek letters (in text, not math)
* - cm-lgc
- texlive-fonts-extra
- Cyrillic letters (in non-Cyrillic documents)
There is an alternative which may be better: `texlive-fonts-extra` appears to be a rather big Ubuntu/Debian package, which installs a great many TeX fonts. By contrast, `texlive-lang-cyrillic` is much smaller and provides some Computer Modern fonts in the "cyrillic" core LaTeX bundle, which is not `cm-lgc`. They are available in X2 encoding, which covers even more Cyrillic glyphs. I must check whether they are rendered as scalable fonts in the PDF (I expect so).
It would be logical to tell people: for support of occasional Greek you will need `texlive-lang-greek`, and for support of occasional Cyrillic `texlive-lang-cyrillic`, without requiring the big `texlive-fonts-extra` of them.
For xelatex/lualatex this is different: I have no qualms about requiring a full up-to-date installation if necessary. But for the vast majority of projects, which use pdflatex, it makes sense to try not to add big dependencies.
The X2-encoded Computer Modern fonts seem to require not only `texlive-lang-cyrillic` but also `cm-super-minimal` (because the PDF needs subsetting of, in particular, the font file `sfrm1000.pfb`, judging from my local `pdflatex` log). Investigating whether `cm-super-minimal` is enough or `cm-super` is needed.
> Investigating if cm-super-minimal is enough or cm-super is needed.

Yes, `cm-super-minimal` is enough, but `\usepackage[10pt]{type1ec}` must be used before `\usepackage[...]{fontenc}`. From the README.debian of the `cm-super-minimal` package:

> If you DON'T have cm-super (the full package) installed you have to use
> \usepackage[10pt]{type1ec}
> which uses only those cm-super fonts available in cm-super-minimal and
> scale those fonts for other design sizes.

I confirmed this by looking at the contents of the `type1ec.sty` file to see what it does.
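A hedged `conf.py` sketch of that ordering constraint follows. It assumes that the `'fontenc'` value is copied verbatim into the LaTeX preamble, so that it can carry two `\usepackage` lines, and the `X2,T1` option list is only an example.

```python
# conf.py fragment (sketch): with only cm-super-minimal installed,
# load type1ec *before* fontenc, as the Debian README quoted above requires.
latex_elements = {
    'fontenc': '\n'.join([
        r'\usepackage[10pt]{type1ec}',   # must come first
        r'\usepackage[X2,T1]{fontenc}',  # example encodings (assumption)
    ]),
}
```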
doc/conf.py
Outdated
@@ -56,10 +56,17 @@
'Georg Brandl', 'manual', 1)]
latex_logo = '_static/sphinx.png'
latex_elements = {
'fontenc': r'\usepackage[LGR,T2A,T1]{fontenc}',
I will probably switch to X2 in place of T2A to cover even more of the Cyrillic Unicode block.
doc/usage/builders/index.rst
Outdated
@@ -164,8 +164,11 @@ The builder's "name" must be given to the **-b** command-line option of
* ``texlive-latex-recommended``
* ``texlive-fonts-recommended``
* ``texlive-latex-extra``
* ``texlive-fonts-extra``, ``texlive-lang-greek`` (if needed to
It would be much better to say `texlive-fonts-cyrillic` here in place of `texlive-fonts-extra`.
doc/usage/configuration.rst
Outdated
both Cyrillic and Greek scripts (contrarily to the
default font configured by LaTeX for ``xelatex/lualatex``
if ``'fontpkg'`` is left to empty string, as was the case
prior to 2.0).
TODO: the `\setmainfont{CMU Serif}` syntax works only with lualatex; for xelatex one in fact needs the filename, as I did in latex.py. Update the doc.
sphinx/texinputs/sphinx.sty
Outdated
}%
\DeclareTextSymbolDefault{\CYRpalochka}{T2A}%
\fi
I will probably replace it with more extensive code covering X2, which is a superset of T2A for Cyrillic. A tex.sx answer has already done the work, in a more agreeable syntax using Unicode letters. I think I should probably externalize this to a sphinxcyrillic.sty file, to keep sphinx.sty simpler. Besides, with a separate sphinxcyrillic.sty file, its loading can be done from the template rather than by passing an option to sphinx.
(merge-rule "\IeC {\textChi }" "Χ" :string) | ||
(merge-rule "\IeC {\textPsi }" "Ψ" :string) | ||
(merge-rule "\IeC {\textOmega }" "Ω" :string) | ||
(merge-rule "\IeC {\textohm }" "Ω" :string) |
All these merge rules do no harm with xelatex/lualatex, where Unicode letters remain identical to themselves when written to file; they are needed only with pdflatex, because the `.idx` file will then contain those macros rather than the Unicode letters. So we map the macros back to the Unicode letters.
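As a rough illustration of that mapping, the sketch below generates rules of the same shape as the ones in the hunk. The helper `merge_rule` and the three-entry table are hypothetical; the PR may well ship a hand-written rules file rather than generating one.

```python
def merge_rule(macro_name, char):
    """Format one xindy merge-rule mapping an \\IeC-wrapped macro to a character."""
    return '(merge-rule "\\IeC {\\%s }" "%s" :string)' % (macro_name, char)


# Macro name -> Unicode letter it stands for in the .idx file under pdflatex.
greek_capitals = {
    'textChi': '\u03a7',    # GREEK CAPITAL LETTER CHI
    'textPsi': '\u03a8',    # GREEK CAPITAL LETTER PSI
    'textOmega': '\u03a9',  # GREEK CAPITAL LETTER OMEGA
}

rules = [merge_rule(name, char) for name, char in greek_capitals.items()]
print('\n'.join(rules))
```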
sphinx/writers/latex.py
Outdated
% T2A was declared as font encoding
\substitutefont{T2A}{\rmdefault}{fcm}
\substitutefont{T2A}{\sfdefault}{fcs}
\substitutefont{T2A}{\ttdefault}{fct}
I will presumably change that to X2 with `cmr`, so that only `texlive-lang-cyrillic` is needed, not the much bigger `texlive-fonts-extra`.
sphinx/writers/latex.py
Outdated
'\\setmonofont{cmuntt.otf}[\n'
' BoldFont = cmuntb.otf,\n'
' ItalicFont = cmunit.otf,\n'
' BoldItalicFont = cmuntx.otf]'),
Well, what a pain that is. On my MacOS the problem did not arise because I use symlinks from the font directory `~/Library/Fonts` to the TeXLive location, so XeTeX knows how to find fonts by font name, not only by filename; that is why in the initial commit I had done the same for xelatex as for lualatex.
Unfortunately this seems to require `texlive-fonts-extra`. But one may expect complete, up-to-date TeX installations from Sphinx xetex/lualatex users, so perhaps this is acceptable. The Libertinus font is more recent still.
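The engine-dependent choice discussed in this thread can be sketched as follows. The function `mono_font_setup` is hypothetical and covers only the monospace family, whose file names appear in the hunk above; it reflects this intermediate state of the PR, not necessarily the merged defaults.

```python
def mono_font_setup(engine):
    """Return a fontspec \\setmonofont declaration suited to the engine (sketch)."""
    if engine == 'lualatex':
        # lualatex can locate the TeXLive font by its font name.
        return '\\setmonofont{CMU Typewriter Text}'
    if engine == 'xelatex':
        # xelatex needs file names for fonts living in the TeX tree.
        return ('\\setmonofont{cmuntt.otf}[\n'
                '  BoldFont = cmuntb.otf,\n'
                '  ItalicFont = cmunit.otf,\n'
                '  BoldItalicFont = cmuntx.otf]')
    # Other engines (e.g. pdflatex) do not use fontspec.
    return ''


xelatex_setup = mono_font_setup('xelatex')
lualatex_setup = mono_font_setup('lualatex')
```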
I am not sure the testing includes building our document to PDF: if it does, the In my last commit I have
Testing completed. I will push one last commit to avoid using the same option prefix for sphinx.sty and the (new) sphinxcyrillic.sty, and then I will merge.
Forgot to add it to previous commit :(
@tk0miya merged! Thanks for reviewing. I have since externalized the code to a new sphinxcyrillic.sty, fixed up the font choice for xelatex/lualatex, clarified the Ubuntu package dependencies and improved the documentation.
@tk0miya It seems our LaTeX testing does not include building our PDF documentation.
At present, we only test building the testroot document to PDF, not our own documentation. On the other hand, the PDF is always built on readthedocs.org on every commit. I know it's not testing, but it might be enough.
Ah yes, and the RTD build succeeded. As for the requirements for our continuous testing, I see they are changed e.g. at circleci via a docker file. BTW I need to do s/fonts-latex-extra/texlive-fonts-extra/g everywhere... (I got mixed up with Ubuntu package names, and I am not on Ubuntu locally...) edit: actually I mixed them up only in my comments, not in this PR.
Yes, Circle CI uses
@tk0miya afaict, edit I see If we add some Greek or Cyrillic letters to the testroot (still with language English), then it would be needed to add
We should test for Greek and Cyrillic at least once. And I think we've already done it on rtd. So it's okay to keep as is.
edit
The final version differs from the description above:
- for xelatex/lualatex, use FreeFont (because on Ubuntu xenial CMU requires the big package `texlive-fonts-extra`, whereas FreeFont has its own separate `fonts-freefont-otf`),
- X2 encoding is also handled for pdflatex, and the fonts are the base ones in LGR, X2 or T2A encoding, provided on Ubuntu xenial by the `cm-super` package (and `texlive-lang-greek`, `texlive-lang-cyrillic`). These extra dependencies are optional, needed only to support the new feature of Greek and Cyrillic in European-language documents.

TODO: indexing with xindy has merge-rules for Greek but not for Cyrillic, because the Cyrillic ones are loaded only for Cyrillic documents; so indexing Cyrillic words in a non-Cyrillic document will work, but all such words will end up in the "non-alphabetical" group.
Closes: #5251
Fixes: #5248
Fixes: #5247
Closes: #1682