PDF not showing some traditional Chinese characters #6319

Closed
iruletheworld opened this issue Oct 22, 2019 · 23 comments
Labels
Accepted Accepted issue on our roadmap Needed: documentation Documentation is required

Comments

@iruletheworld

Sorry to open this issue, but I have read many of the related issues and searched extensively, and I still cannot get it fixed.

I am using the method in #5453 to build the LaTeX PDF for zh_TW. The local build is fine and everything looks as expected, but the remote build has some characters missing, e.g., "換", "佈" (basically a funny PDF).

Here are the settings in conf.py:

latex_engine = 'xelatex'
latex_use_xindy = False

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'preamble': r'''

    \usepackage[UTF8]{ctex}

    \usepackage{float}

    \usepackage{graphicx}

    \usepackage{indentfirst}
    \setlength{\parindent}{2em}

    ''',

    'figure_align': 'H',
}

I have also tried using only xeCJK (without ctex) with a few different fonts, but it still does not work.

By the way, the simplified Chinese translation renders correctly (I use only xeCJK for it). The HTML output is also fine in either language.

Expected Result

All traditional Chinese characters display correctly.

Actual Result

Some traditional Chinese characters, e.g., "換", "佈", are not displayed (perhaps the font used on the server does not include them?).

@humitos
Member

humitos commented Oct 22, 2019

Thanks @iruletheworld for reporting this.

The local build is fine, all things as expected, but the remote build has some characters missing, e.g., "換", "佈" (basically a funny PDF).

What font are you using locally?

Please share any information that you think may be useful to debug this. I opened the PDF, but I can't really tell one character from another, so it's really hard for me to notice whether a character is not rendering properly. Could you point us to the specific page, line, and column of each wrong character and tell us which character should appear there instead?

@stsewd stsewd added Needed: more information A reply from issue author is required Support Support question labels Oct 22, 2019
@iruletheworld
Author

Hi @humitos, thanks for replying. I dug into the ctex manual and think I may have located the cause (the default Fandol fonts).

Here, I will:

  • show the comparisons of local PDF and remote PDF with images
  • attach MWEs to reproduce the issue locally
  • quote possible ctex solution from the manual
  • report the progress on fixing it to date (ongoing...)
  • summarise the findings so far and reference seemingly related issues from the ctex repo

Detailed Comparisons

Some examples of missing characters are shown in the screenshots below. The missing characters are rendered as "F" in squares.

  • Missing characters "換" and "佈" (title page)

Expected:

  • Missing character "說" (page 6)

Expected:

  • Missing characters "註" and "儘"

Expected:

I uploaded a local PDF build for your reference: latex2img/remote_build_pdf_issues/latex2img_local_build.pdf

MWEs for reproducing this issue locally

You need to force the Fandol fonts; otherwise ctex selects fonts based on the OS (ctex manual page 6, table 3, section 4.3).

MWE using ctex:

"錄", "換" and "註" are missing.

\documentclass[fontset = none]{book}
\usepackage[UTF8]{ctex}
\ctexset{fontset = fandol}
\begin{document}

    Using \textbf{ctex} with \textbf{fontset = fandol} to reproduce the issue locally.\newline

    The 3 lines below are forced FandolHei (characters missing).\newline

    \textbf{目錄}\\

    \textbf{轉換}\\

    \textbf{備註}\\

\end{document}

You can also reproduce it with xelatex and xeCJK by forcing FandolHei:

\documentclass[a4paper, 10pt]{article}

\usepackage{xeCJK}
% force FandolHei, which is causing problem
\setCJKmainfont[BoldFont=Source Han Serif TC]{FandolHei}

\begin{document}

Using \textbf{xelatex} and \textbf{xeCJK} with forced \textbf{FandolHei} to reproduce the issue locally.\newline

The 3 lines below are forced FandolHei (characters missing).\newline

目錄\\

轉換\\

備註\\

The 3 lines below are forced Source Han Serif TC (characters correct).\newline

\textbf{目錄}\\

\textbf{轉換}\\

\textbf{備註}\\

\end{document}

Suspected Cause

I think this is caused by the Fandol fonts, which ctex defaults to when it detects that the OS is neither Mac nor Windows (manual page 6, table 3).

The table says that, when using xelatex, ctex defaults to the HuaWen font family on Mac OS X; to the ZhongYi family + Microsoft YaHei on Windows Vista and later; to the ZhongYi family on Windows XP and earlier; and to the Fandol family on all other systems.
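To make the table concrete, here is a toy Python model of the xelatex defaults as summarised above. It is not ctex's actual detection logic (ctex does this in TeX, and its own fontset names differ); the dictionary keys and `ctex_default_family` are hypothetical, and `sys.platform` is used only to show why a Linux build server such as Read the Docs ends up with Fandol.

```python
import sys

# xelatex defaults as summarised from the ctex manual (table 3).
# The keys are illustrative labels, not ctex's own fontset names.
CTEX_XELATEX_DEFAULTS = {
    "macos": "HuaWen family",
    "windows_vista_or_later": "ZhongYi family + Microsoft YaHei",
    "windows_xp_or_earlier": "ZhongYi family",
    "other": "Fandol family",
}


def ctex_default_family(platform=sys.platform, vista_or_later=True):
    """Toy model: which font family ctex would fall back to on a given OS."""
    if platform == "darwin":
        key = "macos"
    elif platform.startswith("win"):
        key = "windows_vista_or_later" if vista_or_later else "windows_xp_or_earlier"
    else:
        # Linux build servers, such as the Ubuntu-based Read the Docs builders, land here
        key = "other"
    return CTEX_XELATEX_DEFAULTS[key]


print(ctex_default_family("linux"))  # Fandol family
```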

Both MWEs above force the Fandol fonts and successfully reproduce the problem.

I think this may have something to do with how Fandol is implemented, especially how bold is implemented (real bold or not).

Possible Solution

I think there may be two solutions.

  • Use ctex's native solution. ctex allows users to set fonts explicitly, as shown in Example 5 on page 7 of the manual. This requires setting [fontset = none] as a document class option and then setting the font set again in the preamble:
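\documentclass[fontset = none]{ctexart}
\ctexset{fontset = founder}
\begin{document}
% Body text: "Declare fontset = none in the document class options, then specify the fonts in the preamble with \ctexset."
在文档类选项中声明\verb|fontset = none|,随后在导言区用\verb|\ctexset|指定字体。
\end{document}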
  • Use the xeCJK package and \setCJKmainfont{<available font>} to select a font that contains the missing characters.

Progress To Date

I tried method 1 using 'extraclassoptions': r'fontset = none' and \ctexset{fontset = ubuntu}, but it still does not work (it actually got worse). The settings in conf.py are:

latex_engine = 'xelatex'
latex_use_xindy = False

latex_docclass = {
   'manual': 'ctexbook'
}

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'extraclassoptions': r'fontset = none',

    'preamble': r'''

    \usepackage{ctex}
    \ctexset{fontset = ubuntu}

    \usepackage{float}

    \usepackage{graphicx}

    \usepackage{indentfirst}
    \setlength{\parindent}{2em}

    ''',

    'figure_align': 'H',
}

Summary So Far

  • the cause seems to be ctex defaulting to the Fandol font family (especially FandolHei for bold) on the remote server
  • this issue can be reproduced locally on a Windows machine by forcing ctex or xeCJK to use FandolHei (ctex on Windows defaults to Microsoft YaHei and does not have this problem)
  • forcing ctex to use fontset = ubuntu made things worse (Ubuntu CJK fonts not installed?)
  • we should be able to solve this problem either by making the default Ubuntu CJK fonts used by ctex available on the server (slim chance?) or by using xeCJK to force readily available CJK fonts
  • this issue is specific to traditional Chinese; simplified Chinese seems OK (Fandol seems to cover most of it)
  • traditional Chinese can be really difficult to get right, since TW has its own character set, HK has its own set, and mainland China also has a new set (believe me...), not counting other places such as Japan and Singapore (which usually use some partly simplified traditional characters, e.g. "麵" -> "麺", where the latter is the partly simplified version)
  • this issue and this one from the ctex repo confirm that Fandol is missing quite a few TC glyphs

@humitos
Member

humitos commented Oct 22, 2019

Hi @iruletheworld! Thanks a lot for your amazing report on this issue.

Did you change something in your repository regarding these settings? I found that the live PDF does not have the "F" in squares anymore in the title: https://buildmedia.readthedocs.org/media/pdf/latex2img-zh-cn/latest/latex2img-zh-cn.pdf

Could you create a branch in your repo with the settings that should produce the expected results (I suppose they should be the same as the ones you use to build the PDF locally)? That way, I'd be able to trigger some builds on RTD and debug it more efficiently.

@iruletheworld
Author

iruletheworld commented Oct 23, 2019

Hi @humitos, about the "F": I think it depends on the PDF reader. Adobe Reader does not display the "F" (neither does Chrome), but Foxit Reader does. To make it clearer, I took the screenshots in Foxit.

I've done another two things with the repo.

  • edited the conf.py in master so the LaTeX config is just bare ctex
  • created a branch called debug_remote_pdf. Please feel free to play with this one

At the moment I am setting up an Ubuntu virtual machine with TeX Live to test a local PDF build with ctex + fontset and see whether I can reproduce the problem with ctex + [fontset = ubuntu] locally.

@iruletheworld
Author

iruletheworld commented Oct 23, 2019

An (Imperfect) Solution

OK, after much trial and error, I've found an acceptable solution, but only for zh-hant (TW), not for zh-hant (HK) or zh-hans (hence imperfect).

I now believe this is a font problem on the server, since switching to fonts available on Debian does work (to an extent). The available fonts I found are here.

You can examine the solution in the repo at tag zh-hant_TW_passed_1.0.0. I explain the details below, including:

  • root cause of the problem
  • font problems with zh-hant (TW), zh-hant (HK) (they use different characters under the umbrella of traditional Chinese) and zh-hans
  • some proposed solutions
  • conclusions to date

Root Cause of the Problem

I believe the root cause of the problem is the Fandol font family, which ctex defaults to when the OS is neither Mac nor Windows. Fandol is quite incomplete, especially for traditional Chinese.

Why doesn't ctex with fontset = ubuntu work either?

This is because with fontset = ubuntu, ctex tries to use the WenQuanYi family (fonts-wqy-zenhei and others). But this font family is no longer shipped with Ubuntu (e.g. 18.04). Therefore LaTeX cannot find the fonts it needs.

OK, which font then?

Droid Sans Fallback works, but it gives you no serif (as the name already says).

  • Minimal setting with Droid Sans Fallback using xeCJK:
latex_engine = 'xelatex'
latex_use_xindy = False

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'preamble': r'''

    \usepackage{xeCJK}
    \setCJKmainfont{Droid Sans Fallback}

    '''
}

Droid Sans Fallback works with both zh-hant (TW) and zh-hant (HK).

But I DO want serif and Chinese italic (KaiTi, 楷体/楷體)

This is where the constraint comes in: I have not found a serif font on Debian that supports all three of zh-hant (TW), zh-hant (HK), and zh-hans.

I only managed to get zh-hant (TW) working; zh-hant (HK) and zh-hans still have missing characters.

  • The solution (imperfect): use AR PL Mingti2L Big5 as the CJK main font, AR PL KaitiM Big5 for italic, and Droid Sans Fallback for sans (the AR fonts are from Arphic Technology, i.e., 文鼎, Wén Dǐng in Chinese pinyin):
latex_engine = 'xelatex'
latex_use_xindy = False

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'preamble': r'''

    \usepackage{xeCJK}
    \setCJKmainfont{AR PL Mingti2L Big5}[ItalicFont = AR PL KaitiM Big5]
    \setCJKsansfont{Droid Sans Fallback}
    '''
}

Note that you must use zh-hant (TW) characters, otherwise some characters will be missing. For example, “爲” is the HK form, “為” is the TW form, and “为” is the simplified form of both. I recommend OpenCC for the conversion.
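A rough way to spot such characters before building is to check which characters of the text fall outside the legacy character set a font is named after (Big5 for the "AR PL ... Big5" fonts, GB2312 for their "GB" counterparts). The sketch below uses Python's built-in `big5` and `gb2312` codecs and assumes that a font's name reflects its coverage, which is only a proxy; `chars_outside_charset` is a hypothetical helper, and the font itself remains the real authority.

```python
def chars_outside_charset(text, encoding):
    """Return the distinct non-ASCII characters of `text` that `encoding` cannot encode.

    Assumption: a font named after a charset (e.g. "... Big5") covers roughly
    that charset, so characters outside it are likely to render as tofu.
    """
    missing = []
    for ch in text:
        if ord(ch) < 128 or ch in missing:
            continue  # skip ASCII and characters already reported
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Simplified "为" has no Big5 code; traditional "換" has no GB2312 code.
print(chars_outside_charset("為为", "big5"))
print(chars_outside_charset("換", "gb2312"))
```

For example, running it over a translated .po or .rst file with `"big5"` before switching to the AR PL Big5 fonts would list the characters most likely to turn into tofu.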

So, until Ubuntu ships some really good Chinese fonts by default (e.g., the Noto CJK/Source Han family, which gets installed if you add the Chinese language to Ubuntu), I am stuck. (I am not a fan of the AR family, but I love the Noto CJK/Source Han family.)

What about zh-hans (simplified Chinese)?

Surprisingly, the GB versions of the AR family did not work either when I tried them (GB is “GuóBiāo”, meaning "National Standard", not "Great Britain", lol). So you may be stuck with Fandol, but many characters seem to be OK.

What about zh-hant (HK) then?

Well, could someone donate a font to Ubuntu? Maybe the HK government should do it? Lol. For now, you may be stuck with \setCJKmainfont{Droid Sans Fallback} and will lose all serif.

Why xeCJK instead of ctex?

xeCJK is newer and more flexible and needs fewer configs.

Proposal to expand #5453

Since I ran into this trap (specific to remote readthedocs.org PDF build), I propose to expand #5453 a bit. My proposal would use xeCJK instead of ctex.

  • For zh-hant (TW), in the conf.py, use the following options for Latex
latex_engine = 'xelatex'
latex_use_xindy = False

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'preamble': r'''

    \usepackage{xeCJK}
    \setCJKmainfont{AR PL Mingti2L Big5}[ItalicFont = AR PL KaitiM Big5]
    \setCJKsansfont{Droid Sans Fallback}
    '''
}

AR PL Mingti2L Big5 is the main font as in serif/宋体/宋體/明體; AR PL KaitiM Big5 is the italic/KaiTi/楷体/楷體; Droid Sans Fallback is the sans serif/无衬线/無襯線.

  • For zh-hans, you may get away with Fandol (the default on the remote readthedocs.org build)
latex_engine = 'xelatex'
latex_use_xindy = False

latex_elements = {

    'papersize': 'a4paper',

    'pointsize': '10pt',

    'preamble': r'''

    \usepackage{xeCJK}
    '''
}

If not, add \setCJKmainfont{Droid Sans Fallback} under \usepackage{xeCJK}. You will lose all serif, but the characters will show up in the remotely built PDF.

Conclusions on this stage

  • zh-hans seems to work all right with Fandol
  • zh-hant (TW) needs the AR family plus Droid Sans Fallback (the AR family is not handsome though)
  • zh-hant (HK) can work with Droid Sans Fallback but loses all serif
  • Ubuntu, please ship your next version with Noto CJK/Source CJK
  • Local build? Use whatever you like. xelatex with xeCJK is brilliant
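The conclusions above could be folded into a single conf.py for a project with several Chinese translations. This is only a sketch: it assumes the `language` setting holds the build's locale code, and the `zh_HK` key, the `cjk_latex_elements` helper, and the choice of the sans-only setup as the fallback are illustrative assumptions rather than anything Read the Docs or Sphinx provides.

```python
# conf.py sketch: choose a CJK preamble per translation.

ZH_TW = r"""
\usepackage{xeCJK}
\setCJKmainfont{AR PL Mingti2L Big5}[ItalicFont = AR PL KaitiM Big5]
\setCJKsansfont{Droid Sans Fallback}
"""

# zh-hans: rely on the default Fandol fonts
ZH_CN = r"""
\usepackage{xeCJK}
"""

# Sans-only setup (no serif), as suggested for zh-hant (HK)
SANS_ONLY = r"""
\usepackage{xeCJK}
\setCJKmainfont{Droid Sans Fallback}
"""

CJK_PREAMBLES = {"zh_TW": ZH_TW, "zh_CN": ZH_CN, "zh_HK": SANS_ONLY}


def cjk_latex_elements(lang):
    """Build latex_elements for `lang`; unknown codes get the sans-only setup."""
    return {
        "papersize": "a4paper",
        "pointsize": "10pt",
        "preamble": CJK_PREAMBLES.get(lang, SANS_ONLY),
    }


language = "zh_TW"  # normally set per translation
latex_engine = "xelatex"  # xeCJK requires XeLaTeX
latex_use_xindy = False
latex_elements = cjk_latex_elements(language)
```

Falling back to Droid Sans Fallback trades serif for complete characters, which matches the "substance before style" preference discussed later in the thread.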

I think I have more or less gotten to the bottom of this issue, and it can be closed now.

@humitos
Member

humitos commented Oct 24, 2019

This is because with fontset = ubuntu, ctex tries to use the WenQuanYi family (fonts-wqy-zenhei and others). But this font family is no longer shipped with Ubuntu (e.g. 18.04). Therefore LaTeX cannot find the fonts it needs.

Would installing this package allow ctex to use this font and render all the characters properly?

This package does exist in Ubuntu 18.04 (bionic), which is the one that we use in production: https://packages.ubuntu.com/bionic/fonts/fonts-wqy-zenhei

@iruletheworld
Author

iruletheworld commented Oct 26, 2019

The short answer is no. I have made a test repo; you can look into it for details. I will only state the results and conclusions here.

When using fontset = ubuntu with fonts-wqy-zenhei installed, WenQuanYi Zen Hei is used as the sans font. It doesn't look good but it resolves most of the characters. The problem is, with fontset = ubuntu, ctex defaults to the AR family for serif and italic. Then you have the same problem with HK TC and TW TC again.

Note that in this post a missing character is referred to as a "tofu" (as Google calls it), because it is rendered as a rectangle.

This picture is a comparison between HK TC and TW TC in serif

You can see that the 2nd character and the 5th-to-last character of the HK TC are rendered as tofu.

The following are examples of Chinese italic (KaiTi):

The 2nd char is a tofu.

The 4th and 5th chars are tofu.

The last char is a tofu.

Conclusions

  • installing fonts-wqy-zenhei should let the PDF build include all the pages, but characters will still be missing, since the AR family is used for serif and italic and it lacks glyphs, particularly for HK TC (currently readthedocs.org cannot build the PDF with all the pages using fontset = ubuntu)

  • if the user can accept a sans only PDF, then use \setCJKmainfont{Droid Sans Fallback} with either ctex or xeCJK. No need to install fonts-wqy-zenhei on the server (Droid Sans Fallback should be shipped with Ubuntu and ctex actually calls xeCJK).

  • the AR family is not able to support zh-hant (HK) and zh-hans at the same time (my previous test showed problems with zh-hans when using the AR family)

  • to get rid of all the "tofu", we need a "Noto" font family

By the way, if the AR fonts used by ctex are not already installed, they need to be installed as well.

@humitos
Member

humitos commented Oct 28, 2019

I'm impressed with all your analysis, thanks!

I still want to know if there is something that Read the Docs can do to help here and produce a fully working PDF with all the Chinese characters (HK, TW and simplified). I understand that you can build the PDF perfectly on your local computer, so why can't we on RTD?

if the user can accept a sans only PDF, then use \setCJKmainfont{Droid Sans Fallback} with either ctex or xeCJK.

I understand that this seems to be the preferred approach to suggest to our users; is that correct? At least all the characters will be in place and the PDF will build completely.

to get rid of all the "tofu", we need a "Noto" font family

Does this exist? If so, we can install it in our server and make your PDF happy 😄

Since I ran into this trap (specific to remote readthedocs.org PDF build), I propose to expand #5453 a bit. My proposal would use xeCJK instead of ctex.

Would you feel comfortable making these changes yourself and opening a pull request? It seems that you have a ton of experience here, and I'm sure you will update it much better than I would.

That said, if this setup is very complex or does not cover most cases, we may want to keep the "if the user can accept a sans only PDF" solution as the default, and expand the guide with this more specific solution for these particular cases.

@iruletheworld
Author

Thanks for the prompt reply!

I still want to know if there is something that Read the Docs can do to help here and produce a fully working PDF with all the Chinese characters (HK, TW and simplified). I understand that you can build the PDF perfectly on your local computer, so why can't we on RTD?

I found that there is an Ubuntu package fonts-noto-cjk (1:20170601+repack1-2) https://packages.ubuntu.com/source/bionic/fonts-noto-cjk which seems to be a repackaging of Google's Noto CJK family. Since this is just Adobe's Source CJK under a different name (which is the one I use locally), it may very well be able to display all Chinese characters (well, all those that normal people use; the full Unicode set is a bit over the top), regardless of the regional differences.

if the user can accept a sans only PDF, then use \setCJKmainfont{Droid Sans Fallback} with either ctex or xeCJK.

I understand that this seems to be the preferred approach to suggest to our users; is that correct? At least all the characters will be in place and the PDF will build completely.

I think for most users substance comes before style, so this may be the preferred solution, though some users may need serif and italic for their own reasons. Also, an all-sans PDF somewhat violates Chinese typesetting conventions, but that should be much less of an issue.

to get rid of all the "tofu", we need a "Noto" font family

Does this exist? If so, we can install it in our server and make your PDF happy 😄

Google's distribution of Adobe's Source CJK series is named "Noto CJK" for exactly that reason 😄 ("Noto" comes from "no tofu"), at least for the overwhelming majority of characters.

Since I ran into this trap (specific to remote readthedocs.org PDF build), I propose to expand #5453 a bit. My proposal would use xeCJK instead of ctex.

Would you feel comfortable making these changes yourself and opening a pull request? It seems that you have a ton of experience here, and I'm sure you will update it much better than I would.

That said, if this setup is very complex or does not cover most cases, we may want to keep the "if the user can accept a sans only PDF" solution as the default, and expand the guide with this more specific solution for these particular cases.

Agreed. I hope users would at least understand a bit about the difficulty of typesetting CJK. Perhaps the guide should cover not just Chinese but also Japanese and Korean, which may make it quite complicated.

I think Japanese may be even more difficult to get right due to the mixture of Kanji (Chinese characters), Hiragana (think of it as a lowercase phonetic syllabary) and Katakana (think of it as an uppercase phonetic syllabary).

Korean should be more consistent, since Korea has made quite an effort to reduce the use of Chinese characters (Hanja) after Japanese rule. Though Hanja are still used in some cases, these cases are quite limited.

Anyway, I do think Noto CJK should be able to solve the CJK character problem in most cases. It actually maxes out the number of glyphs an OpenType font can hold. I just cannot say it is a silver bullet until I can verify it in full or see other reliable reports.

The thing with Noto, though, is that font names and filenames can vary depending on the distribution or platform (since it is open source, people are free to repackage it, hence the problem). This adds extra complication for \setCJKmainfont{}, since it needs the exact name.
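Since `\setCJKmainfont` needs the exact family name, it can help to ask fontconfig which names it actually reports on the machine doing the build. A sketch, assuming fontconfig's `fc-list` is on PATH and that `fc-list <pattern> family` prints one font per line with comma-separated names; `parse_fc_list` and `installed_families` are hypothetical helpers, not part of any library.

```python
import re
import shutil
import subprocess


def parse_fc_list(output):
    """Split `fc-list <pattern> family` output into a set of family names.

    Each line lists one font's family names separated by commas
    (e.g. a localized name next to the English one).
    """
    names = set()
    for line in output.splitlines():
        # Split on commas that are not backslash-escaped.
        for raw in re.split(r"(?<!\\),", line):
            # fontconfig may backslash-escape characters such as '-' or ','
            name = re.sub(r"\\(.)", r"\1", raw).strip()
            if name:
                names.add(name)
    return names


def installed_families(lang=None):
    """Family names known to fontconfig, optionally only those covering `lang`."""
    if shutil.which("fc-list") is None:
        return set()  # fontconfig is not available on this machine
    pattern = ":lang=%s" % lang if lang else ":"
    result = subprocess.run(
        ["fc-list", pattern, "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fc_list(result.stdout)
```

For example, checking `"Noto Serif CJK TC" in installed_families("zh-tw")` before hard-coding that name in the preamble would catch a renamed or missing package early.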

For the PR, I would like to test a bit more so that it can be more definitive. I also hope to find a valid solution for all of CJK and not just TC and SC (help wanted!)

@stale

stale bot commented Dec 12, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Dec 12, 2019
@stale stale bot closed this as completed Dec 19, 2019
@blueset

blueset commented Jan 18, 2020

Would definitely like to see Noto CJK (fonts-noto-cjk, or even better fonts-noto-cjk-extra) appear in the next build environment image, and have them set as the default serif and sans-serif fonts for all CJK languages (or Chinese at least).

Regarding the coverage of Noto CJK fonts: it is currently the most uniform and aesthetically appealing open-source solution for a pan-CJK (super-)font family. It covers a decent number of the characters used in all these languages, takes care of the subtle design differences between regions, and should be good enough for most common uses of the languages.

For uncommonly extensive coverage of Chinese characters (usually for rare characters in names or for academic research), Hanazono would be a fallback choice; it is available in the Debian package repo as fonts-hanazono. The only drawback of this font is that Hanazono is designed based on Japanese conventions. Other fonts with extensive Chinese character coverage that I am aware of are not free (neither gratis nor libre). Including Hanazono in the LaTeX preamble might not be necessary, given how rarely it is needed and its design disadvantage.

@humitos humitos reopened this Jan 20, 2020
@stale stale bot removed the Status: stale Issue will be considered inactive soon label Jan 20, 2020
@humitos
Member

humitos commented Jan 20, 2020

Would definitely like to see Noto CJK (fonts-noto-cjk, or even better fonts-noto-cjk-extra) appear in the next build environment image, and have them set as the default serif and sans-serif fonts for all CJK languages (or Chinese at least).

@blueset I'm happy to add those font packages to our image (I've already created a PR for that).

Also, what LaTeX preamble should we include by default for Chinese so that the most accurate font is used? How do we test it?

@blueset

blueset commented Jan 21, 2020

I have tested Simplified Chinese (zh-hans, zh_CN on RTD), Traditional Chinese (zh-trad, zh_TW on RTD) and Japanese (ja), and came up with the following config for each language:

zh-hans

latex_elements = {
    "preamble": r"""
\usepackage[AutoFallBack=true]{xeCJK}
\setCJKmainfont{Noto Serif CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKsansfont{Noto Sans CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKmonofont{Noto Sans CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKfallbackfamilyfont{\CJKrmdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKsfdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKttdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
"""
}

zh-hant (updated to solve #2 below)

latex_elements = {
    "preamble": r"""
\usepackage[AutoFallBack=true]{xeCJK}
\setCJKmainfont{Noto Serif CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKsansfont{Noto Sans CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKmonofont{Noto Sans CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKfallbackfamilyfont{\CJKrmdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKsfdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKttdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\xeCJKEditPunctStyle{quanjiao}{optimize-kerning=true}
"""
}
About zh-hant-hk: RTD currently doesn’t distinguish between the Hong Kong and Taiwan variants of Traditional Chinese, so this portion would not contribute much. I would still leave it here in case anyone needs it. (Updated to solve #2 below.)
latex_elements = {
    "preamble": r"""
\usepackage[AutoFallBack=true]{xeCJK}
\setCJKmainfont{Noto Serif CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]  % Noto Serif CJK HK is not yet available in the Debian/Ubuntu package repository
\setCJKsansfont{Noto Sans CJK HK}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKmonofont{Noto Sans CJK HK}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKfallbackfamilyfont{\CJKrmdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKsfdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKttdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\xeCJKEditPunctStyle{quanjiao}{optimize-kerning=true}
"""
}
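The zh-hans and zh-hant preambles above differ only in the Noto region suffix, the `Language=` value, the Kaiti italic font, and the extra punctuation line. As a sketch (the helper name and its parameters are hypothetical, not part of Sphinx or xeCJK), a conf.py could generate them from one template. Note that the zh-hant-hk variant mixes Noto Serif CJK TC with Noto Sans CJK HK, which this helper does not model.

```python
def noto_xecjk_preamble(region, language, kaiti, quanjiao=False):
    """Build an xeCJK preamble shaped like the zh-hans / zh-hant examples.

    region:   Noto CJK region suffix, e.g. "SC" or "TC"
    language: value for xeCJK's Language= option, e.g. "Chinese Simplified"
    kaiti:    font used as the CJK italic, e.g. "AR PL KaitiM GB"
    """
    opts = "[Language=%s, BoldFont={* Bold}, ItalicFont=%s]" % (language, kaiti)
    lines = [
        r"\usepackage[AutoFallBack=true]{xeCJK}",
        r"\setCJKmainfont{Noto Serif CJK %s}%s" % (region, opts),
        r"\setCJKsansfont{Noto Sans CJK %s}%s" % (region, opts),
        r"\setCJKmonofont{Noto Sans CJK %s}%s" % (region, opts),
    ]
    # Hanazono as a fallback for rare characters in each family
    for family in (r"\CJKrmdefault", r"\CJKsfdefault", r"\CJKttdefault"):
        lines.append(
            r"\setCJKfallbackfamilyfont{%s}[AutoFakeBold]{{HanaMinA},{HanaMinB}}" % family
        )
    if quanjiao:
        # Punctuation tweak used in the zh-hant preamble
        lines.append(r"\xeCJKEditPunctStyle{quanjiao}{optimize-kerning=true}")
    return "\n".join(lines) + "\n"


latex_engine = "xelatex"  # xeCJK requires XeLaTeX
latex_elements = {
    "preamble": noto_xecjk_preamble(
        "TC", "Chinese Traditional", "AR PL KaitiM Big5", quanjiao=True
    ),
}
```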

ja

latex_engine = "uplatex"  # works with platex as well
latex_elements = {
    "preamble": r"""
\usepackage[uplatex,deluxe]{otf}
\usepackage[noto-otc]{pxchfon}
"""
}
To use with platex: if platex is still to be used instead of uplatex for whatever reason:
latex_engine = "platex"
latex_elements = {
    "preamble": r"""
\usepackage[deluxe]{otf}
\usepackage[noto-otc]{pxchfon}
"""
}

* A third font (for italics, like Kai) is not needed for Japanese, as there is little need for one in Japanese typesetting.

Demo and testing

I’m not sure if there is a programmatic way to test whether a PDF output uses the correct set of fonts for rendering. But for the sake of completeness, I have included a copy of the TeX source and the PDF output I used to test these fonts.

These PDFs are produced on a readthedocs/build:5.0 docker container with extra fonts installed.
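On the question of a programmatic check, one option, sketched here under assumptions, is to scan the XeLaTeX log rather than the PDF. XeTeX typically writes a line such as `Missing character: There is no 換 in font FandolHei-Bold/...!` for every glyph it could not find, which is exactly the tofu discussed in this thread. The regex below assumes that format (newer XeTeX versions may insert the code point after the character); `find_missing_glyphs` and the log path are hypothetical. TeX wraps long log lines (79 characters by default in TeX Live), so long font names may be cut off, though the character itself is still caught.

```python
import re
import sys
from collections import defaultdict

# Assumed XeTeX log format, e.g.
#   Missing character: There is no 換 in font FandolHei-Bold/OT:...!
# optionally with a code point after the character, e.g. "(U+XXXX)".
MISSING_RE = re.compile(
    r"Missing character: There is no (\S)(?: \(U\+[0-9A-Fa-f]+\))? in font (.+?)!?\s*$"
)


def find_missing_glyphs(log_text):
    """Map each font name to the set of characters it could not render."""
    missing = defaultdict(set)
    for line in log_text.splitlines():
        match = MISSING_RE.search(line)
        if match:
            char, font = match.groups()
            missing[font].add(char)
    return dict(missing)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_tofu.py _build/latex/project.log  (illustrative path)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        report = find_missing_glyphs(f.read())
    for font, chars in sorted(report.items()):
        print(font, "".join(sorted(chars)))
    sys.exit(1 if report else 0)  # non-zero exit if any tofu was found
```

A non-zero exit status makes it usable as a post-build check, though it only sees what XeTeX chose to log.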

Some decisions and points in doubt

  1. OpenType feature ccmp in XeLaTeX.
    As you might have seen in the samples above, there is a long sequence that reads “⿺辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂心” and doesn’t look like Chinese. This is a feature unique to the Noto CJK/Source Han typefaces, which replace the sequence with one (super complicated) character through the ccmp (Glyph Composition/Decomposition) GSUB feature. See this blog from Adobe for details. Some suggested that this works with XeLaTeX seemingly out of the box, but I can't get it to work in my script. @iruletheworld, do you have any experience with this?
  2. Punctuation style in zh-trad.
    Punctuation style is set to “plain” for zh-trad due to awkward typesetting with the default settings. This shouldn’t be much of an issue for general use. This is potentially an issue with the xeCJK package; I have raised an issue there regarding this. Resolved.
  3. Use uplatex instead of platex for ja.
    According to sources [1] [2], uplatex is a variant of platex that supports Unicode (rather than only the old JIS level 1). It thus works with a wider range of characters, including some “rare-but-not-so-rare” characters that often appear in names. This should be a drop-in change at the Sphinx level if the user has not configured otherwise (but don’t quote me on that).
  4. Multiple font weights.
    Although xeCJK supports multiple font weights, no extra effort was made on that, so as to align with the default behavior of other LaTeX setups. (otf + pxchfon comes with a simple option to enable multiple weights.)

Unfortunately, there isn’t much research I can do on Korean usage of TeX, as I don’t speak Korean. It would be much appreciated if anyone from the Korean TeX community could contribute their opinions on this.

I’m always open to any suggestions and opinions on this, especially from TeX users, and our friends speaking Chinese/Japanese/Korean. Let me know if there is any question.

@blueset

blueset commented Jan 21, 2020

Preambles and samples for zh-trad and zh-trad-hk have been updated, now that point #2 above is resolved.

@humitos
Member

humitos commented Jan 23, 2020

@blueset THANKS, this is amazing!

I've already opened a PR to install the package fonts you mentioned in a previous comment.

I can't guarantee that we are going to include these preambles by default in Read the Docs builds, because they will probably need a lot of testing (and I'm not enough of an expert on this topic to manage it), but I'd like to add them as a suggestion in our current guide https://docs.readthedocs.io/en/stable/guides/pdf-non-ascii-languages.html or in an appendix to it.

I really appreciate the work that all of you have done in this topic and I hope we can manage in a better way all of these languages at Read the Docs 🌏

@stale

stale bot commented Mar 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Mar 8, 2020
@blueset

blueset commented Mar 9, 2020

Thank you for the effort! Including the fonts in the building environment is still better than nothing. Looking forward to readthedocs/build:7.0 joining the production environment!

@stale stale bot removed the Status: stale Issue will be considered inactive soon label Mar 9, 2020
@humitos
Member

humitos commented Mar 9, 2020

@blueset actually, 7.0 is our current testing image (as its name "testing" says, it's for testing purposes only), and it would be awesome if you have some time to try these fonts from there and let us know whether they work :)

You can put the following in your configuration file to try it out:

build:
  image: testing

@blueset

blueset commented Mar 19, 2020

you can put build: image: testing in your configuration file to try it out

@humitos
I have tried to enable my project to run on the testing image, but it seems like XeLaTeX cannot find the new fonts. How can I verify if I am on the testing image? Thanks.

The full build log is here FYI: https://readthedocs.org/api/v2/build/10643925.txt

@humitos
Member

humitos commented Mar 19, 2020

@blueset I left a comment in your commit 😄. See ehForwarderBot/ehForwarderBot@dbb959a#r37923286

@blueset

blueset commented Mar 20, 2020

Oops, I forgot to change the image name after copying.

Now everything works and the PDF output looks much better with the new set of fonts. Thank you for the effort!

@humitos
Member

humitos commented Mar 20, 2020

@blueset wow! I'm very happy reading that 😄 Thank you so much for helping us debug this issue, making Read the Docs better, and improving our support for other fonts :)

If anything is missing here, I'd say it's improving our documentation guide to mention how these fonts can be configured, but I think we can close this issue now.

@humitos humitos added Needed: documentation Documentation is required and removed Support Support question labels Apr 21, 2020
@stale

stale bot commented Jun 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Jun 5, 2020
@humitos humitos added Accepted Accepted issue on our roadmap and removed Status: stale Issue will be considered inactive soon labels Jun 8, 2020
@humitos humitos closed this as completed Jun 8, 2020

4 participants