Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build PDF files using latexmk #5437

Merged
merged 6 commits into from Mar 18, 2019
Merged

Conversation

humitos
Copy link
Member

@humitos humitos commented Mar 12, 2019

This PR makes a Japanese project to build properly. It would be awesome to have more projects as examples to test here. Although, since this code is behind a feature flag we can merge without worry too much for now on making it work perfectly.

I'm referencing these issues/prs to be closed here since most of the requests are covered in this PR. It may be some things missings, like supporting xindy or using environment variables like XINDYOPTS.

Closes #1556
Closes #4454
Closes #5405

@humitos humitos requested a review from a team March 12, 2019 17:35
New versions of Sphinx use `latexmk` to build the PDF files. This
command uses a file called `latexmkrc` (or `latexmkjarc` for Japanese)
which contains all the proper commands that needs to be ran depending
on different Sphinx configurations.

`latexmk` will take care by itself on the amount of phases that need
to be ran without us worrying about it.

Currently, this is not considering LATEXMKOPTS and XINDYOPTS
environment variables configured by Sphinx.

This feature is implemented under a Feature flag so we can test it
easily without breaking other working projects.

References:

- #1556
- #4454
- #5405
- toppers/tecs-docs#7
- https://github.com/sphinx-doc/sphinx/blob/master/sphinx/texinputs/Makefile_t
- https://www.sphinx-doc.org/en/master/usage/builders/index.html#sphinx.builders.latex.LaTeXBuilder
# ``latex_engine`` is ``platex``
pdfs = []
if self.project.language == 'ja':
pdfs = latex_path.glob('*.pdf')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this only required by ja lang? Don't we already run this whole function under a feature flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Japanese language is the only one that requires this extra step. I don't know exactly why but most of the documentation that I read differentiate this language from the others. I suppose it's because it mix kanji (Chinese) with its own symbols.

I took this step from the Makefile generated by Sphinx when the language is Japanese.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above github comment should be a code comment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Japanese language uses latex+dvipdfmx path. Until TeXLive 2015 release, extractbb step was needed. But it is not needed with newer dvipdfmx binary. Here is a quote of an exchange I had in 2017 on TeXLive mailing list:

Can someone confirm that since TL2014 there is no need
on Unix-like systems to prepare beforehand .xbb
files for graphics inclusion of .png, .jpeg, ...,
when doing latex+dvipdfmx or platex+dvipdfmx ?
(and that it was either needed before, or at least
shell-escape had to be enabled) ?

I remember xbb files were necessary for TL2014 or earlier.
Japanese people has been heavily using dvipdfmx, so they added
extractbb to their local texmf.cnf by hand

...

Thanks for the information ! This is for support
of some other software, now I know they can avoid
the explicit extractbb calls but under assumption
of a TL2015 or more recent install,

Thus Sphinx may soon drop the extractbb and make the Japanese set-up in Makefile less singular: indexe 2.0 release requires "LaTeX builder now depends on TeX Live 2015 or above." but we forgot to actually remove the extractbb dependency as we don't want either to break too easily user projects with older TeX installations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have prepared sphinx-doc/sphinx#6189 for Sphinx to remove the extra handling of image files for Japanese language by extractbb as it is superfluous for TL2015 or later based TeX distribution. Not sure it will be merged during 2.x series.

Copy link
Member

@stsewd stsewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature flag should be documented so users that need it can request it to us

if self.project.language == 'ja':
pdfs = latex_path.glob('*.pdf')

for image in itertools.chain(images, pdfs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image and pdfs are not related I guess, but not sure about this step

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

images may be provided in pdf format: in fact this is the preferred format for image inclusion

From sphinx doc here is the list of file extensions for images in order of preference for LaTeX:

supported_image_types = ['application/pdf', 'image/png', 'image/jpeg']

'-r',
rcfile,

# FIXME: check for platex here as well
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about platex and latexmk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latexmk uses internally pdflatex or xelatex or lualatex or platex (defined by latex_engine

cwd=latex_cwd,
)

self.pdf_file_name = f'{self.project.slug}.pdf'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this come from? From the -jobname option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@humitos
Copy link
Member Author

humitos commented Mar 13, 2019

This feature flag should be documented so users that need it can request it to us

Yes. I don't want to expose this yet since we don't have too much working examples. Although, I plan to write a guide on our docs explaining what are the required steps from the users to build PDF for these languages.

@humitos
Copy link
Member Author

humitos commented Mar 13, 2019

It would be awesome to have more projects as examples to test here

I tested some projects today and I was able to build most of them. Some of them did not build because other problems not related to this PR (missing png file, invalid rst table on the docs, or similar)

Automatically select `xelatex` for Chinese and `platex` for Japanese.
These defaults can be overridden by the user.

Force `latex_use_xindy=False` for now until we support it in our
Docker image.
@humitos
Copy link
Member Author

humitos commented Mar 18, 2019

This feature flag should be documented so users that need it can request it to us

I added this since I feel more comfortable with these code after some testing and also because I added a default configuration in our conf.py.tmpl which does not need user intervention.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Is the goal to turn this on for all Chinese & Japanese languages by default after we test it behind the feature flag?

Does this require additional ops or docker changes, to ensure we have the proper binaries installed?

# ``latex_engine`` is ``platex``
pdfs = []
if self.project.language == 'ja':
pdfs = latex_path.glob('*.pdf')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above github comment should be a code comment.

'cat',
rcfile,
cwd=latex_cwd,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to output to the user, or some other reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is only to show the content of the latexmkrc file to the user.

I want to show this because that file is built by Sphinx depending on some configurations. Also, this is the file that will say what command execute to build the PDF in the end. So, without this output will be very hard to debug a problem.

@humitos
Copy link
Member Author

humitos commented Mar 18, 2019

Seems reasonable. Is the goal to turn this on for all Chinese & Japanese languages by default after we test it behind the feature flag?

I'd say yes. Although, not only for Chinese & Japanese, but for all the languages. This is the default way of building PDF on Sphinx >=1.6 and seems to be more robust that the one that we are using. Also, it allows customization using Sphinx common configuration options.

Does this require additional ops or docker changes, to ensure we have the proper binaries installed?

No. It shouldn't.

@ericholscher ericholscher merged commit 11a674f into master Mar 18, 2019
@delete-merged-branch delete-merged-branch bot deleted the humitos/build-pdf-latexmk branch March 18, 2019 18:55
Copy link

@jfbu jfbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have provided some comments for background LaTeX info... it looks great that you merged this, but as I said I can not help much on Chinese related matters.

project_language in ('zh_CN', 'zh_TW'),
])

japanase = any([
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the spelling japanase intended here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's a typo!

'preamble': '\\usepackage[UTF8]{ctex}\n',
}
latex_elements = latex_elements_user or latex_elements_rtd
elif japanase:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf my prior comment about spelling

latex_use_xindy = False

latex_elements_rtd = {
'preamble': '\\usepackage[UTF8]{ctex}\n',
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sadly, I am not at all competent in Chinese. I don't know how Chinese users of LaTeX go about producing indices... Xindy does not seem to have any special support for CJK languages, but at least it is Unicode aware contrarily to makeindex. There is zhmakeindex but its usage has not been incorporated to Sphinx, I am not aware of PRs about this.

The Japanese language is handled separately by Sphinx because it uses a specific LaTeX binary, platex and a specific indexing binary mendex. But they are not Unicode aware, so mixing Japanese with European languages is not easy/feasible apart from English as it only needs ascii range. In future Sphinx will switch presumably to uplatex+upmendex for Japanese support (see e.g. sphinx-doc/sphinx#4187 (comment)).

if self.project.language == 'ja':
pdfs = latex_path.glob('*.pdf')

for image in itertools.chain(images, pdfs):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

images may be provided in pdf format: in fact this is the preferred format for image inclusion

From sphinx doc here is the list of file extensions for images in order of preference for LaTeX:

supported_image_types = ['application/pdf', 'image/png', 'image/jpeg']

# ``latex_engine`` is ``platex``
pdfs = []
if self.project.language == 'ja':
pdfs = latex_path.glob('*.pdf')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Japanese language uses latex+dvipdfmx path. Until TeXLive 2015 release, extractbb step was needed. But it is not needed with newer dvipdfmx binary. Here is a quote of an exchange I had in 2017 on TeXLive mailing list:

Can someone confirm that since TL2014 there is no need
on Unix-like systems to prepare beforehand .xbb
files for graphics inclusion of .png, .jpeg, ...,
when doing latex+dvipdfmx or platex+dvipdfmx ?
(and that it was either needed before, or at least
shell-escape had to be enabled) ?

I remember xbb files were necessary for TL2014 or earlier.
Japanese people has been heavily using dvipdfmx, so they added
extractbb to their local texmf.cnf by hand

...

Thanks for the information ! This is for support
of some other software, now I know they can avoid
the explicit extractbb calls but under assumption
of a TL2015 or more recent install,

Thus Sphinx may soon drop the extractbb and make the Japanese set-up in Makefile less singular: indexe 2.0 release requires "LaTeX builder now depends on TeX Live 2015 or above." but we forgot to actually remove the extractbb dependency as we don't want either to break too easily user projects with older TeX installations.

@humitos
Copy link
Member Author

humitos commented Mar 19, 2019

@jfbu thanks for your feedback on this!

I don't have too much experience with LaTeX and building other languages that are not Spanish or English. So, I decided to follow what current stable Sphinx version was doing instead of "propose something better/polished" to avoid entering new and different bugs to the ones that Sphinx already have.

Once Sphinx 2.x is released, we could refactor these steps and make them work better but for now I prefer to keep it stable.

From my tests locally I can say that I was able to build PDF in Chinese, Japanese and English without problem. So, this is way better to what we had.

Let's see how it goes in production with real projects and we can iterate from there.

@humitos humitos mentioned this pull request Mar 19, 2019
imphil added a commit to opensocdebug/osd-doc that referenced this pull request Mar 28, 2020
ReadTheDocs switched from calling pdflatex directly to using latexmk.
This change broke the previous workaround that only the last-built PDF
file was made available on RTD. Use a more explicit approach instead to
fix that.

Changed upstream in readthedocs/readthedocs.org#5437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants