Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LaTeX: avoid quotes and TeX ligature replacements in PDF output #6891

Merged
merged 2 commits into from Dec 15, 2019

Conversation

jfbu
Copy link
Contributor

@jfbu jfbu commented Dec 5, 2019

Refs: #6890

But I had to leave out the hyphen because replacing it would break URLs containing it, e.g. the ones generated by the pep role.

Finally I also escape the hyphen to avoid en-dash and em-dash ligatures.

By the way I wonder if URLs should get some specific restricted TeX escaping, in a way similar to the highlighting variant of tex escaping.

Not clear what I meant.

About tex escaping of highlighted blocks, I had to be careful about the latter, because it applies not only to the body of code-block but also to its start ~~~ \begin{Verbatim}[commandchars=\\\{\}] ~~~ coming back from Pygments but that part may have options, separated by commas, and the comma ended up escaped. It would be simpler not to worry about this.

edit: Anyway, I opted later on to not escape the comma. Thus the problem is not urgent, but still, LaTeX escaping of options to \usepackage{Verbatim}[options,...] coming back from Pygments looks like not a correct thing to do, in principle. It does not harm currently but nevertheless I did separate ascii_tex_replacements which will not apply there.

Tex escaping of highlighted blocks makes sense only for Unicode replacements because anyway all special chars are either Pygments-escaped or are allowed to be there as is in Verbatim. But currently some Unicode replacements are still in the same map as escaping of TeX specials, so situation is not optimal.

Feature or Bugfix

  • Bugfix

@jfbu jfbu added this to the 2.3.0 milestone Dec 5, 2019
@codecov
Copy link

codecov bot commented Dec 5, 2019

Codecov Report

❗ No coverage uploaded for pull request base (2.0@d6f1351). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##             2.0    #6891   +/-   ##
======================================
  Coverage       ?   83.91%           
======================================
  Files          ?      271           
  Lines          ?    41932           
  Branches       ?     6039           
======================================
  Hits           ?    35189           
  Misses         ?     5401           
  Partials       ?     1342
Impacted Files Coverage Δ
sphinx/highlighting.py 83.63% <ø> (ø)
tests/test_markup.py 98.29% <ø> (ø)
sphinx/writers/latex.py 81.93% <100%> (ø)
sphinx/util/texescape.py 100% <100%> (ø)
tests/test_build_latex.py 95.84% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6f1351...e49e98a. Read the comment docs.

Copy link
Contributor Author

@jfbu jfbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested if this is ok with uplatex.

# escaping it to \textquotedbl would break documents using OT1
# Sphinx does \shorthandoff{"} to avoid problems with some languages
# Can't replace - by {}- because it breaks URLs
# ('-', '{}-'), # -- and --- are TeX ligatures
Copy link
Contributor Author

@jfbu jfbu Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some solution is needed to escape - to {}- only outside of URLs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested yet but escaping rather to \sphinxhyphen{} with a suitable definition of \sphinxhyphen and a re-definition of it inside \sphinxhref, \sphinxurl, \sphinxnolinkurl should do it wih no need of other change in Sphinx in particular no need to change current way by latex writer to handle hyperlinks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well actually I did move the needed changes to latex.py, not in sphinx.sty, for handling of hyperlinks

@@ -49,6 +43,22 @@
# OHM SIGN U+2126 is handled by LaTeX textcomp package
]

# A map to avoid TeX ligatures or character replacements in PDF output
# xelatex/lualatex/uplatex are handled differently (#5790, #6888)
ascii_tex_replacements = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer tex_ascii_replacements but the other one was already called unicode_tex_replacements ...

@@ -130,6 +140,10 @@ def init() -> None:
_tex_escape_map_without_unicode[ord(a)] = b
tex_replace_map[ord(a)] = '_'

for a, b in ascii_tex_replacements:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is that this will not be done for the _tex_hlescape_map and _tex_hlescape_map_without_unicode. One reason is that highlighter tex escaping includes the option of Verbatim which are comma separated and replacing , by {}, breaks sphinxVerbatim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this comment is now partially obsolete because the escaping of the comma has been removed)

# Sphinx does \shorthandoff{"} to avoid problems with some languages
# Can't replace - by {}- because it breaks URLs
# ('-', '{}-'), # -- and --- are TeX ligatures
(',', '{},'), # ,, is a TeX ligature in T1 encoding
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is pushing too far doing that? , , in non literal text is probably a typo anyhow. And this change forced to adapt a few tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because commas are very frequent in text, escaping to \sphinxcomma{} is not very enticing. It will augment size of tex file. Overall I am actually tempted to do nothing with the comma, because avoiding the ,, T1-ligature does not seem important. Visually the ligature looks like ,, except it is a single character hence copying pasting from PDF will not give back ,,. +1 for actually not doing anything with comma.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, the comma is not escaped anymore.

@@ -1416,7 +1416,7 @@ def test_default_latex_documents():
config.init_values()
config.add('latex_engine', None, True, None)
expected = [('index', 'stasi.tex', 'STASI™ Documentation',
r"Wolfgang Schäuble \& G'Beckstein.\@{}", 'manual')]
r"Wolfgang Schäuble \& G\textquotesingle{}Beckstein.\@{}", 'manual')]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the handling of author by tex escaping seems to be oblivious to smartquotes setting of course as it is not parsed by Docutils I believe. This thus is breaking change because people using a straight quote in author names saw a curly one in PDF output, but now it will be a straight one (independently of smartquotes setting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1; Let's do smartquotes in another PR.

@@ -49,6 +43,22 @@
# OHM SIGN U+2126 is handled by LaTeX textcomp package
]

# A map to avoid TeX ligatures or character replacements in PDF output
# xelatex/lualatex/uplatex are handled differently (#5790, #6888)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, shouldn't sphinx.util.texescape.escape() treats uplatex like xelatex and lualatex (untested). In fact the comment assumed that but it is not yet the case.

# escaping it to \textquotedbl would break documents using OT1
# Sphinx does \shorthandoff{"} to avoid problems with some languages
# Can't replace - by {}- because it breaks URLs
# ('-', '{}-'), # -- and --- are TeX ligatures
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested yet but escaping rather to \sphinxhyphen{} with a suitable definition of \sphinxhyphen and a re-definition of it inside \sphinxhref, \sphinxurl, \sphinxnolinkurl should do it wih no need of other change in Sphinx in particular no need to change current way by latex writer to handle hyperlinks.

(',', '{},'), # ,, is a TeX ligature in T1 encoding
# the next two require textcomp package
("'", r'\textquotesingle{}'), # else ' renders curly, and '' is a ligature
('`', r'\textasciigrave{}'), # else \` and \`\` render curly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be possible to allow line-breaks in parsed literals at the left and right ticks by updating \sphinxbreaksattexescapedchars in sphinx.sty but this would be easier if escaping here is \sphinxquotesingle and \sphinxasciigrave.

Actually all escaping should be to \sphinx... prefixed macros for flexibility in obtaining special effects simply by redefining them.

# Sphinx does \shorthandoff{"} to avoid problems with some languages
# Can't replace - by {}- because it breaks URLs
# ('-', '{}-'), # -- and --- are TeX ligatures
(',', '{},'), # ,, is a TeX ligature in T1 encoding
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because commas are very frequent in text, escaping to \sphinxcomma{} is not very enticing. It will augment size of tex file. Overall I am actually tempted to do nothing with the comma, because avoiding the ,, T1-ligature does not seem important. Visually the ligature looks like ,, except it is a single character hence copying pasting from PDF will not give back ,,. +1 for actually not doing anything with comma.

@@ -1879,8 +1887,18 @@
\let\sphinxemail \@firstofone
\let\sphinxcrossref \@firstofone
\let\sphinxtermref \@firstofone
\def\sphinxhyphen#1{-}% the #1 is there to gobble the {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually it should be without the #1 gobbling the {} else hyperref seems to encode -- into a Unicode en-dash in the PDF bookmarks, whereas a section title with -{}-{} does give two hyphens also in bookmarks of PDF. Will update after some additional testing.

@@ -487,7 +487,7 @@ def __init__(self, document: nodes.document, builder: "LaTeXBuilder") -> None:
self.first_document = 1
self.this_is_the_title = 1
self.literal_whitespace = 0
self.no_contractions = 0
self.no_contractions = 0 # not used
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering is this needed some deprecation and how actually achieve that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace this member by property.
This is an example.

sphinx/sphinx/writers/latex.py

Lines 2300 to 2304 in 24908e0

@property
def in_container_literal_block(self) -> int:
warnings.warn('LaTeXTranslator.in_container_literal_block is deprecated.',
RemovedInSphinx30Warning, stacklevel=2)
return 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -487,7 +487,7 @@ def __init__(self, document: nodes.document, builder: "LaTeXBuilder") -> None:
self.first_document = 1
self.this_is_the_title = 1
self.literal_whitespace = 0
self.no_contractions = 0
self.no_contractions = 0 # not used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace this member by property.
This is an example.

sphinx/sphinx/writers/latex.py

Lines 2300 to 2304 in 24908e0

@property
def in_container_literal_block(self) -> int:
warnings.warn('LaTeXTranslator.in_container_literal_block is deprecated.',
RemovedInSphinx30Warning, stacklevel=2)
return 0

@@ -1416,7 +1416,7 @@ def test_default_latex_documents():
config.init_values()
config.add('latex_engine', None, True, None)
expected = [('index', 'stasi.tex', 'STASI™ Documentation',
r"Wolfgang Schäuble \& G'Beckstein.\@{}", 'manual')]
r"Wolfgang Schäuble \& G\textquotesingle{}Beckstein.\@{}", 'manual')]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1; Let's do smartquotes in another PR.

@jfbu
Copy link
Contributor Author

jfbu commented Dec 15, 2019

Thanks @tk0miya for reviewing and advice. It seems Travis testing fails with some internal error for Python 3.8 also for #6908. To not interfere with your release work I prefer to leave merging to you 😄

@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

Ya, I also saw the error. It seems the error is raised inside pytest-cov. Now I'm working for it.
I believe it would not be related with our product. But I must leave from internet a few hours. So please wait a moment.

@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

Hmm... coverage package is released just today. I guess this might be related with this problem.
https://pypi.org/project/coverage/#history

@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

I posted #6924 to investigate the error. But it's time over. I'm away from keyboard for a while. See you later :-)

@jfbu
Copy link
Contributor Author

jfbu commented Dec 15, 2019 via email

@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

@jfbu Could you rebase or merge HEAD of 2.0 branch please?

@jfbu
Copy link
Contributor Author

jfbu commented Dec 15, 2019

ok please wait a minute or two

Refs: sphinx-doc#6890

The comma character is not TeX-escaped because it is frequent in general
text and escaping it would make the LaTeX output larger for only dealing
with the problem of the LaTeX-ligature of ,, into a single character.
And one there is problem with the commas in options to Verbatim from
PygmentsBridge.

The hyphen character is escaped (not in ids and URIs!) to
\sphinxhyphen{} for both Unicode and non-Unicode engines. This is needed
to work around hyperref transforming -- and --- from section titles into
EN DASH resp. EM DASH in PDF bookmarks.

latex3/hyperref#112

Note to expert LaTeX users: if Sphinx latex user with xelatex has

- turned off Smart Quotes for some reason,

- but does want TeX ligatures and thus overrode Sphinx
latex_elements['fontenc'] default (since sphinx-doc#6888) to this effect,

then this should be added to LaTeX preamble:

    \def\sphinxhyphen#1{-}% (\protected is now not needed)
    \let\sphinxhyphenforbookmarks\sphinxhyphen
@jfbu
Copy link
Contributor Author

jfbu commented Dec 15, 2019

Rebased and squashed commits.

"1 or minute or 2" actually meant 12 minutes in good French 😄

@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

OMG, Travis CI is just under maintenance now... But we've already know this PR has been already passed. So I'll merge this without Travis CI.
https://travis-ci.org/sphinx-doc/sphinx

@jfbu
Copy link
Contributor Author

jfbu commented Dec 15, 2019

Ready for merge... thanks for reviewing and good luck with release work, now leaving ;-)

@tk0miya tk0miya merged commit 0760136 into sphinx-doc:2.0 Dec 15, 2019
@tk0miya
Copy link
Member

tk0miya commented Dec 15, 2019

Merged. Thank you always!

@jfbu jfbu deleted the latex_6890_curlyquote branch February 10, 2020 08:31
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants