New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LaTeX: avoid quotes and TeX ligature replacements in PDF output #6891
Conversation
Codecov Report
@@ Coverage Diff @@
## 2.0 #6891 +/- ##
======================================
Coverage ? 83.91%
======================================
Files ? 271
Lines ? 41932
Branches ? 6039
======================================
Hits ? 35189
Misses ? 5401
Partials ? 1342
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tested if this is ok with uplatex.
sphinx/util/texescape.py
Outdated
# escaping it to \textquotedbl would break documents using OT1 | ||
# Sphinx does \shorthandoff{"} to avoid problems with some languages | ||
# Can't replace - by {}- because it breaks URLs | ||
# ('-', '{}-'), # -- and --- are TeX ligatures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some solution is needed to escape - to {}- only outside of URLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tested yet but escaping rather to \sphinxhyphen{}
with a suitable definition of \sphinxhyphen
and a re-definition of it inside \sphinxhref
, \sphinxurl
, \sphinxnolinkurl
should do it wih no need of other change in Sphinx in particular no need to change current way by latex writer to handle hyperlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well actually I did move the needed changes to latex.py, not in sphinx.sty, for handling of hyperlinks
@@ -49,6 +43,22 @@ | |||
# OHM SIGN U+2126 is handled by LaTeX textcomp package | |||
] | |||
|
|||
# A map to avoid TeX ligatures or character replacements in PDF output | |||
# xelatex/lualatex/uplatex are handled differently (#5790, #6888) | |||
ascii_tex_replacements = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer tex_ascii_replacements but the other one was already called unicode_tex_replacements ...
@@ -130,6 +140,10 @@ def init() -> None: | |||
_tex_escape_map_without_unicode[ord(a)] = b | |||
tex_replace_map[ord(a)] = '_' | |||
|
|||
for a, b in ascii_tex_replacements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea is that this will not be done for the _tex_hlescape_map and _tex_hlescape_map_without_unicode. One reason is that highlighter tex escaping includes the option of Verbatim which are comma separated and replacing ,
by {},
breaks sphinxVerbatim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this comment is now partially obsolete because the escaping of the comma has been removed)
sphinx/util/texescape.py
Outdated
# Sphinx does \shorthandoff{"} to avoid problems with some languages | ||
# Can't replace - by {}- because it breaks URLs | ||
# ('-', '{}-'), # -- and --- are TeX ligatures | ||
(',', '{},'), # ,, is a TeX ligature in T1 encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is pushing too far doing that? , , in non literal text is probably a typo anyhow. And this change forced to adapt a few tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because commas are very frequent in text, escaping to \sphinxcomma{}
is not very enticing. It will augment size of tex file. Overall I am actually tempted to do nothing with the comma, because avoiding the ,,
T1-ligature does not seem important. Visually the ligature looks like ,,
except it is a single character hence copying pasting from PDF will not give back ,,
. +1 for actually not doing anything with comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, the comma is not escaped anymore.
@@ -1416,7 +1416,7 @@ def test_default_latex_documents(): | |||
config.init_values() | |||
config.add('latex_engine', None, True, None) | |||
expected = [('index', 'stasi.tex', 'STASI™ Documentation', | |||
r"Wolfgang Schäuble \& G'Beckstein.\@{}", 'manual')] | |||
r"Wolfgang Schäuble \& G\textquotesingle{}Beckstein.\@{}", 'manual')] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the handling of author by tex escaping seems to be oblivious to smartquotes setting of course as it is not parsed by Docutils I believe. This thus is breaking change because people using a straight quote in author names saw a curly one in PDF output, but now it will be a straight one (independently of smartquotes setting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1; Let's do smartquotes in another PR.
@@ -49,6 +43,22 @@ | |||
# OHM SIGN U+2126 is handled by LaTeX textcomp package | |||
] | |||
|
|||
# A map to avoid TeX ligatures or character replacements in PDF output | |||
# xelatex/lualatex/uplatex are handled differently (#5790, #6888) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way, shouldn't sphinx.util.texescape.escape() treats uplatex like xelatex and lualatex (untested). In fact the comment assumed that but it is not yet the case.
sphinx/util/texescape.py
Outdated
# escaping it to \textquotedbl would break documents using OT1 | ||
# Sphinx does \shorthandoff{"} to avoid problems with some languages | ||
# Can't replace - by {}- because it breaks URLs | ||
# ('-', '{}-'), # -- and --- are TeX ligatures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not tested yet but escaping rather to \sphinxhyphen{}
with a suitable definition of \sphinxhyphen
and a re-definition of it inside \sphinxhref
, \sphinxurl
, \sphinxnolinkurl
should do it wih no need of other change in Sphinx in particular no need to change current way by latex writer to handle hyperlinks.
(',', '{},'), # ,, is a TeX ligature in T1 encoding | ||
# the next two require textcomp package | ||
("'", r'\textquotesingle{}'), # else ' renders curly, and '' is a ligature | ||
('`', r'\textasciigrave{}'), # else \` and \`\` render curly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be possible to allow line-breaks in parsed literals at the left and right ticks by updating \sphinxbreaksattexescapedchars
in sphinx.sty but this would be easier if escaping here is \sphinxquotesingle
and \sphinxasciigrave
.
Actually all escaping should be to \sphinx...
prefixed macros for flexibility in obtaining special effects simply by redefining them.
sphinx/util/texescape.py
Outdated
# Sphinx does \shorthandoff{"} to avoid problems with some languages | ||
# Can't replace - by {}- because it breaks URLs | ||
# ('-', '{}-'), # -- and --- are TeX ligatures | ||
(',', '{},'), # ,, is a TeX ligature in T1 encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because commas are very frequent in text, escaping to \sphinxcomma{}
is not very enticing. It will augment size of tex file. Overall I am actually tempted to do nothing with the comma, because avoiding the ,,
T1-ligature does not seem important. Visually the ligature looks like ,,
except it is a single character hence copying pasting from PDF will not give back ,,
. +1 for actually not doing anything with comma.
085068e
to
098d1d7
Compare
sphinx/texinputs/sphinx.sty
Outdated
@@ -1879,8 +1887,18 @@ | |||
\let\sphinxemail \@firstofone | |||
\let\sphinxcrossref \@firstofone | |||
\let\sphinxtermref \@firstofone | |||
\def\sphinxhyphen#1{-}% the #1 is there to gobble the {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well actually it should be without the #1
gobbling the {}
else hyperref seems to encode --
into a Unicode en-dash in the PDF bookmarks, whereas a section title with -{}-{}
does give two hyphens also in bookmarks of PDF. Will update after some additional testing.
sphinx/writers/latex.py
Outdated
@@ -487,7 +487,7 @@ def __init__(self, document: nodes.document, builder: "LaTeXBuilder") -> None: | |||
self.first_document = 1 | |||
self.this_is_the_title = 1 | |||
self.literal_whitespace = 0 | |||
self.no_contractions = 0 | |||
self.no_contractions = 0 # not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering is this needed some deprecation and how actually achieve that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace this member by property.
This is an example.
sphinx/sphinx/writers/latex.py
Lines 2300 to 2304 in 24908e0
@property | |
def in_container_literal_block(self) -> int: | |
warnings.warn('LaTeXTranslator.in_container_literal_block is deprecated.', | |
RemovedInSphinx30Warning, stacklevel=2) | |
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sphinx/writers/latex.py
Outdated
@@ -487,7 +487,7 @@ def __init__(self, document: nodes.document, builder: "LaTeXBuilder") -> None: | |||
self.first_document = 1 | |||
self.this_is_the_title = 1 | |||
self.literal_whitespace = 0 | |||
self.no_contractions = 0 | |||
self.no_contractions = 0 # not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace this member by property.
This is an example.
sphinx/sphinx/writers/latex.py
Lines 2300 to 2304 in 24908e0
@property | |
def in_container_literal_block(self) -> int: | |
warnings.warn('LaTeXTranslator.in_container_literal_block is deprecated.', | |
RemovedInSphinx30Warning, stacklevel=2) | |
return 0 |
@@ -1416,7 +1416,7 @@ def test_default_latex_documents(): | |||
config.init_values() | |||
config.add('latex_engine', None, True, None) | |||
expected = [('index', 'stasi.tex', 'STASI™ Documentation', | |||
r"Wolfgang Schäuble \& G'Beckstein.\@{}", 'manual')] | |||
r"Wolfgang Schäuble \& G\textquotesingle{}Beckstein.\@{}", 'manual')] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1; Let's do smartquotes in another PR.
Thanks @tk0miya for reviewing and advice. It seems Travis testing fails with some internal error for Python 3.8 also for #6908. To not interfere with your release work I prefer to leave merging to you 😄 |
Ya, I also saw the error. It seems the error is raised inside pytest-cov. Now I'm working for it. |
Hmm... coverage package is released just today. I guess this might be related with this problem. |
I posted #6924 to investigate the error. But it's time over. I'm away from keyboard for a while. See you later :-) |
Thanks, I promise I won't break anything while you are offline 😉
|
@jfbu Could you rebase or merge HEAD of 2.0 branch please? |
ok please wait a minute or two |
Refs: sphinx-doc#6890 The comma character is not TeX-escaped because it is frequent in general text and escaping it would make the LaTeX output larger for only dealing with the problem of the LaTeX-ligature of ,, into a single character. And one there is problem with the commas in options to Verbatim from PygmentsBridge. The hyphen character is escaped (not in ids and URIs!) to \sphinxhyphen{} for both Unicode and non-Unicode engines. This is needed to work around hyperref transforming -- and --- from section titles into EN DASH resp. EM DASH in PDF bookmarks. latex3/hyperref#112 Note to expert LaTeX users: if Sphinx latex user with xelatex has - turned off Smart Quotes for some reason, - but does want TeX ligatures and thus overrode Sphinx latex_elements['fontenc'] default (since sphinx-doc#6888) to this effect, then this should be added to LaTeX preamble: \def\sphinxhyphen#1{-}% (\protected is now not needed) \let\sphinxhyphenforbookmarks\sphinxhyphen
1742f5c
to
e49e98a
Compare
Rebased and squashed commits. "1 or minute or 2" actually meant 12 minutes in good French 😄 |
OMG, Travis CI is just under maintenance now... But we've already know this PR has been already passed. So I'll merge this without Travis CI. |
Ready for merge... thanks for reviewing and good luck with release work, now leaving ;-) |
Merged. Thank you always! |
Refs: #6890
But I had to leave out the hyphen because replacing it would break URLs containing it, e.g. the ones generated by the pep role.Finally I also escape the hyphen to avoid en-dash and em-dash ligatures.
By the way I wonder if URLs should get some specific restricted TeX escaping, in a way similar to the highlighting variant of tex escaping.Not clear what I meant.
About tex escaping of highlighted blocks, I had to be careful about the latter, because it applies not only to the body of code-block but also to its start ~~~ \begin{Verbatim}[commandchars=\\\{\}] ~~~ coming back from Pygments but that part may have options, separated by commas, and the comma ended up escaped. It would be simpler not to worry about this.edit: Anyway, I opted later on to not escape the comma. Thus the problem is not urgent, but still, LaTeX escaping of options to
\usepackage{Verbatim}[options,...]
coming back from Pygments looks like not a correct thing to do, in principle. It does not harm currently but nevertheless I did separateascii_tex_replacements
which will not apply there.Tex escaping of highlighted blocks makes sense only for Unicode replacements because anyway all special chars are either Pygments-escaped or are allowed to be there as is in Verbatim. But currently some Unicode replacements are still in the same map as escaping of TeX specials, so situation is not optimal.
Feature or Bugfix