Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser agnostic i18n post transform #12238

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

n-peugnet
Copy link
Contributor

@n-peugnet n-peugnet commented Apr 7, 2024

This is a proof of concept to fix #8852 based on my idea : #8852 (comment)

Feature or Bugfix

  • Bugfix

Purpose

By adding a parse_inline() function to the Parser, we can get rid of all the RST specific hacks that the i18n post transform contained:

  • The title hack is not needed any-more, since we only parse inline elements.
  • The literal block hack has been moved to RSTParser.parse_inline, leaving the post_transform parser agnostic.

I also successfully implemented this function in MyST-Parser which is I think the second most used Sphinx parser: executablebooks/MyST-Parser@master...n-peugnet:MyST-Parser:add-parse-inline

Detail

  • I am not yet sure about the API and I am absolutely open to discussions about this.
  • In the process of removing RST specific code, I had to stop emitting RST code for images (now they won't be parsed anyway with parse_inline()), so I chose to instead only emit the url as the message to be translated.
  • I checked with diff -r the results of Sphinx's doc's french translation and the result is identical except for the autodoc and python module parts. To make this check I recommend doing a git reset master to keep the same commit hash (otherwise a lot of pages are different) and to comment out the non-implemented parse_inline() method of the Parser class. This allows to keep the same inventory as the master branch.

Relates

From my testing, it allows to fix at least executablebooks/MyST-Parser#852, but could potentially fix all the issues linked in #8852 (didn't check yet).

Fixes #8852
Fixes executablebooks/MyST-Parser#444
Fixes executablebooks/MyST-Parser#852
Fixes #12287

Comment on lines 75 to 89
self.statemachine = states.RSTStateMachine(
state_classes=self.state_classes,
initial_state='Text',
debug=document.reporter.debug_flag,
)

inputlines = StringList([inputstring], document.current_source)

self.decorate(inputlines)
self.statemachine.run(inputlines, document, inliner=self.inliner)
self.finish_parse()
if has_literal:
p = document[0]
assert isinstance(p, nodes.paragraph)
p += nodes.Text(':')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this was my initial idea, but it was a lot more complicated to implement. I don't remember if I managed to make it work in the end, but if I did, then it had the same result as using Text as the initial_state (the :: without literal block were still causing issues), as I did on line 77

initial_state='Text',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yes I indeed managed to make it work, but it looked like this:

diff --git a/sphinx/parsers.py b/sphinx/parsers.py
index 09ee7e8ff..a99c6d498 100644
--- a/sphinx/parsers.py
+++ b/sphinx/parsers.py
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING
 import docutils.parsers
 import docutils.parsers.rst
 from docutils import nodes
-from docutils.parsers.rst import states
+from docutils.parsers.rst import states, languages
 from docutils.statemachine import StringList
 from docutils.transforms.universal import SmartQuotes
 
@@ -71,18 +71,26 @@ class RSTParser(docutils.parsers.rst.Parser, Parser):
         if has_literal:
             inputstring = inputstring[:-2]
 
-        self.setup_parse(inputstring, document)  # type: ignore[arg-type]
-        self.statemachine = states.RSTStateMachine(
-            state_classes=self.state_classes,
-            initial_state='Text',
-            debug=document.reporter.debug_flag,
-        )
-
-        inputlines = StringList([inputstring], document.current_source)
-
-        self.decorate(inputlines)
-        self.statemachine.run(inputlines, document, inliner=self.inliner)
-        self.finish_parse()
+        language = languages.get_language(
+            document.settings.language_code, document.reporter)
+        if self.inliner is None:
+            inliner = states.Inliner()
+        else:
+            inliner = self.inliner
+        inliner.init_customizations(document.settings)
+        memo = states.Struct(document=document,
+                             reporter=document.reporter,
+                             language=language,
+                             title_styles=[],
+                             section_level=0,
+                             section_bubble_up_kludge=False,
+                             inliner=inliner)
+        memo.reporter.get_source_and_line = lambda x: (document.source, x)
+        textnodes, _ = inliner.parse(inputstring, 1, memo, document)
+        p = nodes.paragraph(inputstring, '', *textnodes)
+        p.source = document.source
+        p.line = 1
+        document.append(p)
         if has_literal:
             p = document[0]
             assert isinstance(p, nodes.paragraph)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it had the same result as using Text

Text also passes definition lists and section titles; its definitely much cleaner to use the proper inline parsing, even if docutils does not make this as easy 😒

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textnodes, _ = inliner.parse(inputstring, 1, memo, document)

here you should also have parse_inline take the actual line number and use that, plus the _ is system_message nodes that should be appended to the paragraph

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but since the message is translated in .po files, there is no real source line number to provide.

Ok, but here you are proposing to add a "generic" RstParser.parse_inline method, so it needs to handle more than just this special use case, i.e. you can't rely on the line being at the top of the source file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I'll try to add it, but not sure how to test it.

Well.. if this proposal is to be accepted, then really it needs to have proper "generic" tests, not just specific to the i18n use case, as obviously it could be used for other use cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I get it. Sorry I was too focused on the problem I was trying to solve, but you are right.

Regardless of the implementation details, what do you think about this proposal? As this is more of a proof of concept than a finished product.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I mean, coming from the MyST perspective, in principle I am certainly in favor of removing "rST hard-coded" aspects of the code base 😄 (the other big problematic aspect of sphinx for this is executablebooks/MyST-Parser#228)

But indeed, it is quite a "core" addition to sphinx, more broad reaching than just this use case,
so I would obviously want to be very careful (and have good agreement from other maintainers) before merging anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I marked it as "draft" to make it clearer that it is not ready yet.

@n-peugnet n-peugnet marked this pull request as draft April 8, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants