'updating environment' intensively computed and pausing for a long time #7840

Closed
ghost opened this issue Jun 15, 2020 · 3 comments

ghost commented Jun 15, 2020

Describe the bug
Running the command:

$ sphinx-build -b html -D language=<my two-letter language code> ./manual "build/html<language_location>"

always stalls at 'updating environment', and the build pauses there for a long time.

To Reproduce
Go here:
https://docs.blender.org/manual/en/2.83/about/index.html

and follow the 'Install' and 'Build' instructions for your OS.

Then go here:
https://docs.blender.org/manual/en/2.83/about/contribute/translations/contribute.html

to read how to download the translation files for a language of your choice. The languages currently available are listed at this link:
https://svn.blender.org/svnroot/bf-manual-translations/trunk/blender_docs/locale/

Configure the file

manual/conf.py

so that the language matches the set you've downloaded. Then, in the 'blender_docs' directory, run the command:

make html

Now go to

blender_docs/locale/<language you download>/LC_MESSAGES

and edit any file.

Then run:

$ sphinx-build -b html -D language=<two-letter language code> ./manual "build/html"

You should see the message 'updating environment', followed by a pause of one to two minutes, which is very unusual.

Expected behavior
With only a small number of changes, the build should finish quickly, in a couple of seconds at most.

Your project
https://docs.blender.org/manual/en/latest/

Screenshots
N/A

Environment info

  • OS: Darwin-19.4.0-x86_64-i386-64bit (64-bit)

Additional context
While debugging the code, I found that the problem occurs in

sphinx/environment/__init__.py

in the method:

 def find_files(self, config: Config, builder: "Builder") -> None:

of the class:

class BuildEnvironment:

For every doc name in self.found_docs, the code scans all of repo.catalogs looking for an exact match on catalog.domain. This is a nested loop over two large lists, so the number of comparisons grows with the number of documents multiplied by the number of catalogs.

I have found a fix for this problem: move the loop over self.found_docs into a separate pass, compute a digest of each document's domain, and store it in a dictionary:

entry = {domain_hash_digest: doc name}

Then, in the catalog loop, apply the same digest function to catalog.domain and use the 'in' operator to find the exact match. This speeds up the code enormously.

This is the entire block of code I use:

import hashlib

    def find_files(self, config: Config, builder: "Builder") -> None:
        """Find all source files in the source dir and put them in
        self.found_docs.
        """
        domain_list = {}
        try:
            exclude_paths = (self.config.exclude_patterns +
                             self.config.templates_path +
                             builder.get_asset_paths())
            self.project.discover(exclude_paths)

            # Current implementation is applying translated messages in the reading
            # phase.Therefore, in order to apply the updated message catalog, it is
            # necessary to re-process from the reading phase. Here, if dependency
            # is set for the doc source and the mo file, it is processed again from
            # the reading phase when mo is updated. In the future, we would like to
            # move i18n process into the writing phase, and remove these lines.
            if builder.use_message_catalog:
                # add catalog mo file dependency
                repo = CatalogRepository(self.srcdir, self.config.locale_dirs,
                                         self.config.language, self.config.source_encoding)
                logger.info(f'repo:{repo}', nonl=False)

                for docname in self.found_docs:
                    domain = docname_to_domain(docname, self.config.gettext_compact)
                    domain_hash = hashlib.sha512(domain.encode('utf-8'))
                    domain_hash_digest = domain_hash.digest()
                    entry = {domain_hash_digest: docname}
                    # print(f'hash_entry: {entry}')
                    domain_list.update(entry)

                for catalog in repo.catalogs:
                    catalog_hash = hashlib.sha512(catalog.domain.encode('utf-8'))
                    catalog_hash_digest = catalog_hash.digest()
                    is_in = (catalog_hash_digest in domain_list)
                    # print(f'looking for: {catalog_hash_digest}, catalog.domain:{catalog.domain} catalog.mo_path:{catalog.mo_path}')
                    if is_in:
                        docname = domain_list[catalog_hash_digest]
                        # print(f'found docname: {docname}, catalog.mo_path: {catalog.mo_path}')
                        self.dependencies[docname].add(catalog.mo_path)
        except OSError as exc:
            raise DocumentError(__('Failed to scan documents in %s: %r') % (self.srcdir, exc))

Sorry, I don't have the tooling to produce a diff file.
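
Two points about the approach above may be worth checking. A Python dict already hashes its string keys, so the SHA-512 digests should not be needed for the lookup to be fast. Also, because `domain_list` keeps a single doc name per digest, documents that share a domain overwrite each other, and only the last one would get the .mo dependency. The sketch below illustrates both points. It is not Sphinx code: `to_domain` is a simplified, hypothetical stand-in for `docname_to_domain` (assuming the compact case, where the domain is the first path component), and the doc names and paths are made up.

```python
from collections import defaultdict


def to_domain(docname: str) -> str:
    """Hypothetical stand-in for Sphinx's docname_to_domain (compact case)."""
    return docname.split("/", 1)[0]


# Made-up doc names; two of them share the domain "about".
found_docs = ["about/index", "about/license", "modeling/meshes"]
# Made-up catalogs as a plain domain -> .mo path mapping.
catalogs = {"about": "locale/xx/LC_MESSAGES/about.mo"}

# One doc name per domain: later doc names overwrite earlier ones.
single = {}
for docname in found_docs:
    single[to_domain(docname)] = docname

# All doc names per domain: nothing is lost. Plain str keys are enough,
# because dict hashes them internally.
grouped = defaultdict(list)
for docname in found_docs:
    grouped[to_domain(docname)].append(docname)

dependencies = defaultdict(set)
for domain, mo_path in catalogs.items():
    for docname in grouped.get(domain, []):
        dependencies[docname].add(mo_path)

print(single["about"])       # only one of the two "about" documents survives
print(sorted(dependencies))  # both "about" documents get the dependency
```

Iterating over the documents and looking each domain up in a catalog dict avoids the grouping step entirely and gives every document its dependency.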

@ghost ghost added the type:bug label Jun 15, 2020
tk0miya (Member) commented Jul 16, 2020

Sorry for the late reply. Does this patch help in your case? If so, I'll apply it to Sphinx in the next release.

diff --git a/sphinx/environment/__init__.py b/sphinx/environment/__init__.py
index 1e58542bb..cf3364494 100644
--- a/sphinx/environment/__init__.py
+++ b/sphinx/environment/__init__.py
@@ -387,11 +387,11 @@ class BuildEnvironment:
                 # add catalog mo file dependency
                 repo = CatalogRepository(self.srcdir, self.config.locale_dirs,
                                          self.config.language, self.config.source_encoding)
+                mo_paths = {c.domain: c.mo_path for c in repo.catalogs}
                 for docname in self.found_docs:
                     domain = docname_to_domain(docname, self.config.gettext_compact)
-                    for catalog in repo.catalogs:
+                    if domain in mo_paths:
-                        if catalog.domain == domain:
-                            self.dependencies[docname].add(catalog.mo_path)
+                        self.dependencies[docname].add(mo_paths[domain])
         except OSError as exc:
             raise DocumentError(__('Failed to scan documents in %s: %r') %
                                 (self.srcdir, exc)) from exc
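
To see why a change of this shape helps, the following self-contained sketch compares the two lookups on synthetic data. The `Catalog` tuple, the `to_domain` helper, and the generated names are assumptions made for illustration, not Sphinx's real classes; only the shape of the two loops mirrors the diff.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Sphinx's catalog objects: only the two
# attributes used in the diff are modelled.
Catalog = namedtuple("Catalog", ["domain", "mo_path"])


def to_domain(docname):
    """Simplified stand-in for docname_to_domain: one domain per document."""
    return docname


docs = [f"chapter{i}/page" for i in range(300)]
catalogs = [Catalog(d, f"locale/ja/LC_MESSAGES/{d}.mo") for d in docs[:200]]


def nested_loop(docs, catalogs):
    """Old shape: scan every catalog for every document."""
    deps, comparisons = defaultdict(set), 0
    for docname in docs:
        domain = to_domain(docname)
        for catalog in catalogs:
            comparisons += 1
            if catalog.domain == domain:
                deps[docname].add(catalog.mo_path)
    return deps, comparisons


def dict_lookup(docs, catalogs):
    """Patched shape: build the domain -> mo_path dict once, then look up."""
    mo_paths = {c.domain: c.mo_path for c in catalogs}
    deps, lookups = defaultdict(set), 0
    for docname in docs:
        domain = to_domain(docname)
        lookups += 1
        if domain in mo_paths:
            deps[docname].add(mo_paths[domain])
    return deps, lookups


old_deps, old_work = nested_loop(docs, catalogs)
new_deps, new_work = dict_lookup(docs, catalogs)
print(old_deps == new_deps, old_work, new_work)
```

With 300 documents and 200 catalogs, the nested loop performs 300 × 200 comparisons, while the dict version performs one lookup per document. One difference to keep in mind: if two catalogs shared a domain, the dict would keep only one path, whereas the nested loop would add both.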

@tk0miya tk0miya added this to the 3.2.0 milestone Jul 16, 2020
tkoyama010 (Contributor) commented

@tk0miya Thanks for your patch. It is cool.
I measured the build time in my project:

touch ./source/about.rst && time sphinx-build -b html -D language=ja ./source "build/html<language_location>"

Before

real    0m39.845s
user    0m39.460s
sys     0m0.380s

After

real    0m9.966s
user    0m9.812s
sys     0m0.154s
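
For reference, the improvement implied by the two `real` times above can be worked out with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported in this comment.

```python
before_real = 39.845  # seconds, "Before" real time reported above
after_real = 9.966    # seconds, "After" real time reported above

speedup = before_real / after_real          # how many times faster
saved = before_real - after_real            # seconds saved per build
reduction = saved / before_real * 100       # percent of build time removed

print(f"speedup: {speedup:.2f}x, saved: {saved:.3f}s, reduction: {reduction:.1f}%")
```

That is roughly a fourfold speedup for a single touched file in this project.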

tk0miya (Member) commented Jul 19, 2020

Thank you for confirming! Let's apply this patch :-)

tk0miya added a commit to tk0miya/sphinx that referenced this issue Jul 19, 2020
…trap

Replace a nested-loop comparison by hash-search to improve the
performance of dependencies check on bootstrap.
tk0miya added a commit that referenced this issue Jul 19, 2020
Close #7840: i18n: Optimize the dependencies check on bootstrap
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 21, 2021