Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement merge_domaindata() on Zeek domain to support parallel builds #126

Merged
merged 2 commits into from
Jul 28, 2022

Conversation

awelzel
Copy link
Contributor

@awelzel awelzel commented Jul 8, 2022

Only the resulting environment.pickle file appears different.

$ diff -ru ./single-proc-html/ ./build/html/
Binary files ./single-proc-html/.doctrees/environment.pickle and
./build/html/.doctrees/environment.pickle differ
$

I am not quite sure incremental updates will work right,
but this may be a start.

Changes build time locally from 8m 40s to 2m 50s.

Closes #122, maybe.

@awelzel
Copy link
Contributor Author

awelzel commented Jul 8, 2022

@timwoj - this doesn't seem to have much of an effect on the github action timings. Do you see a difference in runtime locally?

@timwoj
Copy link
Contributor

timwoj commented Jul 8, 2022

On macOS I get the following warning. It's maybe 5 seconds faster on your branch, but that's probably just that it had cached part of it in RAM on the second run.

For security reason, parallel mode is disabled on macOS and python3.8 and above. For more details, please read sphinx-doc/sphinx#6803

On Linux, it's wildly faster. master took 423 seconds and your branch took 122 seconds. That's with 8 CPUs though. If I force it down to 2 CPUs similar to how it is on GitHub, it takes 448 seconds.

@awelzel
Copy link
Contributor Author

awelzel commented Jul 11, 2022

The OSX piece is sad. It seems to have been fixed with Sphinx 4.4 though:

sphinx-doc/sphinx#6881 (comment)

On Linux, it's wildly faster. master took 423 seconds and your branch took 122 seconds. That's with 8 CPUs though. If I force it down to 2 CPUs similar to how it is on GitHub, it takes 448 seconds.

Yep, it doesn't seem -j scales quite as nice as maybe expected.

I wonder if caching the build dir could be an alternative, but not sure how safe/reliable that works.

@timwoj
Copy link
Contributor

timwoj commented Jul 15, 2022

I installed a newer version of sphinx into a virtualenv on macOS here and can confirm that it seems they fixed parallel builds on here now. This is what I have in requirements.txt now:

#jinja2<3.1
#pygments>=2.12.0
#docutils==0.16
sphinx_rtd_theme==1.0.0
Sphinx==5.0.2
GitPython==3.1.27

docutils didn't get installed at all. jinja2 3.1.2 and pygments 2.12.0 got automatically installed. It seems that the newer version of Sphinx also fixed whatever conflict they were having with the newer version of jinja2. I was also able to bump the sphinx_rtd_theme package up to the latest version.

@awelzel awelzel force-pushed the topic/awelzel/parallel-doc-build branch from e2378b6 to 3c302fa Compare July 27, 2022 08:17
@awelzel
Copy link
Contributor Author

awelzel commented Jul 27, 2022

After writing some documentation and building it quite regularly and being annoyed about the lengthy process, I included the version updates and cleaned up the -j passing a bit in the Makefile. Otherwise no changes. I skimmed the produced docs and they seemed okay.

docutils didn't get installed at all

It installed for me as 0.17.1, so I kept that in the requirements.txt.

Should we merge this and see if something breaks? 🤷‍♂️

@timwoj
Copy link
Contributor

timwoj commented Jul 27, 2022

Sounds good to me. I scrolled through the changes as well and it looks like mostly upgrades to how Sphinx generates both the CSS and the JS that drive the site. I don't think there's anything significant with regards to the docs themselves here.

I was curious if RTD had any problem with the new python modules, but I doubt it will. I triggered a build off it just in case: https://docs.zeek.org/en/topic-awelzel-parallel-doc-build/. The only thing I can't figure out is how to tell RTD to pass the -j flag to the build. I don't think it did since the master build I did right before it only took about 8 seconds. The command it runs to start the build is below:

 /home/docs/checkouts/readthedocs.org/user_builds/zeek-docs/envs/topic-awelzel-parallel-doc-build/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html 

Makefile Show resolved Hide resolved
Only the resulting environment.pickle file appears different.

    $ diff -ru ./single-proc-html/ ./build/html/
    Binary files ./single-proc-html/.doctrees/environment.pickle and
    ./build/html/.doctrees/environment.pickle differ
    $

Closes #122.
Diffing the HTML files shows much whitespace different and some changes
in classes/css files. Smoke checking the produced HTML, nothing
stood out as broken, so lets see if we can just move this forward.
@awelzel awelzel force-pushed the topic/awelzel/parallel-doc-build branch from 00e13f7 to 5eb57c0 Compare July 28, 2022 17:07
Copy link
Contributor

@timwoj timwoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look at the RTD thing separately.

@timwoj timwoj merged commit 4550b7f into master Jul 28, 2022
@timwoj timwoj deleted the topic/awelzel/parallel-doc-build branch July 28, 2022 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Figure out how to make zeek-docs builds run in parallel
2 participants