CI: ban coverage 6.3 that may be causing random hangs in fork test #22326

Merged
merged 1 commit into from Jan 26, 2022

tacaswell
Member

PR Summary

About 24 hours ago (some time 2022-01-25) we started getting random failures on CI that look like:

__________________________________ test_fork ___________________________________
[gw0] linux -- Python 3.7.12 /opt/hostedtoolcache/Python/3.7.12/x64/bin/python

    @pytest.mark.skipif(not hasattr(os, "register_at_fork"),
                        reason="Cannot register at_fork handlers")
    def test_fork():
        _model_handler(0)  # Make sure the font cache is filled.
        ctx = multiprocessing.get_context("fork")
        with ctx.Pool(processes=2) as pool:
>           pool.map(_model_handler, range(2))

lib/matplotlib/tests/test_font_manager.py:206: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/pool.py:623: in __exit__
    self.terminate()
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/pool.py:548: in terminate
    self._terminate()
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/util.py:224: in __call__
    res = self._callback(*self._args, **self._kwargs)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/pool.py:617: in _terminate_pool
    p.join()
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/process.py:140: in join
    res = self._popen.wait(timeout)
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/popen_fork.py:48: in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <multiprocessing.popen_fork.Popen object at 0x7f258326c450>, flag = 0

    def poll(self, flag=os.WNOHANG):
        if self.returncode is None:
            try:
>               pid, sts = os.waitpid(self.pid, flag)
E               Failed: Timeout >300.0s

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/popen_fork.py:28: Failed
----------------------------- Captured stderr call -----------------------------

~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (139800230110976) ~~~~~~~~~~~~~~~~~~~~~
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
    reply.run()
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/execnet/gateway_base.py", line 220, in run
    self._result = func(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
    msg = Message.from_io(io)
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/execnet/gateway_base.py", line 432, in from_io
    header = io.read(9)  # type 1, channel 4, payload 4
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/execnet/gateway_base.py", line 400, in read
    data = self._read(numbytes - len(buf))
------------------------------ Captured log call -------------------------------
DEBUG    matplotlib.axes._base:_base.py:2965 title position was updated manually, not adjusting
DEBUG    matplotlib.backends.backend_pdf:backend_pdf.py:875 Assigning font /b'F1' = '/home/runner/work/matplotlib/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf'
DEBUG    matplotlib.backends.backend_pdf:backend_pdf.py:917 Embedding font /home/runner/work/matplotlib/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf.
DEBUG    matplotlib.backends.backend_pdf:backend_pdf.py:924 Writing TrueType font.

We are seeing this mostly on Python 3.7, but it has also shown up on 3.8 and 3.9, so it won't just go away when we merge #22194.

The only version change between the last passing run and the first failing run was coverage 6.2 -> 6.3. To get coverage on subprocesses, some extra configuration is done at (sub-)process spawn time to get the hooks in place (this is actually managed for us by pytest-cov), so it is plausible that the upgrade is related, but I do not actually understand the underlying problem.
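
For illustration only, here is a minimal sketch of how a CI job could check a reported coverage version against the banned release. It is not the change made in this PR (the actual pin lives in the project's test requirements); the names `_release_tuple`, `BANNED`, and `is_banned` are hypothetical, and treating "6.3" and "6.3.0" as the same release while leaving later patch releases unbanned is an assumption.

```python
# Hypothetical helper: decide whether a version string names the banned release.
# Assumption: only the 6.3 release itself is banned, not 6.3.1 or later.

def _release_tuple(version: str) -> tuple:
    """Parse the leading numeric components of a version string.

    Parsing stops at the first non-numeric component, and trailing zeros
    are dropped so that "6.3" and "6.3.0" compare equal.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


BANNED = _release_tuple("6.3")


def is_banned(version: str) -> bool:
    """Return True if `version` is the banned coverage release."""
    return _release_tuple(version) == BANNED
```

In a real job the version string could come from `importlib.metadata.version("coverage")` (Python 3.8+); a requirement specifier in the test requirements file is the more conventional way to express such a ban.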

@tacaswell tacaswell added this to the v3.5.2 milestone Jan 26, 2022
Member

@QuLogic QuLogic left a comment

Not entirely sure why this broke things, but I see the change in dependencies in the Actions logs, as you say.

@QuLogic QuLogic merged commit 4784ffa into matplotlib:main Jan 26, 2022
@tacaswell tacaswell deleted the ci_pin_back_coverage branch January 26, 2022 20:28
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Jan 26, 2022
@QuLogic
Member

QuLogic commented Jan 26, 2022

Might be an upstream issue here: nedbat/coveragepy#1310

tacaswell added a commit that referenced this pull request Jan 26, 2022
…326-on-v3.5.x

Backport PR #22326 on branch v3.5.x (CI: ban coverage 6.3 that may be causing random hangs in fork test)