Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process hangs in Github workflows with coverage 7.1.0/7.0.5 #1559

Closed
sergei-maertens opened this issue Feb 21, 2023 · 3 comments
Closed

Process hangs in Github workflows with coverage 7.1.0/7.0.5 #1559

sergei-maertens opened this issue Feb 21, 2023 · 3 comments
Labels
bug Something isn't working cpython Reported as a bug in CPython

Comments

@sergei-maertens
Copy link

sergei-maertens commented Feb 21, 2023

Describe the bug

Similarly to #1310, the coverage process seems to hang on Github Workflow actions/jobs when running tests.

This has started occurring over the weekend, and from what I can see it looks like the Github runner version was recently changed and Python 3.10.10 became available. See also actions/runner#2454

To Reproduce

Reproducing this seems hard - so far I haven't been able to reproduce this on my local machine (admittedly, not Ubuntu 22.04 either so probably comparing apples and oranges).

This is occurring in the https://github.com/open-formulieren/open-forms repository, on all branches/PRs. A job that was passing before but was re-run now also fails without any changes made to our code (so only Github infrastructure can have changed).

  1. What version of Python are you using? -> 3.10.10, and also tried pinning 3.10.9 as the previous attempt passed on that version (to no avail)
  2. What version of coverage.py shows the problem? -> Pinned in our dependencies - tried 7.1.0 (which used to work) and tried 7.0.5 too (mostly to invalidate dependency caches and get a fresh wheel).
  3. What versions of what packages do you have installed? The output of pip freeze is helpful. -> See https://github.com/open-formulieren/open-forms/blob/09e724bf9d60451cb3dc2f260ca00890eee9afe0/requirements/ci.txt
  4. What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix.
    -> The coverage run invocation never seems to exit.
  5. What commands did you run?
    • verified that doing no coverage at all makes the problem go away: commit and CI run
    • coverage run src/manage.py test src --exclude-tag=e2e (removed the --parallel and multiprocessing options) -> problem persist
    • removed coverage combine but kept coverage run - problem persists

Expected behavior

coverage run exits, both with and without multiprocessing mode.

Additional context

Per #1310 I've also enabled some debug logging: COVERAGE_DEBUG=process,pid. The output from that is taken from this run

5248.dda3: cwd is now '/home/runner/work/open-forms/open-forms'
 5248.dda3: New process: executable: '/opt/hostedtoolcache/Python/3.10.10/x64/bin/python'
 5248.dda3: New process: cmd: ['/opt/hostedtoolcache/Python/3.10.10/x64/bin/coverage', 'run', 'src/manage.py', 'test', 'src', '--exclude-tag=e2e']
 5248.dda3: New process: pid: 5248, parent pid: 51[37](https://github.com/open-formulieren/open-forms/actions/runs/4231994829/jobs/7351195983#step:7:38)

[test suite output removed for brevity]

Ran 1784 tests in 451.143s

OK (skipped=1, expected failures=1)
Destroying test database for alias 'default'...
 5248.dda3: atexit: pid: 5248, instance: <coverage.control.Coverage object at 0x7fb6ee0dbd30>
Error: The operation was canceled.

-> the job timed out after the configured 30 minutes. Normally these tests (in parallel) complete in under 10 minutes.

edit: I seem to be able to reproduce this using https://github.com/nektos/act, let's see if I can find a minimum reproducible example

@sergei-maertens
Copy link
Author

I've run the test command through strace (which I have very little experience with), hopefully someone smarter than me sees something from this:

2023-02-21T14:15:28.9911981Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage", {st_mode=S_IFREG|0644, st_size=1359872, ...}, AT_SYMLINK_NOFOLLOW) = 0
2023-02-21T14:15:28.9912193Z getpid()                                = 5143
2023-02-21T14:15:28.9912648Z openat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 6
2023-02-21T14:15:28.9912902Z newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=1359872, ...}, AT_EMPTY_PATH) = 0
2023-02-21T14:15:28.9913146Z newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=1359872, ...}, AT_EMPTY_PATH) = 0
2023-02-21T14:15:28.9913610Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage", {st_mode=S_IFREG|0644, st_size=1359872, ...}, 0) = 0
2023-02-21T14:15:28.9913869Z pread64(6, "SQLite format 3\0\20\0\1\1\0@  \0\0\0\n\0\0\1L"..., 100, 0) = 100
2023-02-21T14:15:28.9914145Z fcntl(6, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
2023-02-21T14:15:28.9914419Z fcntl(6, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0
2023-02-21T14:15:28.9914681Z fcntl(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
2023-02-21T14:15:28.9915209Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage-journal", 0x7fff6322a4c0, 0) = -1 ENOENT (No such file or directory)
2023-02-21T14:15:28.9915718Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage-wal", 0x7fff6322a4c0, 0) = -1 ENOENT (No such file or directory)
2023-02-21T14:15:28.9915973Z newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=1359872, ...}, AT_EMPTY_PATH) = 0
2023-02-21T14:15:28.9916229Z pread64(6, "SQLite format 3\0\20\0\1\1\0@  \0\0\0\n\0\0\1L"..., 4096, 0) = 4096
2023-02-21T14:15:28.9916480Z fcntl(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
2023-02-21T14:15:28.9916981Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage", {st_mode=S_IFREG|0644, st_size=1359872, ...}, 0) = 0
2023-02-21T14:15:28.9917234Z newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=1359872, ...}, AT_EMPTY_PATH) = 0
2023-02-21T14:15:28.9917694Z newfstatat(AT_FDCWD, "/home/runner/work/open-forms/open-forms/.coverage", {st_mode=S_IFREG|0644, st_size=1359872, ...}, 0) = 0
2023-02-21T14:15:28.9917856Z close(6)                                = 0
2023-02-21T14:15:28.9918269Z newfstatat(AT_FDCWD, "/tmp/tmpmo9dunai/.lock", 0x7fff6322c270, 0) = -1 ENOENT (No such file or directory)
2023-02-21T14:15:28.9918425Z close(5)                                = 0
2023-02-21T14:15:28.9918574Z close(3)                                = 0
2023-02-21T14:15:28.9918736Z getpid()                                = 5143
2023-02-21T14:15:28.9919021Z getcwd("/home/runner/work/open-forms/open-forms", 1024) = 40
2023-02-21T14:15:28.9919184Z getpid()                                = 5143
2023-02-21T14:15:28.9919596Z write(2, " 5143.3d38: atexit: pid: 5143, i"..., 94 5143.3d38: atexit: pid: 5143, instance: <coverage.control.Coverage object at 0x7f4d4ded7d60>
2023-02-21T14:15:28.9919676Z ) = 94
2023-02-21T14:15:28.9920248Z rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f4d4e47b520}, {sa_handler=0x7f4d4e78ec1f, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f4d4e47b520}, 8) = 0
2023-02-21T14:15:29.4322930Z munmap(0x7f4d43c73000, 1073408)         = 0
2023-02-21T14:15:29.4323413Z munmap(0x7f4d43dfb000, 192992)          = 0
2023-02-21T14:15:29.4323792Z munmap(0x7f4d43d7a000, 524976)          = 0
2023-02-21T14:15:29.4324115Z munmap(0x7f4d43c67000, 45336)           = 0
2023-02-21T14:15:29.4324487Z munmap(0x7f4d43c62000, 16680)           = 0
2023-02-21T14:15:29.4324853Z munmap(0x7f4d43c4e000, 78248)           = 0
2023-02-21T14:15:29.4325155Z munmap(0x7f4d43c41000, 49544)           = 0
2023-02-21T14:15:29.4325510Z munmap(0x7f4d43c35000, 45512)           = 0
2023-02-21T14:15:29.4325861Z munmap(0x7f4d43c30000, 16520)           = 0
2023-02-21T14:15:29.4326218Z munmap(0x7f4d43c2b000, 16552)           = 0
2023-02-21T14:15:29.4326563Z munmap(0x7f4d43c23000, 28936)           = 0
2023-02-21T14:15:29.4326860Z munmap(0x7f4d43bdc000, 287088)          = 0
2023-02-21T14:15:29.4327449Z munmap(0x7f4d43bd2000, 37352)           = 0
2023-02-21T14:15:29.4327799Z munmap(0x7f4d43bcd000, 16440)           = 0
2023-02-21T14:15:29.4328149Z munmap(0x7f4d43bbc000, 66696)           = 0
2023-02-21T14:15:29.4328493Z munmap(0x7f4d43a87000, 1265544)         = 0
2023-02-21T14:15:29.6200202Z close(4)                                = 0
2023-02-21T14:15:29.6212648Z close(17)                               = 0
2023-02-21T14:15:29.6244640Z close(13)                               = 0
2023-02-21T14:15:29.6736216Z munmap(0x7f4d44246000, 69816)           = 0
2023-02-21T14:15:29.6737525Z munmap(0x7f4d44f2d000, 1212016)         = 0
2023-02-21T14:15:29.6738622Z munmap(0x7f4d44ba7000, 84192)           = 0
2023-02-21T14:15:29.6739471Z munmap(0x7f4d45a05000, 49776)           = 0
2023-02-21T14:15:29.6740333Z munmap(0x7f4d44bbc000, 1310000)         = 0
2023-02-21T14:15:29.6741223Z munmap(0x7f4d44cfc000, 57464)           = 0
2023-02-21T14:15:29.6742012Z munmap(0x7f4d4c170000, 16488)           = 0
2023-02-21T14:15:29.6742836Z munmap(0x7f4d44d0b000, 168584)          = 0
2023-02-21T14:15:29.6743647Z munmap(0x7f4d44b5a000, 28800)           = 0
2023-02-21T14:15:29.6744518Z munmap(0x7f4d44b1f000, 94432)           = 0
2023-02-21T14:15:29.6745336Z munmap(0x7f4d44b12000, 49384)           = 0
2023-02-21T14:15:29.6746119Z munmap(0x7f4d44b62000, 20744)           = 0
2023-02-21T14:15:29.6747070Z munmap(0x7f4d44e82000, 696664)          = 0
2023-02-21T14:15:29.6749760Z munmap(0x7f4d4422b000, 106888)          = 0
2023-02-21T14:15:29.6750652Z munmap(0x7f4d44e38000, 300640)          = 0
2023-02-21T14:15:29.6751469Z munmap(0x7f4d44b76000, 196792)          = 0
2023-02-21T14:15:29.6752379Z munmap(0x7f4d4479b000, 420280)          = 0
2023-02-21T14:15:29.6753378Z munmap(0x7f4d444cd000, 847064)          = 0
2023-02-21T14:15:29.6754474Z munmap(0x7f4d44426000, 155824)          = 0
2023-02-21T14:15:29.6755483Z munmap(0x7f4d44d70000, 815120)          = 0
2023-02-21T14:15:29.6756312Z munmap(0x7f4d44b68000, 53352)           = 0
2023-02-21T14:15:29.6757083Z munmap(0x7f4d44b37000, 139320)          = 0
2023-02-21T14:15:29.6757926Z munmap(0x7f4d44d35000, 237584)          = 0
2023-02-21T14:15:29.6758723Z munmap(0x7f4d4459c000, 43152)           = 0
2023-02-21T14:15:29.6759517Z munmap(0x7f4d4444d000, 32784)           = 0
2023-02-21T14:15:29.6760378Z munmap(0x7f4d445a7000, 110808)          = 0
2023-02-21T14:15:29.6797560Z munmap(0x7f4d41a66000, 1048576)         = 0
2023-02-21T14:15:29.6800944Z munmap(0x7f4d41b66000, 1048576)         = 0
2023-02-21T14:15:29.6846055Z rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4d4e47b520}, NULL, 8) = 0
2023-02-21T14:15:29.6847254Z rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4d4e47b520}, NULL, 8) = 0
2023-02-21T14:15:29.6848292Z rt_sigaction(SIGFPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4d4e47b520}, NULL, 8) = 0
2023-02-21T14:15:29.6849314Z rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4d4e47b520}, NULL, 8) = 0
2023-02-21T14:15:29.6850338Z rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4d4e47b520}, NULL, 8) = 0
2023-02-21T14:15:29.6851272Z sigaltstack(NULL, {ss_sp=0x563828ea73e0, ss_flags=0, ss_size=18160}) = 0
2023-02-21T14:15:29.6852172Z sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0

The full log is 98MB and too large for Github - if earlier output is relevant, I can send it if needed.

sergei-maertens added a commit to open-formulieren/open-forms that referenced this issue Feb 21, 2023
The CI pipeline is hanging - see nedbat/coveragepy#1559 for details.

Disabling coverage.py completely for now at least
let's us run tests for ongoing work.
@sergei-maertens
Copy link
Author

We managed to narrow it down to the use of hypotheses recursive call in https://github.com/open-formulieren/open-forms/blob/master/src/openforms/variables/tests/test_validators.py#L43, which is possibly related to HypothesisWorks/hypothesis#3585

@sergei-maertens
Copy link
Author

Closing this as we're fairly certain it can be pinned to a regression from Python 3.10.9 to 3.10.10, see python/cpython#102126

@sergei-maertens sergei-maertens closed this as not planned Won't fix, can't repro, duplicate, stale Feb 22, 2023
@nedbat nedbat added cpython Reported as a bug in CPython and removed needs triage labels Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cpython Reported as a bug in CPython
Projects
None yet
Development

No branches or pull requests

2 participants