[Failing Test]: Building a wheel for integration tests sometimes times out #28703

Closed
1 of 16 tasks
tvalentyn opened this issue Sep 27, 2023 · 7 comments · Fixed by #30204
Assignees
Labels
bug, done & done, failing test, flake, P1, python, tests

Comments

@tvalentyn
Contributor

What happened?

We are hitting subprocess.TimeoutExpired within cibuildwheel:

12:48:24 > Task :sdks:python:bdistPy38linux
12:48:24 
12:48:24 [notice] A new release of pip available: 22.2.2 -> 23.2.1
12:48:24 [notice] To update, run: pip install --upgrade pip
12:48:24 Processing /project
12:48:24   Preparing metadata (setup.py): started
12:48:24   Preparing metadata (setup.py): finished with status 'done'
12:48:24 Building wheels for collected packages: apache-beam
12:48:24   Building wheel for apache-beam (setup.py): started
12:48:24   Building wheel for apache-beam (setup.py): still running...
12:48:24   Building wheel for apache-beam (setup.py): still running...
12:48:24   Building wheel for apache-beam (setup.py): still running...
12:48:24   Building wheel for apache-beam (setup.py): finished with status 'done'
12:48:24   Created wheel for apache-beam: filename=apache_beam-2.52.0.dev0-cp38-cp38-linux_x86_64.whl size=15364571 sha256=e737d0e77efe4e77279557c8b6226fd9a5d2f36b71343343a00ca1aae340e4ba
12:48:24   Stored in directory: /tmp/pip-ephem-wheel-cache-jf8ssnkj/wheels/d4/3a/c0/4dc152b1840724d5b992a8268bb4ef33fdbe42ffe1429b845c
12:48:24 Successfully built apache-beam
12:48:24     + /opt/python/cp38-cp38/bin/python -c 'import sys, json, glob; json.dump(glob.glob('"'"'/tmp/cibuildwheel/built_wheel/*.whl'"'"'), sys.stdout)'
12:48:24     + rm -rf /tmp/cibuildwheel/repaired_wheel
12:48:25     + mkdir -p /tmp/cibuildwheel/repaired_wheel
12:48:25 
12:48:25                                                                      ✓ 253.39s
12:48:25 Repairing wheel...
12:48:25 
12:48:25     + sh -c 'auditwheel repair -w /tmp/cibuildwheel/repaired_wheel /tmp/cibuildwheel/built_wheel/apache_beam-2.52.0.dev0-cp38-cp38-linux_x86_64.whl'
12:48:25 INFO:auditwheel.main_repair:Repairing apache_beam-2.52.0.dev0-cp38-cp38-linux_x86_64.whl
12:48:25 INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64
12:48:27 INFO:auditwheel.wheeltools:New filename tags: manylinux_2_17_x86_64, manylinux2014_x86_64
12:48:27 INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp38-cp38-linux_x86_64
12:48:27 INFO:auditwheel.wheeltools:New WHEEL info tags: cp38-cp38-manylinux_2_17_x86_64, cp38-cp38-manylinux2014_x86_64
12:48:27 INFO:auditwheel.main_repair:
12:48:31 Fixed-up wheel written to /tmp/cibuildwheel/repaired_wheel/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
12:48:31     + /opt/python/cp38-cp38/bin/python -c 'import sys, json, glob; json.dump(glob.glob('"'"'/tmp/cibuildwheel/repaired_wheel/*.whl'"'"'), sys.stdout)'
12:48:31     + mkdir -p /output
12:48:31     + mv /tmp/cibuildwheel/repaired_wheel/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl /output
12:48:31 
12:48:31                                                                        ✓ 6.64s
12:48:31 
12:48:31 ✓ cp38-manylinux_x86_64 finished in 270.43s
12:48:31 Copying wheels back to host...
12:48:31 
12:48:31 
12:48:31                                                                        ✓ 0.17s
12:48:31 Traceback (most recent call last):
12:49:01   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Integration_Commit/src/build/gradleenv/1922375555/bin/cibuildwheel", line 8, in <module>
12:49:01     sys.exit(main())
12:49:01   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Integration_Commit/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/__main__.py", line 129, in main
12:49:01     build_in_directory(args)
12:49:01   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Integration_Commit/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/__main__.py", line 248, in build_in_directory
12:49:01     cibuildwheel.linux.build(options, tmp_path)
12:49:01   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Integration_Commit/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/linux.py", line 384, in build
12:49:01     build_in_container(
12:49:01   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Integration_Commit/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/oci_container.py", line 134, in __exit__
12:49:01     self.process.wait(timeout=30)
12:49:01   File "/usr/lib/python3.8/subprocess.py", line 1083, in wait
12:49:01     return self._wait(timeout=timeout)
12:49:01   File "/usr/lib/python3.8/subprocess.py", line 1798, in _wait
12:49:01     raise TimeoutExpired(self.args, timeout)
12:49:01 subprocess.TimeoutExpired: Command '['docker', 'start', '--attach', '--interactive', 'cibuildwheel-925e8aec-2a6f-4f02-b2e5-22df08e131cc']' timed out after 30 seconds
12:49:01 
12:49:13 > Task :sdks:python:bdistPy38linux FAILED
12:49:13 

We could perhaps add a retry around the cibuildwheel call.

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@jrmccluskey
Contributor

Reading through the cibuildwheel documentation, there doesn't seem to be a way to configure any timeouts. I'm a little concerned about just wrapping the cibuildwheel calls in a retry, though: we'd be adding more runtime to the actions without necessarily addressing the underlying issue.

@tvalentyn
Contributor Author

What if we retry a limited number of times?
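As a rough illustration of a bounded retry, here is a minimal Python sketch. It is not Beam's actual build code: the helper name `run_with_retries` and its parameters are hypothetical, and it assumes the wrapped command signals failure through a non-zero exit code, as the cibuildwheel run in the log above does when the traceback is raised.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_seconds=0.0):
    """Run cmd, retrying on a non-zero exit code.

    Hypothetical helper for illustration. Returns the number of the attempt
    that succeeded and raises RuntimeError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{max_attempts} failed "
            f"with exit code {result.returncode}",
            file=sys.stderr,
        )
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

Capping `max_attempts` bounds the extra runtime a retry can add, which is the trade-off raised in the previous comment. Note that a retry only masks the flake; it does not explain why the container is slow to exit.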

@AnandInguva
Contributor

Is this still observed?

@tvalentyn
Contributor Author

Yes, still happening:

13:46:31 
13:46:38 > Task :sdks:python:bdistPy311linux
13:46:38 Traceback (most recent call last):
13:46:38   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/build/gradleenv/1922375555/bin/cibuildwheel", line 8, in <module>
13:46:38     sys.exit(main())
13:46:38   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/__main__.py", line 129, in main
13:46:38     build_in_directory(args)
13:46:38   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/__main__.py", line 248, in build_in_directory
13:46:38     cibuildwheel.linux.build(options, tmp_path)
13:46:38   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/linux.py", line 384, in build
13:46:38     build_in_container(
13:46:38   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2/src/build/gradleenv/1922375555/lib/python3.8/site-packages/cibuildwheel/oci_container.py", line 134, in __exit__
13:46:38     self.process.wait(timeout=30)
13:46:38   File "/usr/lib/python3.8/subprocess.py", line 1083, in wait
13:46:38     return self._wait(timeout=timeout)
13:46:38   File "/usr/lib/python3.8/subprocess.py", line 1798, in _wait
13:46:38     raise TimeoutExpired(self.args, timeout)
13:46:38 subprocess.TimeoutExpired: Command '['docker', 'start', '--attach', '--interactive', 'cibuildwheel-690819bd-0a6a-47c7-bf5a-4c280b516a4f']' timed out after 30 seconds
13:46:38 
13:46:41 > Task :sdks:python:bdistPy311linux FAILED

@tvalentyn
Contributor Author

I think the 30-second timeout is hardcoded in:
https://github.com/pypa/cibuildwheel/blob/886ba0f6628c5234efb2fb16c5628e3125ab3173/cibuildwheel/oci_container.py#L185
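For reference, the traceback shows the failure coming from `self.process.wait(timeout=30)`. The standalone sketch below reproduces that behavior of `subprocess.Popen.wait` with a much shorter budget. The sleeping child process is only a stand-in for the docker container process, not what cibuildwheel actually runs.

```python
import subprocess
import sys

# Stand-in child that outlives the wait budget (assumption: it plays the
# role of the `docker start --attach` process from the traceback).
process = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])

timed_out = False
try:
    # Popen.wait raises TimeoutExpired if the child has not exited in time.
    process.wait(timeout=0.5)
except subprocess.TimeoutExpired as exc:
    timed_out = True
    print(f"child still running after {exc.timeout} seconds")
finally:
    # Clean up the child so it does not linger after the demonstration.
    process.kill()
    process.wait()
```

Because the wait budget is a literal in the caller, the only ways to avoid the exception from outside are to make the child exit faster or to retry the whole invocation.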

Beam code where we can have some retry logic:

@tvalentyn
Contributor Author

Filed pypa/cibuildwheel#1692 for CIBuildWheel folks.

@tvalentyn
Contributor Author

Added retry logic. Let's reopen if we see it again.

@github-actions github-actions bot added this to the 2.53.0 Release milestone Dec 8, 2023
@tvalentyn tvalentyn added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Dec 19, 2023

3 participants