Skip to content

Commit

Permalink
Use same retry logic as macos binary builds (#89014)
Browse files Browse the repository at this point in the history
Occasionally the command to download sccache via curl fails with network errors (example below). The default curl retry option only retries errors that are considered "transient", but but the set of actual transient commands is greater than what curl considers to be transient.

This PR modifies the retry logic for downloading sccache to match what's in https://github.com/pytorch/pytorch/blob/master/.github/templates/macos_binary_build_workflow.yml.j2#L79-L89, using the retry action to ensure we both retry all transient errors, and including a longer retry delay to give the transient issue time to resolve itself.

Example failure from [this run](https://github.com/pytorch/pytorch/actions/runs/3422664884/jobs/5700595220):
```
Run sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:13 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:14 --:--:--     0
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to s3.amazonaws.com:443
Error: Process completed with exit code 35.
```
Pull Request resolved: #89014
Approved by: https://github.com/huydhn
  • Loading branch information
ZainRizvi authored and pytorchmergebot committed Nov 15, 2022
1 parent 7a37bbe commit f346283
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions .github/workflows/_mac-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -109,12 +109,17 @@ jobs:
brew link --force libomp
- name: Install sccache (only for non-forked PRs, and pushes to trunk)
uses: nick-fields/retry@v2.8.2
if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }}
run: |
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}"
echo "SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${GITHUB_ENV}"
with:
timeout_minutes: 5
max_attempts: 3
retry_wait_seconds: 90
command: |
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}"
echo "SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${GITHUB_ENV}"
- name: Get workflow job id
id: get-job-id
Expand Down

0 comments on commit f346283

Please sign in to comment.