
[skip-ci] GHA: Attempt fix for "exceeded a secondary rate limit" #22550

Merged: 1 commit merged into containers:main on Apr 30, 2024

Conversation

@cevich (Member) commented Apr 30, 2024

Frequently but intermittently, the stale issue and PR locking workflow generates the error:

```
You have exceeded a secondary rate limit. Please wait a few minutes
before you try again. If you reach out to GitHub Support for help,
please include the request ID XYZ
```

According to upstream `dessant/lock-threads` issue 48, this seems to be coming from the GitHub side (bug/feature/limitation), since the action uses an official GitHub API rate-limiting library. It's unlikely to be related to which style/syntax of GitHub token is used, nor to whether the action is executed concurrently across multiple repos.

According to the rate-limiting docs:
https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#about-secondary-rate-limits
it's possible the issue is caused by an unknown aspect of the clause:

```
These secondary rate limits are subject to change without notice. You
may also encounter a secondary rate limit for undisclosed reasons.
```

The same docs indicate GitHub Apps have enhanced rate limits which scale with the org's repo count. Attempt to fix the intermittent failures by making use of a new, dedicated, org-specific, private "Stale Locking App" I recently created. This requires the addition of a new action to the workflow that obtains a short-lived token for passing to lock-threads.

Note: Because both `vars.STALE_LOCKING_APP_ID` and `secrets.STALE_LOCKING_APP_PRIVATE_KEY` are defined at the containers-organization level, the Buildah and Skopeo reuse of this workflow should continue to function normally without change.

Does this PR introduce a user-facing change?

None

Signed-off-by: Chris Evich <cevich@redhat.com>
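
For reviewers who want the shape of the change without opening the diff, this is roughly what the new step and the token hand-off look like (a sketch only: the step id, the lock-threads version tag, and the surrounding layout are illustrative, not the literal diff):

```
    steps:
      - name: Obtain a short-lived token from the Stale Locking App
        id: app-token  # illustrative id
        uses: actions/create-github-app-token@v1
        with:
          # N/B: Both values are defined at the containers-org level
          app-id: ${{ vars.STALE_LOCKING_APP_ID }}
          private-key: ${{ secrets.STALE_LOCKING_APP_PRIVATE_KEY }}

      - uses: dessant/lock-threads@v5  # version tag illustrative
        with:
          # Pass the app installation token instead of the default GITHUB_TOKEN,
          # so the enhanced GitHub App rate limits apply to the lock operations.
          github-token: ${{ steps.app-token.outputs.token }}
```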
@cevich (Member, Author) commented Apr 30, 2024

Review note: I think this is the needed fix and done correctly. The only practical way to test it is to merge it ☹️ I've stored all the details of the new GitHub App under our team's secrets-storage service.

@edsantiago I know this isn't your forte, but if you could look over my commit message and comments, I'd appreciate the feedback.

@ashley-cui @lsm5 I realize this is a lot for such a small diff, but PTAL once you've had time to consume the background details.

@edsantiago (Collaborator) left a comment

LGTM as best I can tell. One question inline.

uses: actions/create-github-app-token@v1
with:
  # N/B: These are both defined at the containers-org level
  app-id: ${{ vars.STALE_LOCKING_APP_ID }}
@edsantiago (Collaborator):

The string `vars` does not appear anywhere else in `.github/workflows`; the usual form is `secrets.XX`. Is `vars` a real thing, or did you mean `secrets` here?

@cevich (Member, Author):

Sharp eye. This was surprising to me also (new feature, maybe?). Yes, it's intended and documented that it will consider org-level Actions variables (which this is).
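
For anyone else surprised by it, a minimal illustration of the two contexts (the two names are the ones this PR uses; the layout is illustrative): `vars` resolves repository- or organization-level Actions variables, while `secrets` resolves Actions secrets.

```
        with:
          # Non-sensitive value, stored as an org-level Actions *variable*
          app-id: ${{ vars.STALE_LOCKING_APP_ID }}
          # Sensitive value, stored as an org-level Actions *secret*
          private-key: ${{ secrets.STALE_LOCKING_APP_PRIVATE_KEY }}
```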

@lsm5 (Member) left a comment

Would be great if we can intentionally trigger a rate limit to try this out.

Anyway, LGTM

@openshift-ci bot added the `approved` label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Apr 30, 2024
@cevich (Member, Author) commented Apr 30, 2024

> Would be great if we can intentionally trigger a rate limit to try this out.

Unfortunately, the problem is intermittent and nobody has a clue as to exactly which limit is being breached (including GitHub support). As for testing, I s'pose I could give this a try over on the containers/automation_sandbox repo, set it to run every hour, and leave it for a few days. It's a bit of a pain since I'll need to set up the secrets over there, but it's not impossible. LMK if that would improve the comfort level.
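
If we do go that route, the trigger block for such a sandbox copy could look something like this (the hourly cron and the manual-dispatch trigger are hypothetical, just to illustrate accumulating runs quickly):

```
on:
  schedule:
    - cron: '0 * * * *'   # hourly, to rack up runs against whatever limit is being hit
  workflow_dispatch: {}   # allow manual kicks while experimenting
```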

@lsm5 (Member) commented Apr 30, 2024

> Would be great if we can intentionally trigger a rate limit to try this out.
>
> Unfortunately, the problem is intermittent and nobody has a clue as to exactly which limit is being breached (including GitHub support). As for testing, I s'pose I could give this a try over on the containers/automation_sandbox repo, set it to run every hour, and leave it for a few days. It's a bit of a pain since I'll need to set up the secrets over there, but it's not impossible. LMK if that would improve the comfort level.

Not worth the pain, IMHO. Besides, if we never see any rate-limit complaints ever again, we can assume it works :)

@lsm5 (Member) left a comment

/lgtm
/approve
/hold

@openshift-ci bot added the `do-not-merge/hold` (Indicates that a PR should not merge because someone has issued a /hold command.) and `lgtm` (Indicates that a PR is ready to be merged.) labels on Apr 30, 2024
@openshift-ci bot (Contributor) commented Apr 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, lsm5

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashley-cui (Member) commented:

LGTM as far as I can tell, but only merging will reveal the truth :)

@edsantiago (Collaborator) commented:

The truth is out there.

/hold cancel

@openshift-ci bot removed the `do-not-merge/hold` label (Indicates that a PR should not merge because someone has issued a /hold command.) on Apr 30, 2024
@openshift-merge-bot merged commit 9c24033 into containers:main on Apr 30, 2024 (89 of 91 checks passed).
@cevich (Member, Author) commented Apr 30, 2024

Thanks, guys. Now that it's merged, I can at least manually trigger it to confirm there are no typos or such...

Labels: approved, lgtm, release-note-none