Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow cancelling newer builds with cancel_newer #59

Closed
wants to merge 6 commits into from

Conversation

Gisleburt
Copy link

Very happy to take feedback and suggestions on this PR, or for it to be ignored, this is a bit of a weird one. One thing I wondered about was the interplay between cancel_newer and ignore_sha as you may want to only cancel newer if the SHA is the same, so would love input on that.

Changes:

  1. Added a cancel_newer optional parameter, added documentation in README.md.
  2. Added an ID check (as you can't rely on the Date difference with this option)
  3. Increased the number of workflows downloaded to the max per page (100, up from the default of 30). If you want more than 100, you'd have to add Paging.
  4. Sped up the cancelation process by sending all the cancelation requests at once, adding the Promises to an array, and awaiting the array at the end.
  5. Changed action.yml to match the schema (I don't think it was causing any problems but just tidying up while I was there).

Use Case:

We receive webhooks from a CMS when content is published and use this to build a new static website based on that content.

The CMS builds pages out of parts, and sends a publish event for every part of something thats being published. Irritatingly it does this even if the part hasn't changed, so every time we publish a page, we end up with 64 new workflows starting up.

These tend to get queued, and we actually don't care which of the jobs finishes as the webhook only triggers the job, and the job will then go download the new content so all the jobs will get the same content.

The next problem we has is that we have multiple jobs starting at once, downloading a list of other jobs and trying to cancel them one at a time. GitHub doesn't wait for the job to be cancelled before returning a 202 response so its possible for two jobs to cancel each other. In order to reduce the chance of this happening we decided to send all of the cancellations in one go, and wait for the 202s in one lump at the end. This change will improve the speed of the task for everyone with more than one workflow to cancel.

It's not perfect, our builds are now so fast we still end up with > 1 job running to completion, but its not 64 running to completion and I think thats the best we can hope for.

@@ -11,10 +11,15 @@ inputs:
ignore_sha:
description: 'Optional - Allow canceling other workflows with the same SHA. Useful for the `pull_request.closed` event.'
required: false
default: false
default: 'false'
cancel_newer:
Copy link
Owner

@styfle styfle Mar 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this ignore_date to match the ignore_sha prop

Copy link
Owner

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to #35. I'm not sure if this would be the same use case though.

I like the code in this PR but I think PR #35 is probably the correct solution.

@@ -11,10 +11,15 @@ inputs:
ignore_sha:
description: 'Optional - Allow canceling other workflows with the same SHA. Useful for the `pull_request.closed` event.'
required: false
default: false
default: 'false'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed from false to 'false'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the specification, default is a string value. In YAML true and false are booleans, so adding the 's turns this it into the string "false".

The way YAML handles strings causes a number of common gotchas, especially when it comes to booleans.

access_token:
description: 'Your GitHub Access Token, defaults to: {{ github.token }}'
default: '${{ github.token }}'
required: true
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually required if it has a default? 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, not sure why I added that 😅

@thomwiggers
Copy link
Contributor

This PR and #35 don't solve the same problem, if I read your PR description correctly.

Given you have jobs 1,2,3,4,5, and for some reason task 3 would be scheduled to the runners, this PR would close 1, 2, 4, 5, and run (scheduled) task 3.

PR #35 would cancel 1-4, including the executed task 3 itself, such that the next task that would run would be task 5.

@Gisleburt
Copy link
Author

#35 definitely a better solution.

One problem we've run into is that GHA does not immediately cancel worflows (it returns a 202 to tell you it'll get to it eventually).

This means that if there are two jobs running at the same time, they send cancel requests for the other, before they themselves are cancelled.

Closing this PR but might still do others for the other fixes if you'd like them.

@Gisleburt Gisleburt closed this Mar 9, 2021
@thomwiggers
Copy link
Contributor

@Gisleburt if you've got suggestions to make #35 better, they'd be welcome: that PR is mostly a big hackjob :)

@@ -52,6 +53,7 @@ async function main() {
await Promise.all(workflow_ids.map(async (workflow_id) => {
try {
const { data } = await octokit.actions.listWorkflowRuns({
per_page: 100,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of increasing this number. Do you want to submit another PR with just this one change?

mikehardy added a commit to mikehardy/cancel-workflow-action that referenced this pull request Apr 1, 2021
This was noted by @Gisleburt here styfle#59 (comment)

According to the spec the parameter should be a string, but if you use false without
quotes in YAML it is taken as a boolean so the types are not quite correct
mikehardy added a commit to mikehardy/cancel-workflow-action that referenced this pull request Apr 1, 2021
This was noted by @Gisleburt here styfle#59 (comment)

According to the spec the parameter should be a string, but if you use false without
quotes in YAML it is taken as a boolean so the types are not quite correct
mikehardy added a commit to mikehardy/cancel-workflow-action that referenced this pull request Apr 1, 2021
This was suggested by @Gisleburt in styfle#59

I quote:

"The next problem we has is that we have multiple jobs starting at once, downloading a list of other jobs and trying to cancel them one at a time. GitHub doesn't wait for the job to be cancelled before returning a 202 response so its possible for two jobs to cancel each other. In order to reduce the chance of this happening we decided to send all of the cancellations in one go, and wait for the 202s in one lump at the end. This change will improve the speed of the task for everyone with more than one workflow to cancel."
mikehardy added a commit to mikehardy/cancel-workflow-action that referenced this pull request Apr 1, 2021
This was suggested by @Gislebert in styfle#59

the default was 30, this widens the number of workflows we scan to
cancel the most workflows possible, without adding complication by paging
mikehardy added a commit to mikehardy/cancel-workflow-action that referenced this pull request Apr 1, 2021
* chore: add prettier config, format file, add lint workflow

The prettier config was adapted from the official GitHub Actions repo,
bent to fit the prevailing style (where possible) already in the project

The intent is not to be controversial or argue about whitespace, it is just
to have a consistent easy-to-verify style specifically to avoid all arguments
about whitespace. If anything in here is objectionable, just name the setting
to alter and I can edit / re-format / re-push

* chore(lint): add eslint config, check lint in workflow

Similar to the prettier config, this is adapted from the main GitHub Actions repo,
and the intent is not to be controversial it is simply to have any consistent / easy
to check standard. If anything is objectionable, just point out the setting

The config was adapted from the main repo to match what appeared to be prevailing opinion
on this project (semicolons preferred, bracket spacing preferred, etc)

The following commits will fix the small errors that seemed worth fixing

* lint: add explicit return type on main() method

* lint: use for..of vs array.forEach

* lint: workflow_id variable was shadowed, make it unambiguous

* lint: remote ts-ignore via typing workflow_id as number in all places

* chore: add husky and hook build/format/lint checks to pre-commit

This enforces the same checks locally that will execute in CI

With this, everyone should have a clean / consistent dev environment,
and it will be clear to contributors if they submit code that is not valid
typescript

Additionally, after doing the build it adds the dist/index.js output to the
commit list so contributors can't forget to commit it

* fix(ignore_sha): change default from false to 'false'

This was noted by @Gisleburt here styfle#59 (comment)

According to the spec the parameter should be a string, but if you use false without
quotes in YAML it is taken as a boolean so the types are not quite correct

* refactor: extraction of cancel workflow runs method

this is intended to be completely non-functional, nothing should be different
here in how this works from before, it is a pure method extraction

testing this change:
the cancel_self workflow action was updated so it may be run manually, and
the no-longer-required access_token was removed from it

* perf: cancel workflow runs in parallel

This was suggested by @Gisleburt in styfle#59

I quote:

"The next problem we has is that we have multiple jobs starting at once, downloading a list of other jobs and trying to cancel them one at a time. GitHub doesn't wait for the job to be cancelled before returning a 202 response so its possible for two jobs to cancel each other. In order to reduce the chance of this happening we decided to send all of the cancellations in one go, and wait for the 202s in one lump at the end. This change will improve the speed of the task for everyone with more than one workflow to cancel."

* fix: list maximum (100) workflows possible before paging

This was suggested by @Gislebert in styfle#59

the default was 30, this widens the number of workflows we scan to
cancel the most workflows possible, without adding complication by paging

* refactor: break workflow run filter into multiple parts

this will allow the cancel all but latest feature to be easier to review

* feat(all_but_latest): add ability to clear every run but latest

This is an adaptation of styfle#35 from @thomwiggers - the logic is entirely
from that PR (thank you!)

A new workflow adds a 240-second sleep job on macos (limit 5 concurrent)
with manual dispatch available for testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants