Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful handling of error (non-validation one) #1122

Merged
merged 8 commits into from Jun 24, 2022

Conversation

tiwarishub
Copy link
Contributor

We recently saw due to unavailability of Actions cache service, some of the setup-* start failing because of their caching feature. Ideally from actions-cache team it is recommended to handle cache service error (non-validation) gracefully and not to fail to workflow. This PR moves that dependency to caches packages, so that we will not throw error and throw the error as warning/info to stderr/stdout.

@tiwarishub tiwarishub marked this pull request as ready for review June 20, 2022 05:09
@tiwarishub tiwarishub requested a review from a team as a code owner June 20, 2022 05:09
@@ -86,23 +86,24 @@ export async function restoreCache(
}

const compressionMethod = await utils.getCompressionMethod()
let archivePath = ''
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will essentially suppress all failures including temp directory creation and tar extraction failures. Essentially we are trading accuracy for continuity here. We definitely need a monitor which will tell us how many users are hitting such failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suppression is already there https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59 , the only thing is now it is moved to toolkit instead of Github Actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need a monitor which will tell us how many users are hitting such failures.

I will check how we can monitors for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is same problem as what @anuragc617 would be looking into w.r.t getting warnings from cache action so that we can have monitor on those.

@@ -217,6 +223,13 @@ export async function saveCache(

core.debug(`Saving Cache (ID: ${cacheId})`)
await cacheHttpClient.saveCache(cacheId, archivePath, options)
} catch (error) {
const typedError = error as Error
if (typedError.name === ReserveCacheError.name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check happen in action itself? I see that we do some checks in action itself. Should we also not do this check there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the condition which you mentioned in actions will be removed after this change and basically idea here is we will also need to add any checks like this in other actions setup-* or wherever this toolkit cache package is getting used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiwarishub but I don't see you catching the ValidationError and handling it. Does that still happen in action while rest happens in toolkit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, in the given try block we are not throwing ValidationError exception https://github.com/actions/toolkit/pull/1122/files#diff-34e72845d1658a8a9b9fdfe6b5b9261952e7f0496a0a3fcec87dae722718a64aL151-L152. So ValidationError will be still thrown as a error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Though I feel we should also add the ValidationError check here as well. We do not have a specific Validate method and that means the validation can potentially be added in many places in future and without a check all those validations will get suppressed with this global suppression and will be difficult to understand why that is happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -152,7 +158,7 @@ export async function saveCache(
checkKey(key)

const compressionMethod = await utils.getCompressionMethod()
let cacheId = null
let cacheId = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change? IIRC this was made null recently. May @t-dedah can chime in on why it was made null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me check with @t-dedah because this function return type Number https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L150 so in case of exception returning null might not be correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explain more why this change is required, earlier even though Function return type is number and cacheId was set null but in the code we were making sure this will never return null https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L204 otherwise we were throwing error so this line https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 will never get execute (in case of exception). So compiler was happy. but now as i have added catch block https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 line can be executed (in case of no error or exception is consumed) so compiler is throwing error that ``return type can only be number not number | null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that seems to be the case.

let archivePath = ''
try {
// path are needed to compute version
const cacheEntry = await cacheHttpClient.getCacheEntry(keys, paths, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the try block be just around the http client to handle errors from the server?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the idea here is to ensure that any failure is suppressed so that user does not get blocked due to cache action failures.
A bit scary thought but I guess it would be good for users' workflows to continue executing as in reality there is no blocker due to cache action failure. This may mean that users may not even notice things like archive/unarchive failing, and that's why I feel we should have monitors to ability to detect any anomaly in such failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we can even whitelist error based on Response code (by creating exception for those codes) but i want to keep the end user expectation same and based on current implementation of Github Actions actions/cache https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59, it is suppressing all the error apart from ValidationError. And this conditions will be removed from actions side.

return cacheEntry.cacheKey
} catch (error) {
// Supress all cache related errors because caching should be optional
core.warning(`Fail to restore: ${error}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
core.warning(`Fail to restore: ${error}`)
core.warning(`Failed to restore: ${error}`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tiwarishub
Copy link
Contributor Author

@bishal-pdMSFT / @aparna-ravindra, Let me know if you this PR is good to go

@bishal-pdMSFT
Copy link
Contributor

Will this be a breaking change as we are changing the error contract. We are now not throwing errors which were previously being thrown.

@bishal-pdMSFT
Copy link
Contributor

Otherwise looks good to me.

@tiwarishub
Copy link
Contributor Author

tiwarishub commented Jun 21, 2022

Will this be a breaking change as we are changing the error contract. We are now not throwing errors which were previously being thrown.

Yes, i have seen in restore, Github actions generally checks for cache returned key so there we are good.

For actions/cache:
https://github.com/actions/cache/blob/main/src/restore.ts#L33
For Setup-java:
Restore cache: https://github.com/actions/setup-java/blob/main/src/cache.ts#L101-L104
For Setup-node:
https://github.com/actions/setup-node/blob/main/src/cache-restore.ts#L44
For Setup-python:
https://github.com/actions/setup-python/blob/main/src/cache-distributions/cache-distributor.ts#L38

But for save, Actions generally trust to exception to get raised if there is issues to stop execution of their current function. So there it requires change to check return cacheId as well.

For actions/cache: https://github.com/actions/cache/blob/main/src/save.ts#L48
For Setup-java: https://github.com/actions/setup-java/blob/main/src/cache.ts#L127
For Setup-python: https://github.com/actions/setup-python/blob/main/src/cache-save.ts#L47
For setup-node: https://github.com/actions/setup-node/blob/main/src/cache-save.ts#L53

@bishal-pdMSFT , should we go ahead with major release then ?

@bishal-pdMSFT
Copy link
Contributor

New major version looks good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants