New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graceful handling of error (non-validation one) #1122
Conversation
@@ -86,23 +86,24 @@ export async function restoreCache( | |||
} | |||
|
|||
const compressionMethod = await utils.getCompressionMethod() | |||
let archivePath = '' | |||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will essentially suppress all failures including temp directory creation and tar extraction failures. Essentially we are trading accuracy for continuity here. We definitely need a monitor which will tell us how many users are hitting such failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suppression is already there https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59 , the only thing is now it is moved to toolkit instead of Github Actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need a monitor which will tell us how many users are hitting such failures.
I will check how we can monitors for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is same problem as what @anuragc617 would be looking into w.r.t getting warnings from cache action so that we can have monitor on those.
packages/cache/src/cache.ts
Outdated
@@ -217,6 +223,13 @@ export async function saveCache( | |||
|
|||
core.debug(`Saving Cache (ID: ${cacheId})`) | |||
await cacheHttpClient.saveCache(cacheId, archivePath, options) | |||
} catch (error) { | |||
const typedError = error as Error | |||
if (typedError.name === ReserveCacheError.name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check happen in action itself? I see that we do some checks in action itself. Should we also not do this check there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the condition which you mentioned in actions will be removed after this change and basically idea here is we will also need to add any checks like this in other actions setup-* or wherever this toolkit cache package is getting used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiwarishub but I don't see you catching the ValidationError and handling it. Does that still happen in action while rest happens in toolkit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, in the given try block we are not throwing ValidationError exception https://github.com/actions/toolkit/pull/1122/files#diff-34e72845d1658a8a9b9fdfe6b5b9261952e7f0496a0a3fcec87dae722718a64aL151-L152. So ValidationError will be still thrown as a error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Though I feel we should also add the ValidationError check here as well. We do not have a specific Validate method and that means the validation can potentially be added in many places in future and without a check all those validations will get suppressed with this global suppression and will be difficult to understand why that is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -152,7 +158,7 @@ export async function saveCache( | |||
checkKey(key) | |||
|
|||
const compressionMethod = await utils.getCompressionMethod() | |||
let cacheId = null | |||
let cacheId = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change? IIRC this was made null
recently. May @t-dedah can chime in on why it was made null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let me check with @t-dedah because this function return type Number
https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L150 so in case of exception returning null might not be correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain more why this change is required, earlier even though Function return type is number and cacheId
was set null but in the code we were making sure this will never return null https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L204 otherwise we were throwing error so this line https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 will never get execute (in case of exception). So compiler was happy. but now as i have added catch block https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 line can be executed (in case of no error or exception is consumed) so compiler is throwing error that ``return type can only be number not number | null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that seems to be the case.
let archivePath = '' | ||
try { | ||
// path are needed to compute version | ||
const cacheEntry = await cacheHttpClient.getCacheEntry(keys, paths, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the try block be just around the http client to handle errors from the server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the idea here is to ensure that any failure is suppressed so that user does not get blocked due to cache action failures.
A bit scary thought but I guess it would be good for users' workflows to continue executing as in reality there is no blocker due to cache action failure. This may mean that users may not even notice things like archive/unarchive failing, and that's why I feel we should have monitors to ability to detect any anomaly in such failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can even whitelist error based on Response code (by creating exception for those codes) but i want to keep the end user expectation same and based on current implementation of Github Actions actions/cache https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59, it is suppressing all the error apart from ValidationError. And this conditions will be removed from actions side.
packages/cache/src/cache.ts
Outdated
return cacheEntry.cacheKey | ||
} catch (error) { | ||
// Supress all cache related errors because caching should be optional | ||
core.warning(`Fail to restore: ${error}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core.warning(`Fail to restore: ${error}`) | |
core.warning(`Failed to restore: ${error}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@bishal-pdMSFT / @aparna-ravindra, Let me know if you this PR is good to go |
Will this be a breaking change as we are changing the error contract. We are now not throwing errors which were previously being thrown. |
Otherwise looks good to me. |
Yes, i have seen in restore, Github actions generally checks for cache returned key so there we are good. For actions/cache: But for save, Actions generally trust to exception to get raised if there is issues to stop execution of their current function. So there it requires change to check return cacheId as well. For actions/cache: https://github.com/actions/cache/blob/main/src/save.ts#L48 @bishal-pdMSFT , should we go ahead with major release then ? |
New major version looks good to me |
We recently saw due to unavailability of Actions cache service, some of the setup-* start failing because of their caching feature. Ideally from actions-cache team it is recommended to handle cache service error (non-validation) gracefully and not to fail to workflow. This PR moves that dependency to caches packages, so that we will not throw error and throw the error as warning/info to stderr/stdout.