Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful handling of error (non-validation one) #1122

Merged
merged 8 commits into from Jun 24, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
8 changes: 6 additions & 2 deletions packages/cache/__tests__/restoreCache.test.ts
Expand Up @@ -73,13 +73,17 @@ test('restore with no cache found', async () => {
test('restore with server error should fail', async () => {
const paths = ['node_modules']
const key = 'node-test'
const logWarningMock = jest.spyOn(core, 'warning')

jest.spyOn(cacheHttpClient, 'getCacheEntry').mockImplementation(() => {
throw new Error('HTTP Error Occurred')
})

await expect(restoreCache(paths, key)).rejects.toThrowError(
'HTTP Error Occurred'
const cacheKey = await restoreCache(paths, key)
expect(cacheKey).toBe(undefined)
expect(logWarningMock).toHaveBeenCalledTimes(1)
expect(logWarningMock).toHaveBeenCalledWith(
'Fail to restore: Error: HTTP Error Occurred'
)
})

Expand Down
42 changes: 31 additions & 11 deletions packages/cache/__tests__/saveCache.test.ts
Expand Up @@ -48,6 +48,7 @@ test('save with large cache outputs should fail', async () => {
const cachePaths = [path.resolve(filePath)]

const createTarMock = jest.spyOn(tar, 'createTar')
const logWarningMock = jest.spyOn(core, 'warning')

const cacheSize = 11 * 1024 * 1024 * 1024 //~11GB, over the 10GB limit
jest
Expand All @@ -58,8 +59,11 @@ test('save with large cache outputs should fail', async () => {
.spyOn(cacheUtils, 'getCompressionMethod')
.mockReturnValueOnce(Promise.resolve(compression))

await expect(saveCache([filePath], primaryKey)).rejects.toThrowError(
'Cache size of ~11264 MB (11811160064 B) is over the 10GB limit, not saving cache.'
const cacheId = await saveCache([filePath], primaryKey)
expect(cacheId).toBe(-1)
expect(logWarningMock).toHaveBeenCalledTimes(1)
expect(logWarningMock).toHaveBeenCalledWith(
'Fail to save: Cache size of ~11264 MB (11811160064 B) is over the 10GB limit, not saving cache.'
)

const archiveFolder = '/foo/bar'
Expand All @@ -79,6 +83,7 @@ test('save with large cache outputs should fail in GHES with error message', asy
const cachePaths = [path.resolve(filePath)]

const createTarMock = jest.spyOn(tar, 'createTar')
const logWarningMock = jest.spyOn(core, 'warning')

const cacheSize = 11 * 1024 * 1024 * 1024 //~11GB, over the 10GB limit
jest
Expand Down Expand Up @@ -106,8 +111,11 @@ test('save with large cache outputs should fail in GHES with error message', asy
return response
})

await expect(saveCache([filePath], primaryKey)).rejects.toThrowError(
'The cache filesize must be between 0 and 1073741824 bytes'
const cacheId = await saveCache([filePath], primaryKey)
expect(cacheId).toBe(-1)
expect(logWarningMock).toHaveBeenCalledTimes(1)
expect(logWarningMock).toHaveBeenCalledWith(
'Fail to save: The cache filesize must be between 0 and 1073741824 bytes'
)

const archiveFolder = '/foo/bar'
Expand All @@ -127,6 +135,7 @@ test('save with large cache outputs should fail in GHES without error message',
const cachePaths = [path.resolve(filePath)]

const createTarMock = jest.spyOn(tar, 'createTar')
const logWarningMock = jest.spyOn(core, 'warning')

const cacheSize = 11 * 1024 * 1024 * 1024 //~11GB, over the 10GB limit
jest
Expand All @@ -150,8 +159,11 @@ test('save with large cache outputs should fail in GHES without error message',
return response
})

await expect(saveCache([filePath], primaryKey)).rejects.toThrowError(
'Cache size of ~11264 MB (11811160064 B) is over the data cap limit, not saving cache.'
const cacheId = await saveCache([filePath], primaryKey)
expect(cacheId).toBe(-1)
expect(logWarningMock).toHaveBeenCalledTimes(1)
expect(logWarningMock).toHaveBeenCalledWith(
'Fail to save: Cache size of ~11264 MB (11811160064 B) is over the data cap limit, not saving cache.'
)

const archiveFolder = '/foo/bar'
Expand All @@ -168,6 +180,7 @@ test('save with large cache outputs should fail in GHES without error message',
test('save with reserve cache failure should fail', async () => {
const paths = ['node_modules']
const primaryKey = 'Linux-node-bb828da54c148048dd17899ba9fda624811cfb43'
const logInfoMock = jest.spyOn(core, 'info')

const reserveCacheMock = jest
.spyOn(cacheHttpClient, 'reserveCache')
Expand All @@ -187,9 +200,13 @@ test('save with reserve cache failure should fail', async () => {
.spyOn(cacheUtils, 'getCompressionMethod')
.mockReturnValueOnce(Promise.resolve(compression))

await expect(saveCache(paths, primaryKey)).rejects.toThrowError(
`Unable to reserve cache with key ${primaryKey}, another job may be creating this cache.`
const cacheId = await saveCache(paths, primaryKey)
expect(cacheId).toBe(-1)
expect(logInfoMock).toHaveBeenCalledTimes(1)
expect(logInfoMock).toHaveBeenCalledWith(
`Fail to save: Unable to reserve cache with key ${primaryKey}, another job may be creating this cache. More details: undefined`
)

expect(reserveCacheMock).toHaveBeenCalledTimes(1)
expect(reserveCacheMock).toHaveBeenCalledWith(primaryKey, paths, {
compressionMethod: compression
Expand All @@ -203,7 +220,7 @@ test('save with server error should fail', async () => {
const filePath = 'node_modules'
const primaryKey = 'Linux-node-bb828da54c148048dd17899ba9fda624811cfb43'
const cachePaths = [path.resolve(filePath)]

const logWarningMock = jest.spyOn(core, 'warning')
const cacheId = 4
const reserveCacheMock = jest
.spyOn(cacheHttpClient, 'reserveCache')
Expand All @@ -228,9 +245,12 @@ test('save with server error should fail', async () => {
.spyOn(cacheUtils, 'getCompressionMethod')
.mockReturnValueOnce(Promise.resolve(compression))

await expect(saveCache([filePath], primaryKey)).rejects.toThrowError(
'HTTP Error Occurred'
await saveCache([filePath], primaryKey)
expect(logWarningMock).toHaveBeenCalledTimes(1)
expect(logWarningMock).toHaveBeenCalledWith(
'Fail to save: HTTP Error Occurred'
)

expect(reserveCacheMock).toHaveBeenCalledTimes(1)
expect(reserveCacheMock).toHaveBeenCalledWith(primaryKey, [filePath], {
compressionMethod: compression
Expand Down
47 changes: 30 additions & 17 deletions packages/cache/src/cache.ts
Expand Up @@ -86,23 +86,24 @@ export async function restoreCache(
}

const compressionMethod = await utils.getCompressionMethod()
let archivePath = ''
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will essentially suppress all failures including temp directory creation and tar extraction failures. Essentially we are trading accuracy for continuity here. We definitely need a monitor which will tell us how many users are hitting such failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suppression is already there https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59 , the only thing is now it is moved to toolkit instead of Github Actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need a monitor which will tell us how many users are hitting such failures.

I will check how we can monitors for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is same problem as what @anuragc617 would be looking into w.r.t getting warnings from cache action so that we can have monitor on those.

// path are needed to compute version
const cacheEntry = await cacheHttpClient.getCacheEntry(keys, paths, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the try block be just around the http client to handle errors from the server?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the idea here is to ensure that any failure is suppressed so that user does not get blocked due to cache action failures.
A bit scary thought but I guess it would be good for users' workflows to continue executing as in reality there is no blocker due to cache action failure. This may mean that users may not even notice things like archive/unarchive failing, and that's why I feel we should have monitors to ability to detect any anomaly in such failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we can even whitelist error based on Response code (by creating exception for those codes) but i want to keep the end user expectation same and based on current implementation of Github Actions actions/cache https://github.com/actions/cache/blob/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d/src/restore.ts#L55-L62 and https://github.com/actions/cache/blob/main/src/save.ts#L51-L59, it is suppressing all the error apart from ValidationError. And this conditions will be removed from actions side.

compressionMethod
})

if (!cacheEntry?.archiveLocation) {
// Cache not found
return undefined
}

// path are needed to compute version
const cacheEntry = await cacheHttpClient.getCacheEntry(keys, paths, {
compressionMethod
})
if (!cacheEntry?.archiveLocation) {
// Cache not found
return undefined
}

const archivePath = path.join(
await utils.createTempDirectory(),
utils.getCacheFileName(compressionMethod)
)
core.debug(`Archive Path: ${archivePath}`)
archivePath = path.join(
await utils.createTempDirectory(),
utils.getCacheFileName(compressionMethod)
)
core.debug(`Archive Path: ${archivePath}`)

try {
// Download the cache from the cache entry
await cacheHttpClient.downloadCache(
cacheEntry.archiveLocation,
Expand All @@ -123,6 +124,11 @@ export async function restoreCache(

await extractTar(archivePath, compressionMethod)
core.info('Cache restored successfully')

return cacheEntry.cacheKey
} catch (error) {
// Supress all cache related errors because caching should be optional
core.warning(`Fail to restore: ${error}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
core.warning(`Fail to restore: ${error}`)
core.warning(`Failed to restore: ${error}`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} finally {
// Try to delete the archive to save space
try {
Expand All @@ -132,7 +138,7 @@ export async function restoreCache(
}
}

return cacheEntry.cacheKey
return undefined
}

/**
Expand All @@ -152,7 +158,7 @@ export async function saveCache(
checkKey(key)

const compressionMethod = await utils.getCompressionMethod()
let cacheId = null
let cacheId = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change? IIRC this was made null recently. May @t-dedah can chime in on why it was made null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me check with @t-dedah because this function return type Number https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L150 so in case of exception returning null might not be correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explain more why this change is required, earlier even though Function return type is number and cacheId was set null but in the code we were making sure this will never return null https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L204 otherwise we were throwing error so this line https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 will never get execute (in case of exception). So compiler was happy. but now as i have added catch block https://github.com/actions/toolkit/blob/main/packages/cache/src/cache.ts#L229 line can be executed (in case of no error or exception is consumed) so compiler is throwing error that ``return type can only be number not number | null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that seems to be the case.


const cachePaths = await utils.resolvePaths(paths)
core.debug('Cache Paths:')
Expand Down Expand Up @@ -217,6 +223,13 @@ export async function saveCache(

core.debug(`Saving Cache (ID: ${cacheId})`)
await cacheHttpClient.saveCache(cacheId, archivePath, options)
} catch (error) {
const typedError = error as Error
if (typedError.name === ReserveCacheError.name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check happen in action itself? I see that we do some checks in action itself. Should we also not do this check there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the condition which you mentioned in actions will be removed after this change and basically idea here is we will also need to add any checks like this in other actions setup-* or wherever this toolkit cache package is getting used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiwarishub but I don't see you catching the ValidationError and handling it. Does that still happen in action while rest happens in toolkit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, in the given try block we are not throwing ValidationError exception https://github.com/actions/toolkit/pull/1122/files#diff-34e72845d1658a8a9b9fdfe6b5b9261952e7f0496a0a3fcec87dae722718a64aL151-L152. So ValidationError will be still thrown as a error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Though I feel we should also add the ValidationError check here as well. We do not have a specific Validate method and that means the validation can potentially be added in many places in future and without a check all those validations will get suppressed with this global suppression and will be difficult to understand why that is happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

core.info(`Fail to save: ${typedError.message}`)
} else {
core.warning(`Fail to save: ${typedError.message}`)
}
} finally {
// Try to delete the archive to save space
try {
Expand Down