Fix: don't fail test helpers if cleanup returns a 404 #527

sebasslash · 2022-09-07T16:08:05Z

Description

We've had a series of flaky tests where the test itself succeeds but the cleanup fails, most notably workspaces deleting their children before the test is able to make the API call to delete a given child resource. We want to reduce the possibility of a flake during cleanup.

The idea here is that we shouldn't fail a test if the resource it's trying to clean up is not found (ErrResourceNotFound). We don't use explicit synchronization when firing off API calls, which could explain why parent resources are occasionally deleted first. Rather than wrangle with explicit ordering, we can simply add a fail-safe that ignores the deletion of nonexistent resources.

It is worth mentioning that all remaining resources are destroyed when the tflocal instance is rebuilt nightly, so these dangling resources don't sit around for too long. We don't, however, ignore all deletion errors, since they can potentially raise an API-level concern.

External links

Comment that posed question

brandonc

I like that this targets 404 errors only, and I only have one reservation: 404 is also used when an authorization error occurs so that could potentially hide a bug that the test runner does not have authorization to delete something, which would be surprising.

I think I am leaning toward being against this change because proper ordering (deleting the thing the workspace owns before deleting the workspace, for instance) and using t.Cleanup so ordering is simplified eliminates the problem. I don't think this has the potential to resolve flakes that are otherwise successful tests, but I think it has the potential to obscure test structure bugs or authorization bugs.
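To illustrate the ordering argument: functions registered with `t.Cleanup` run in last-added, first-called order, so registering the workspace cleanup before the child's cleanup means the child is deleted first. The sketch below assumes hypothetical helpers `createWorkspace` and `createVariable` that only record what they would do. `fakeT` is a stand-in that mimics the documented `testing.T` ordering so the example can run outside `go test`.

```go
package main

// cleaner is the subset of *testing.T used here; *testing.T satisfies it.
type cleaner interface {
	Cleanup(func())
}

// createWorkspace is a hypothetical helper: it "creates" the parent and
// registers its deletion immediately.
func createWorkspace(t cleaner, log *[]string) {
	*log = append(*log, "create workspace")
	t.Cleanup(func() { *log = append(*log, "delete workspace") })
}

// createVariable is a hypothetical helper for a child resource owned by the
// workspace. It is registered after the parent, so it is cleaned up before it.
func createVariable(t cleaner, log *[]string) {
	*log = append(*log, "create variable")
	t.Cleanup(func() { *log = append(*log, "delete variable") })
}

// fakeT mimics testing.T's last-added, first-called cleanup order.
type fakeT struct {
	fns []func()
}

func (f *fakeT) Cleanup(fn func()) { f.fns = append(f.fns, fn) }

// runCleanups invokes the registered functions in reverse registration order.
func (f *fakeT) runCleanups() {
	for i := len(f.fns) - 1; i >= 0; i-- {
		f.fns[i]()
	}
}
```

With this structure the child's delete call always reaches the API before the parent's, so a 404 during cleanup would point to a real problem rather than a race.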

Uk1288 · 2022-09-08T14:14:58Z

Good point, sounds like we might have to handle this on a case-by-case basis for each flake. Given that some of the cleanup failures are difficult to reproduce, I am thinking we should monitor and fix them as they occur.

sebasslash · 2022-09-08T19:30:27Z

Hmm, I don't think this obscures an authorization bug because we always test the delete method for a given interface. If it were an auth bug, those tests would fail as well. I think if we're getting a 404 during the cleanup, but the related delete integration test succeeds, it is an actual 404.

laurenolivia · 2022-09-12T22:13:38Z

@sebasslash The StateVersion flake failing your PR has been fixed 😸

sebasslash · 2022-09-13T13:52:10Z

Alrighty, there haven't been any more comments and it seems we're set on failing tests with 404s on cleanup. Closing.

Fix: don't fail test helpers if cleanup returns a 404

a9195d9

sebasslash requested a review from a team as a code owner September 7, 2022 16:08

brandonc reviewed Sep 7, 2022


sebasslash closed this Sep 13, 2022

sebasslash mentioned this pull request Oct 25, 2022

Add safe-delete workspace API, and org setting to restrict force-delete #539

Merged

annawinkler deleted the ignore-404-deletion branch February 2, 2023 14:34
