Add safe-delete workspace API, and org setting to restrict force-delete #539

JarrettSpiker · 2022-09-29T03:44:04Z

Description

Adds the new workspace safe-delete API (and tests)

Adds the can-force-delete permission to the workspace permissions hash. This value is a *bool, which allows clients to detect whether safe or force delete is available

Adds the allow_force_delete_workspaces setting to organizations

These are all part of a change that adds guardrails around workspace deletion in TFC. A new safe-delete API removes a workspace only if it has no resources under management. The existing delete API will continue to function as a force delete, but may be restricted to org owners based on an org-level setting

This change is in draft as we complete the feature in TFC and prepare for it in a TFE release
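As a rough illustration of how a client might use the *bool described above, here is a minimal sketch. The `workspacePermissions` type, the `deleteMode` names, and the fallback policy in `chooseDeleteMode` are all hypothetical and not taken from go-tfe; the only behavior drawn from this PR is that can-force-delete is nil on older TFE versions and that the existing delete API acts as a force delete.

```go
package main

// deleteMode names the deletion call a client would make (hypothetical).
type deleteMode string

const (
	modeSafe  deleteMode = "safe-delete"
	modeForce deleteMode = "force-delete"
)

// workspacePermissions is a hypothetical stand-in for the workspace
// permissions hash; only the field discussed in this PR is modeled.
type workspacePermissions struct {
	// CanForceDelete is nil when the server predates can-force-delete.
	CanForceDelete *bool
}

// chooseDeleteMode picks a deletion call from the permissions.
// Assumed policy, for illustration only:
//   - nil permissions or nil CanForceDelete: older server, where only the
//     original delete API exists and it behaves as a force delete.
//   - CanForceDelete true and force requested: force delete.
//   - anything else: safe delete.
func chooseDeleteMode(p *workspacePermissions, wantForce bool) deleteMode {
	if p == nil || p.CanForceDelete == nil {
		return modeForce
	}
	if *p.CanForceDelete && wantForce {
		return modeForce
	}
	return modeSafe
}
```

Checking both the struct and the pointer field before dereferencing keeps the helper safe against responses that omit the permissions entirely.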

Testing plan

Test that safe delete works when a workspace does not have resources under management (RUM), and fails when it does
Check that force delete still works
Check that can-force-delete is returned in workspace GETs
Check that can-force-delete is nil when communicating with older versions of TFE
Check that allow_force_delete is settable/readable through go-tfe

External links

API documentation (in progress)
RFC
JIRA

Output from tests

Including output from tests may require access to a TFE instance. Ignore this section if you have no environment to test against.

❯ go test .  -run TestOrganizationsAllowForceDeleteSetting -tags=integration -v
=== RUN   TestOrganizationsAllowForceDeleteSetting
=== RUN   TestOrganizationsAllowForceDeleteSetting/creates_and_updates_allow_force_delete
--- PASS: TestOrganizationsAllowForceDeleteSetting (1.52s)
    --- PASS: TestOrganizationsAllowForceDeleteSetting/creates_and_updates_allow_force_delete (1.23s)
PASS
ok      github.com/hashicorp/go-tfe     1.658s

❯ go test .  -run TestWorkspacesSafeDelete -tags=integration -v 
=== RUN   TestWorkspacesSafeDelete
=== RUN   TestWorkspacesSafeDelete/with_valid_options
=== RUN   TestWorkspacesSafeDelete/when_organization_is_invalid
=== RUN   TestWorkspacesSafeDelete/when_workspace_is_invalid
=== RUN   TestWorkspacesSafeDelete/when_workspace_is_locked
=== RUN   TestWorkspacesSafeDelete/when_workspace_has_resources_under_management
--- PASS: TestWorkspacesSafeDelete (4.33s)
    --- PASS: TestWorkspacesSafeDelete/with_valid_options (0.35s)
    --- PASS: TestWorkspacesSafeDelete/when_organization_is_invalid (0.00s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_is_invalid (0.00s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_is_locked (0.78s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_has_resources_under_management (1.77s)
=== RUN   TestWorkspacesSafeDeleteByID
=== RUN   TestWorkspacesSafeDeleteByID/with_valid_options
=== RUN   TestWorkspacesSafeDeleteByID/without_a_valid_workspace_ID
=== RUN   TestWorkspacesSafeDeleteByID/when_workspace_is_locked
=== RUN   TestWorkspacesSafeDeleteByID/when_workspace_has_resources_under_management
--- PASS: TestWorkspacesSafeDeleteByID (3.99s)
    --- PASS: TestWorkspacesSafeDeleteByID/with_valid_options (0.34s)
    --- PASS: TestWorkspacesSafeDeleteByID/without_a_valid_workspace_ID (0.00s)
    --- PASS: TestWorkspacesSafeDeleteByID/when_workspace_is_locked (0.78s)
    --- PASS: TestWorkspacesSafeDeleteByID/when_workspace_has_resources_under_management (1.39s)
PASS
ok      github.com/hashicorp/go-tfe     8.436s

❯ go test .  -run TestWorkspacesDelete -tags=integration -v 
=== RUN   TestWorkspacesDelete
=== RUN   TestWorkspacesDelete/with_valid_options
=== RUN   TestWorkspacesDelete/when_organization_is_invalid
=== RUN   TestWorkspacesDelete/when_workspace_is_invalid
--- PASS: TestWorkspacesDelete (1.61s)
    --- PASS: TestWorkspacesDelete/with_valid_options (0.35s)
    --- PASS: TestWorkspacesDelete/when_organization_is_invalid (0.00s)
    --- PASS: TestWorkspacesDelete/when_workspace_is_invalid (0.00s)
=== RUN   TestWorkspacesDeleteByID
=== RUN   TestWorkspacesDeleteByID/with_valid_options
=== RUN   TestWorkspacesDeleteByID/without_a_valid_workspace_ID
--- PASS: TestWorkspacesDeleteByID (1.66s)
    --- PASS: TestWorkspacesDeleteByID/with_valid_options (0.34s)
    --- PASS: TestWorkspacesDeleteByID/without_a_valid_workspace_ID (0.00s)
PASS
ok      github.com/hashicorp/go-tfe     3.378s

❯ go test .  -run TestCanForceDeletePermission -tags=integration -v
=== RUN   TestCanForceDeletePermission
=== RUN   TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete
--- PASS: TestCanForceDeletePermission (1.59s)
    --- PASS: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.26s)
PASS
ok      github.com/hashicorp/go-tfe     1.704s
...

brandonc · 2022-10-06T15:46:43Z

organization.go

@@ -78,6 +78,7 @@ type Organization struct {
 	TrialExpiresAt                                    time.Time                `jsonapi:"attr,trial-expires-at,iso8601"`
 	TwoFactorConformant                               bool                     `jsonapi:"attr,two-factor-conformant"`
 	SendPassingStatusesForUntriggeredSpeculativePlans bool                     `jsonapi:"attr,send-passing-statuses-for-untriggered-speculative-plans"`
+	AllowForceDeleteWorkspaces                        bool                     `jsonapi:"attr,allow-force-delete-workspaces"`


If this property is missing for older TFE versions, the default will be false. And then, of course, all deletes will be force deletes because safe delete isn't implemented. I think we are OK with this contradiction but I wanted to point it out.

That is a really good point... I am also OK with the contradiction. It seems less confusing than making this a *bool or something when it isn't strictly necessary, and regular bools are used for everything else in this struct. I also can't think of a less confusing name...

I will add a comment though to make it clear that this will be false for older TFE versions, even though all deletes are force deletes in that case
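The trade-off in this thread can be sketched with the standard library. This example uses `encoding/json` as a stand-in for the jsonapi decoding that go-tfe performs, and the `orgPlain` and `orgPointer` types are hypothetical; it only shows that a missing attribute decodes to `false` for a plain bool and to `nil` for a *bool.

```go
package main

import "encoding/json"

// orgPlain mirrors the choice made in this PR: a plain bool.
type orgPlain struct {
	AllowForceDeleteWorkspaces bool `json:"allow-force-delete-workspaces"`
}

// orgPointer is the alternative that was considered: a *bool that can
// distinguish "absent" from "false".
type orgPointer struct {
	AllowForceDeleteWorkspaces *bool `json:"allow-force-delete-workspaces"`
}

// decodeBoth unmarshals the same payload into both shapes so the results
// can be compared side by side.
func decodeBoth(payload string) (orgPlain, orgPointer, error) {
	var plain orgPlain
	var ptr orgPointer
	if err := json.Unmarshal([]byte(payload), &plain); err != nil {
		return plain, ptr, err
	}
	err := json.Unmarshal([]byte(payload), &ptr)
	return plain, ptr, err
}
```

With a payload that lacks the attribute, as an older TFE version would send, the plain bool reads `false` even though every delete on that server is effectively a force delete, which is the contradiction noted above.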

SwiftEngineer

Only thing I saw was that little duplicate assertion, apart from that I'd say this PR looks great 💯

SwiftEngineer · 2022-10-13T22:17:56Z

workspace_integration_test.go

+		w, err := client.Workspaces.ReadByID(ctx, wTest.ID)
+		require.NoError(t, err)
+		assert.Equal(t, wTest, w)
+		assert.Equal(t, wTest, w)


duplicate assertion here

good catch, thanks!

sebasslash

Overall looks great 👍 . There are a few minor blocking issues concerning your tests. I also wanted to mention the changelog is a bit out of order at the moment and will be resolved in #549

sebasslash · 2022-10-20T20:55:04Z

organization_integration_test.go

+		}
+
+		org, err := client.Organizations.Create(ctx, options)
+		assert.Nil(t, err)


This should be require.NoError(t, err). The reason is that the assert package will not halt the execution of the test if it fails, leading to more cascading errors and more noise. Make sure to change all error checks to use require 👍

sebasslash · 2022-10-20T20:56:47Z

organization_integration_test.go

+		t.Cleanup(func() {
+			err := client.Organizations.Delete(ctx, org.Name)
+			if err != nil {
+				t.Logf("error deleting organization (%s): %s", org.Name, err)


Normally we call t.Errorf() when cleanup fails.

sebasslash · 2022-10-20T20:59:31Z

organization_integration_test.go

@@ -565,7 +565,41 @@ func TestOrganizationsReadRunTasksEntitlement(t *testing.T) {
 		assert.NotEmpty(t, entitlements.ID)
 		assert.True(t, entitlements.RunTasks)
 	})
+}
+
+func TestOrganizationsAllowForceDeleteSetting(t *testing.T) {


Our docs should be much more explicit, but we've recently introduced test splitting in our CI and we're requiring all top-level tests to call skipIfNotCINode(t) as the first check. This will ensure the test runs only once on the correct test node instead of on every single node. You do not need to do this for subtests.

sebasslash · 2022-10-20T21:09:12Z

workspace_integration_test.go

+	orgTest, orgTestCleanup := createOrganization(t, client)
+	defer orgTestCleanup()
+
+	wTest, _ := createWorkspace(t, client, orgTest)


Hm, is there a reason we don't clean this workspace up?

I believe I was copying from the update tests in this file, which also ignore the cleanup

I'm not sure why though... I will try to update all the tests to do the cleanup

Oh, I remember. Because the workspace gets removed in the process of the test, and then the cleanup call throws an error. Or, in some other tests (and also this one, I would guess) the org also gets cleaned up first, which deletes the workspace and causes a deferred workspace cleanup to fail

E.g. if you try to defer the cleanup in the test I linked above you get

=== RUN   TestWorkspacesUpdate
=== CONT  TestWorkspacesUpdate
    helper_test.go:1602: Error destroying workspace! WARNING: Dangling resources may exist! The full error is shown below.
        Workspace: 0874350d-2197-1355-74c7-4ee437bbc010
        Error: resource not found
--- FAIL: TestWorkspacesUpdate (5.32s)

That feels like it should be a solvable problem. I will update the workspace cleanup function not to error on 404s, and then update the other tests in this file to clean up the workspaces which they create

sebasslash · 2022-10-20T21:12:01Z

workspace_integration_test.go

+		defer workspaceCleanup()
+		_, svTestCleanup := createStateVersion(t, client, 0, wTest)
+		t.Cleanup(svTestCleanup)


Non-blocker: Not a huge deal to mix the two, but the preference here is for t.Cleanup()

Do you mean that it is preferred to do t.Cleanup(workspaceCleanup) instead of defer workspaceCleanup() directly?
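One practical difference between the two styles is ordering: deferred calls run when the test function returns, while functions registered with t.Cleanup run after that, last registered first. The sketch below simulates this with a hypothetical `fakeT` type rather than a real *testing.T, so it illustrates the documented ordering and is not a reproduction of the tests in this PR.

```go
package main

// fakeT imitates only the Cleanup part of *testing.T: registered functions
// run after the test body returns, last registered first.
type fakeT struct {
	cleanups []func()
}

// Cleanup registers f to run once the body has finished.
func (t *fakeT) Cleanup(f func()) { t.cleanups = append(t.cleanups, f) }

// run executes body, then the registered cleanups in reverse order.
func (t *fakeT) run(body func(t *fakeT)) {
	body(t)
	for i := len(t.cleanups) - 1; i >= 0; i-- {
		t.cleanups[i]()
	}
}

// mixedOrder records the teardown order when defer and Cleanup are mixed,
// as in the hunk above.
func mixedOrder() []string {
	var order []string
	(&fakeT{}).run(func(t *fakeT) {
		// Stands in for: defer workspaceCleanup()
		defer func() { order = append(order, "workspace") }()
		// Stands in for: t.Cleanup(svTestCleanup)
		t.Cleanup(func() { order = append(order, "state version") })
	})
	return order
}

// cleanupOnlyOrder records the teardown order when both use Cleanup.
func cleanupOnlyOrder() []string {
	var order []string
	(&fakeT{}).run(func(t *fakeT) {
		t.Cleanup(func() { order = append(order, "workspace") })
		t.Cleanup(func() { order = append(order, "state version") })
	})
	return order
}
```

In the mixed version the workspace is torn down before the state version that depends on it; with Cleanup for both, the state version goes first.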

sebasslash

Almost at the finish line 🏃 ⬇️

sebasslash · 2022-10-25T14:10:10Z

helper_test.go

+			if err == ErrResourceNotFound {
+				return
+			}


Ah, a recurring pain point for us. I tried introducing this in #527 but it was ultimately rejected, considering it could obfuscate an authorization bug. So unfortunately I will have to reject it here. I think we can omit the cleanup for now if an "already deleted" workspace is causing this error.

Interesting...I think I would lean the other direction on that, since a test creating a workspace and then being unable to "see" it when it goes for a delete seems unlikely, but I definitely see the logic

I removed the ErrResourceNotFound check, and switched a lot of the tests to using t.Cleanup instead of defer for the cleanup callbacks, and that does seem to have cleared up most of the cleanup errors.

I also changed this to delete the workspace by ID instead of Name, because we had one test which changes the workspace name (causing this cleanup to fail)
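To show why deleting by ID is more robust than deleting by name, here is a small in-memory sketch. The `fakeStore` type and `errNotFound` value are hypothetical stand-ins and do not use the go-tfe client; the point is only that a rename invalidates a name captured at creation time, while the ID stays stable.

```go
package main

import "errors"

// errNotFound is a hypothetical stand-in for a "resource not found" error.
var errNotFound = errors.New("resource not found")

// fakeStore is an in-memory stand-in for the workspaces API, keyed by ID.
type fakeStore struct {
	names map[string]string // workspace ID -> current name
}

// rename changes the current name of an existing workspace.
func (s *fakeStore) rename(id, newName string) { s.names[id] = newName }

// deleteByName removes the workspace whose current name matches.
func (s *fakeStore) deleteByName(name string) error {
	for id, n := range s.names {
		if n == name {
			delete(s.names, id)
			return nil
		}
	}
	return errNotFound
}

// deleteByID removes the workspace with the given ID.
func (s *fakeStore) deleteByID(id string) error {
	if _, ok := s.names[id]; !ok {
		return errNotFound
	}
	delete(s.names, id)
	return nil
}
```

A cleanup callback that closed over the original name would hit `errNotFound` after the test renames the workspace, whereas one that closed over the ID still succeeds.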

sebasslash · 2022-10-25T14:15:15Z

workspace_integration_test.go

+		wTest, workspaceCleanup := createWorkspace(t, client, orgTest)
+		t.Cleanup(workspaceCleanup)
+		w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{})
+		assert.NoError(t, err)


Missed one require.NoError(t, err). I'd even say we should also require.True(t, w.Locked), since the rest of your test depends on this being true. If this fails, there's no point in continuing to run the rest of the test.

sebasslash · 2022-10-25T14:16:18Z

workspace_integration_test.go

+		wTest, workspaceCleanup := createWorkspace(t, client, orgTest)
+		t.Cleanup(workspaceCleanup)
+		w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{})
+		assert.NoError(t, err)


sebasslash · 2022-10-25T14:28:07Z

workspace_integration_test.go

+		assert.Equal(t, wTest, w)
+		assert.NotNil(t, w.Permissions.CanForceDelete)
+		assert.True(t, *w.Permissions.CanForceDelete)


Looks like your test is panicking:

=== FAIL: . TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
    workspace_integration_test.go:1237:
        Error Trace: /home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1237
        Error:       Expected value not to be nil.
        Test:        TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete
    --- FAIL: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7f0a47]
goroutine 2995 [running]:
testing.tRunner.func1.2({0x861640, 0xbf9470})
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1399 +0x39f
panic({0x861640, 0xbf9470})
	/opt/hostedtoolcache/go/1.19.2/x64/src/runtime/panic.go:884 +0x212
github.com/hashicorp/go-tfe.TestCanForceDeletePermission.func1(0x1?)
	/home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1238 +0xe7
testing.tRunner(0xc0004b56c0, 0xc0006b2780)
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1493 +0x35f
exit status 2

Two things:

This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.

We tend to require the existence of a relationship struct, i.e require.NotNil(t, w.Permissions), before we check said relationship's fields (otherwise we risk a panic).

This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.

I think that this is at least partially it...where can I find the instance that the acceptance tests run against? I can check out if it has all the api changes expected

Edit: Found it, and it looks like it has at least some of the expected changes...I will dig into why the tests are failing like this!

Edit edit: I was wrong in the previous edit. I was looking at tfelocal-cloud (which has the correct changes) not tflocal-go-tfe which does not. @sebasslash can you confirm that the integration tests are running against tflocal-go-tfe, and help me figure out how to update it?

sebasslash

🚀 Looks like there are some flakes unrelated to this PR but still approving.

JarrettSpiker requested a review from a team September 29, 2022 03:44

JarrettSpiker requested a review from a team as a code owner September 29, 2022 03:44

JarrettSpiker marked this pull request as draft September 29, 2022 03:44

JarrettSpiker mentioned this pull request Oct 5, 2022

Use safe or force workspace delete for cloud backend hashicorp/terraform#31949

Merged

brandonc previously approved these changes Oct 6, 2022

JarrettSpiker dismissed brandonc’s stale review via 509df39 October 11, 2022 15:44

SwiftEngineer reviewed Oct 13, 2022

JarrettSpiker force-pushed the mpminardi/safe-delete branch from 302c326 to 354728c Compare October 14, 2022 20:17

JarrettSpiker marked this pull request as ready for review October 20, 2022 18:27

sebasslash requested changes Oct 20, 2022

mpminardi and others added 6 commits October 21, 2022 16:04

Add safe-delete workspace API, and org setting to control if workspac…

7e7faed

…e admins can force delete

Add a comment to clarify that AllowForceDelete will be false for old …

8730b72

…TFE versions

Generate mocks for SafeDelete

51341b4

Remove duplicate assertion

16d153b

Add changelog entry

2213c1e

Update tests to exit early if there is a setup error

e8b91ef

JarrettSpiker force-pushed the mpminardi/safe-delete branch from 354728c to e8b91ef Compare October 21, 2022 20:25

Attempt cleanup on workspaces which should be deleted

01e5267

sebasslash requested changes Oct 25, 2022

sebasslash reviewed Oct 25, 2022

JarrettSpiker added 3 commits October 25, 2022 15:31

Cleanup workspaces by ID not name

dc48cae

Exit test early if workspace permissions nil to avoid panic

d86cb66

Prevent test panic

f3a30d5

sebasslash approved these changes Oct 26, 2022

JarrettSpiker merged commit 7b137b4 into main Oct 27, 2022

JarrettSpiker deleted the mpminardi/safe-delete branch October 27, 2022 13:26

annawinkler mentioned this pull request Nov 1, 2022

Update changelog for v1.12.0 release #578

Merged
