Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safe-delete workspace API, and org setting to restrict force-delete #539

Merged
merged 10 commits into from Oct 27, 2022

Conversation

JarrettSpiker
Copy link
Contributor

@JarrettSpiker JarrettSpiker commented Sep 29, 2022

Description

Adds the new workspace safe-delete API (and tests)

Adds the can-force-delete permission hash to the workspace permissions hash. This value is a *bool which will allow clients to detect if safe vs force delete is available

Adds the allow_force_delete_workspaces setting to organizations

These are all part of a change which adds some guardrails around workspace deletion in TFC. There will be a new safe-delete API to remove a workspace only if it does not have resources under management. The existing delete API will continue to function as a force delete, but may be restricted to org owners based on an org level setting

This change is in draft as we complete the feature in TFC and prepare for it in a TFE release

Testing plan

  1. Test that Safe Delete works when a workspace does not have RUM, and fails when it does
  2. Check that force delete still works
  3. Check that can-force-delete is returned in workspace GETs
  4. Check that can-force-delete is nil when communicating with older versions of TFE
  5. Check that allow_force_delete is settable/readable through go-tfe

External links

Output from tests

Including output from tests may require access to a TFE instance. Ignore this section if you have no environment to test against.

❯ go test .  -run TestOrganizationsAllowForceDeleteSetting -tags=integration -v
=== RUN   TestOrganizationsAllowForceDeleteSetting
=== RUN   TestOrganizationsAllowForceDeleteSetting/creates_and_updates_allow_force_delete
--- PASS: TestOrganizationsAllowForceDeleteSetting (1.52s)
    --- PASS: TestOrganizationsAllowForceDeleteSetting/creates_and_updates_allow_force_delete (1.23s)
PASS
ok      github.com/hashicorp/go-tfe     1.658s

❯ go test .  -run TestWorkspacesSafeDelete -tags=integration -v 
=== RUN   TestWorkspacesSafeDelete
=== RUN   TestWorkspacesSafeDelete/with_valid_options
=== RUN   TestWorkspacesSafeDelete/when_organization_is_invalid
=== RUN   TestWorkspacesSafeDelete/when_workspace_is_invalid
=== RUN   TestWorkspacesSafeDelete/when_workspace_is_locked
=== RUN   TestWorkspacesSafeDelete/when_workspace_has_resources_under_management
--- PASS: TestWorkspacesSafeDelete (4.33s)
    --- PASS: TestWorkspacesSafeDelete/with_valid_options (0.35s)
    --- PASS: TestWorkspacesSafeDelete/when_organization_is_invalid (0.00s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_is_invalid (0.00s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_is_locked (0.78s)
    --- PASS: TestWorkspacesSafeDelete/when_workspace_has_resources_under_management (1.77s)
=== RUN   TestWorkspacesSafeDeleteByID
=== RUN   TestWorkspacesSafeDeleteByID/with_valid_options
=== RUN   TestWorkspacesSafeDeleteByID/without_a_valid_workspace_ID
=== RUN   TestWorkspacesSafeDeleteByID/when_workspace_is_locked
=== RUN   TestWorkspacesSafeDeleteByID/when_workspace_has_resources_under_management
--- PASS: TestWorkspacesSafeDeleteByID (3.99s)
    --- PASS: TestWorkspacesSafeDeleteByID/with_valid_options (0.34s)
    --- PASS: TestWorkspacesSafeDeleteByID/without_a_valid_workspace_ID (0.00s)
    --- PASS: TestWorkspacesSafeDeleteByID/when_workspace_is_locked (0.78s)
    --- PASS: TestWorkspacesSafeDeleteByID/when_workspace_has_resources_under_management (1.39s)
PASS
ok      github.com/hashicorp/go-tfe     8.436s

❯ go test .  -run TestWorkspacesDelete -tags=integration -v 
=== RUN   TestWorkspacesDelete
=== RUN   TestWorkspacesDelete/with_valid_options
=== RUN   TestWorkspacesDelete/when_organization_is_invalid
=== RUN   TestWorkspacesDelete/when_workspace_is_invalid
--- PASS: TestWorkspacesDelete (1.61s)
    --- PASS: TestWorkspacesDelete/with_valid_options (0.35s)
    --- PASS: TestWorkspacesDelete/when_organization_is_invalid (0.00s)
    --- PASS: TestWorkspacesDelete/when_workspace_is_invalid (0.00s)
=== RUN   TestWorkspacesDeleteByID
=== RUN   TestWorkspacesDeleteByID/with_valid_options
=== RUN   TestWorkspacesDeleteByID/without_a_valid_workspace_ID
--- PASS: TestWorkspacesDeleteByID (1.66s)
    --- PASS: TestWorkspacesDeleteByID/with_valid_options (0.34s)
    --- PASS: TestWorkspacesDeleteByID/without_a_valid_workspace_ID (0.00s)
PASS
ok      github.com/hashicorp/go-tfe     3.378s

❯ go test .  -run TestCanForceDeletePermission -tags=integration -v
=== RUN   TestCanForceDeletePermission
=== RUN   TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete
--- PASS: TestCanForceDeletePermission (1.59s)
    --- PASS: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.26s)
PASS
ok      github.com/hashicorp/go-tfe     1.704s
...

brandonc
brandonc previously approved these changes Oct 6, 2022
organization.go Outdated
@@ -78,6 +78,7 @@ type Organization struct {
TrialExpiresAt time.Time `jsonapi:"attr,trial-expires-at,iso8601"`
TwoFactorConformant bool `jsonapi:"attr,two-factor-conformant"`
SendPassingStatusesForUntriggeredSpeculativePlans bool `jsonapi:"attr,send-passing-statuses-for-untriggered-speculative-plans"`
AllowForceDeleteWorkspaces bool `jsonapi:"attr,allow-force-delete-workspaces"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this property is missing for older TFE versions, the default will be false. And then, of course, all deletes will be force deletes because safe delete isn't implemented. I think we are OK with this contradiction but I wanted to point it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a really good point...I am also ok with the contradiction, it seems less confusing than making this a *bool or something when it isn't strictly necessary and we use regular bools are used for everything else in this struct. I also cant think of a less confusing name...

I will add a comment though to make it clear that this will be false for older TFE versions, even though all deletes are force deletes in that case

Copy link
Contributor

@SwiftEngineer SwiftEngineer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only thing I saw was that little duplicate assertion, apart from that I'd say this PR looks great 💯

w, err := client.Workspaces.ReadByID(ctx, wTest.ID)
require.NoError(t, err)
assert.Equal(t, wTest, w)
assert.Equal(t, wTest, w)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate assertion here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

@JarrettSpiker JarrettSpiker marked this pull request as ready for review October 20, 2022 18:27
Copy link
Contributor

@sebasslash sebasslash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great 👍 . There are a few minor blocking issues concerning your tests. I also wanted to mention the changelog is a bit out of order at the moment and will be resolved in #549

}

org, err := client.Organizations.Create(ctx, options)
assert.Nil(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be require.NoError(t, error) -- reason being is the assert package will not halt the execution of the test if it fails, leading to more cascading errors and more noise. Make sure to change all error checks to use require 👍

t.Cleanup(func() {
err := client.Organizations.Delete(ctx, org.Name)
if err != nil {
t.Logf("error deleting organization (%s): %s", org.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally we t.Errorf() when cleanup fails.

@@ -565,7 +565,41 @@ func TestOrganizationsReadRunTasksEntitlement(t *testing.T) {
assert.NotEmpty(t, entitlements.ID)
assert.True(t, entitlements.RunTasks)
})
}

func TestOrganizationsAllowForceDeleteSetting(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our docs should be much more explicit, but we've recently introduced test splitting in our CI and we're requiring all top-level tests to call skipIfNotCINode(t) as the first check. This will ensure the test runs only once on the correct test node instead of on every single node. You do not need to do this for subtests.

orgTest, orgTestCleanup := createOrganization(t, client)
defer orgTestCleanup()

wTest, _ := createWorkspace(t, client, orgTest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, Is there a reason we don't clean this workspace up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I was copying from the update tests in this file, which also ignore the cleanup

Im not sure why though...I will try to update all the tests to do the cleanup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I remember. Because the the workspace gets removed in the process of the test, and then the cleanup call throws an errors. Or, in some other tests (and also this one, I would guess) the org also gets cleaned up first, which deletes the workspace and causes a defered workspace cleanup to fail

E.g. if you try to defer the cleanup in the test I linked above you get

=== RUN   TestWorkspacesUpdate
=== CONT  TestWorkspacesUpdate
    helper_test.go:1602: Error destroying workspace! WARNING: Dangling resources
        may exist! The full error is shown below.
        
        Workspace: 0874350d-2197-1355-74c7-4ee437bbc010
        Error: resource not found
--- FAIL: TestWorkspacesUpdate (5.32s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels like it should be a solve-able problem. I will update the workspace cleanup function not to error on 404s, and then update the other tests in this file to cleanup the workspaces which they create

Comment on lines 1322 to 1330
defer workspaceCleanup()
_, svTestCleanup := createStateVersion(t, client, 0, wTest)
t.Cleanup(svTestCleanup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocker: Not a huge deal to mix the two, but the preference here is for t.Cleanup()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that it is preferred to do t.Cleanup(workspaceCleanup) instead of defer workspaceCleanup() directly?

Copy link
Contributor

@sebasslash sebasslash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost at the finish line 🏃 ⬇️

helper_test.go Outdated
Comment on lines 1604 to 1606
if err == ErrResourceNotFound {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, a recurring pain point for us. I tried introducing this in #527 but it was ultimately rejected considering it could obfuscate an authorization bug. So unfortunately will have to reject it here -- I think we can omit the cleanup for now if an "already deleted" workspace is causing this error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting...I think I would lean the other direction on that, since a test creating a workspace and then being unable to "see" it when it goes for a delete seems unlikely, but I definitely see the logic

I removed the ErrResourceNotFound check, and switched a lot of the tests to using t.Cleanup instead of defer for the cleanup callbacks, and that does seem to have cleared up most of the cleanup errors~

I also changed this to delete the workspace by ID instead of Name, because we had one test which changes the workspace name (causing this cleanup to fail)

wTest, workspaceCleanup := createWorkspace(t, client, orgTest)
t.Cleanup(workspaceCleanup)
w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{})
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed one require.NoError(t, err), I'd even say we should also require.True(t, w.Locked) since the rest of your test depends on this being true. If this fails, there's no point in continuing to run the rest of the test.

wTest, workspaceCleanup := createWorkspace(t, client, orgTest)
t.Cleanup(workspaceCleanup)
w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{})
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines 1236 to 1238
assert.Equal(t, wTest, w)
assert.NotNil(t, w.Permissions.CanForceDelete)
assert.True(t, *w.Permissions.CanForceDelete)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like your test is panicking:

=== FAIL: . TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
    workspace_integration_test.go:1237: 
        	Error Trace:	/home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1237
        	Error:      	Expected value not to be nil.
        	Test:       	TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete
    --- FAIL: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7f0a47]

goroutine 2995 [running]:
testing.tRunner.func1.2({0x861640, 0xbf9470})
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1399 +0x39f
panic({0x861640, 0xbf9470})
	/opt/hostedtoolcache/go/1.19.2/x64/src/runtime/panic.go:884 +0x212
github.com/hashicorp/go-tfe.TestCanForceDeletePermission.func1(0x1?)
	/home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1238 +0xe7
testing.tRunner(0xc0004b56c0, 0xc0006b2780)
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1493 +0x35f
exit status 2

Two things:

  • This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.
  • We tend to require the existence of a relationship struct, i.e require.NotNil(t, w.Permissions), before we check said relationship's fields (otherwise we risk a panic).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.

I think that this is at least partially it...where can I find the instance that the acceptance tests run against? I can check out if it has all the api changes expected

Edit: Found it, and it looks like it has at least some of the expected changes...I will dig into why the tests are failing like this!

Edit edit: I was wrong in the previous edit. I was looking at tfelocal-cloud (which has the correct changes) not tflocal-go-tfe which does not. @sebasslash can you confirm that the integration tests are running against tflocal-go-tfe, and help me figure out how to update it?

Copy link
Contributor

@sebasslash sebasslash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Looks like there are some flakes unrelated to this PR but still approving.

@JarrettSpiker JarrettSpiker merged commit 7b137b4 into main Oct 27, 2022
@JarrettSpiker JarrettSpiker deleted the mpminardi/safe-delete branch October 27, 2022 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants