function deleteSWGAutoGenRouter doesn't wait for the operation to finish #18140

Open
teyuchang opened this issue May 14, 2024 · 3 comments

teyuchang commented May 14, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

Terraform v0.13.7
on linux/amd64

  • provider registry.terraform.io/hashicorp/google-beta v4.84.0

Affected Resource(s)

google_network_services_gateway

Terraform Configuration

resource "google_compute_network" "default" {
  name                    = "my-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "default" {
  name          = "my-subnetwork-name"
  purpose       = "PRIVATE"
  ip_cidr_range = "10.128.0.0/20"
  region        = "us-central1"
  network       = google_compute_network.default.id
  role          = "ACTIVE"
}

resource "google_compute_subnetwork" "proxyonlysubnet" {
  name          = "my-proxy-only-subnetwork"
  purpose       = "REGIONAL_MANAGED_PROXY"
  ip_cidr_range = "192.168.0.0/23"
  region        = "us-central1"
  network       = google_compute_network.default.id
  role          = "ACTIVE"
}

resource "google_network_security_gateway_security_policy" "default" {
  name        = "my-policy-name"
  location    = "us-central1"
}

resource "google_network_security_gateway_security_policy_rule" "default" {
  name                    = "my-policyrule-name"
  location                = "us-central1"
  gateway_security_policy = google_network_security_gateway_security_policy.default.name
  enabled                 = true  
  priority                = 1
  session_matcher         = "host() == 'example.com'"
  basic_profile           = "ALLOW"
}

resource "google_network_services_gateway" "default" {
  name                                 = "my-gateway1"
  location                             = "us-central1"
  addresses                            = ["10.128.0.99"]
  type                                 = "SECURE_WEB_GATEWAY"
  ports                                = [443]
  gateway_security_policy              = google_network_security_gateway_security_policy.default.id
  network                              = google_compute_network.default.id
  subnetwork                           = google_compute_subnetwork.default.id
  delete_swg_autogen_router_on_destroy = true
  depends_on                           = [google_compute_subnetwork.proxyonlysubnet]
}

Debug Output

No response

Expected Behavior

The function deleteSWGAutoGenRouter should wait until the operation finishes.

Actual Behavior

deleteSWGAutoGenRouter returns immediately after sending the Delete request, without waiting for the operation to finish. This sometimes causes terraform destroy to fail:

Error: Error waiting for Deleting Network: The network resource 'projects/xxx/global/networks/my-network' is already being used by 'projects/xxx/regions/us-central1/routers/swg-autogen-router-1234567890'

Steps to reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

No response

References

No response

b/342170266

@teyuchang teyuchang added the bug label May 14, 2024
@ggtisc ggtisc self-assigned this May 21, 2024
Collaborator

ggtisc commented May 21, 2024

Hi @teyuchang!

I used exactly the same code, Terraform version (0.13.7), and Google provider version (4.84.0), and followed your steps to reproduce this issue:

  1. terraform apply
  2. terraform destroy

But I didn't get any error or see the behavior you described. Are there other configurations or resources involved, or do I need to wait more than a minute after creation before running `terraform destroy`?

Author

teyuchang commented May 21, 2024

Thank you for testing. Unfortunately, this bug is flaky and may not happen every time. The key factor in reproducing it is the time it takes to delete the router – a longer deletion time makes it more likely to happen. To consistently reproduce it, you can use a test stub instead of actual GCP endpoints and intentionally delay the router deletion process.
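
One way such a stub could look is sketched below in Go with net/http/httptest. Everything in it is an assumption made up for illustration: the paths /router, /operation, and /network, the plain-text RUNNING/DONE bodies, and the helper names newStub, call, and reproduce are hypothetical and are not real GCP API routes or provider code. The stub ties router deletion to a fixed number of operation polls so that the outcome is deterministic rather than time-based.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// newStub starts a hypothetical fake backend. The router only disappears
// after the operation has been polled pollsUntilDeleted times and then once more.
func newStub(pollsUntilDeleted int) *httptest.Server {
	var mu sync.Mutex
	routerExists := true
	pollsLeft := pollsUntilDeleted

	mux := http.NewServeMux()
	// Router delete: accepted, but the long-running operation is still pending.
	mux.HandleFunc("/router", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "RUNNING")
	})
	// Operation poll: reports RUNNING until the configured delay is used up.
	mux.HandleFunc("/operation", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if pollsLeft > 0 {
			pollsLeft--
			fmt.Fprint(w, "RUNNING")
			return
		}
		routerExists = false
		fmt.Fprint(w, "DONE")
	})
	// Network delete: rejected while the router still references the network.
	mux.HandleFunc("/network", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if routerExists {
			http.Error(w, "network is already being used by router", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, "DONE")
	})
	return httptest.NewServer(mux)
}

// call sends one request and returns the status code and body.
func call(method, url string) (int, string) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return resp.StatusCode, string(body)
}

// reproduce deletes the router, optionally polls the operation until DONE,
// then deletes the network and returns the HTTP status of that last call.
func reproduce(pollsUntilDeleted int, wait bool) int {
	srv := newStub(pollsUntilDeleted)
	defer srv.Close()

	call(http.MethodDelete, srv.URL+"/router")
	if wait {
		for i := 0; i < 100; i++ { // bounded polling loop
			if _, status := call(http.MethodGet, srv.URL+"/operation"); status == "DONE" {
				break
			}
		}
	}
	code, _ := call(http.MethodDelete, srv.URL+"/network")
	return code
}

func main() {
	fmt.Println("network delete without waiting:", reproduce(3, false))
	fmt.Println("network delete after waiting:  ", reproduce(3, true))
}
```

Pointing the provider at a stub like this would need a custom endpoint override, which is left out here.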

Here's a simplified version of how the Terraform example works:

Creation (terraform apply):

  1. Create a VPC network.
  2. Create subnets.
  3. Create a secure web gateway, which automatically creates a router.

Deletion (terraform destroy):

  1. Delete the secure web gateway. (The router is also deleted automatically because delete_swg_autogen_router_on_destroy is set to true).
  2. Delete subnets.
  3. Delete the VPC network.

The problem occurs during step 1 of the deletion process. Terraform attempts to delete the router but doesn't wait for the long-running operation to finish. This means the router might still exist when Terraform tries to delete the VPC in step 3, causing the VPC deletion to fail.

The relevant code is in deleteSWGAutoGenRouter, where the response to the delete request is discarded by assigning it to _. Instead, it should be handled as in resourceNetworkServicesGatewayDelete, where the response is captured in a res variable and the operation is waited on.
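
The difference between the two call patterns can be sketched in Go as follows. This is a minimal, self-contained illustration and not the provider's actual code: fakeCloud, operation, waitForOperation, and destroy are hypothetical names, and the in-memory stub stands in for the real Compute API. The operation finishes only after a fixed number of polls, so the result does not depend on timing.

```go
package main

import (
	"errors"
	"fmt"
)

// operation is a hypothetical stand-in for a long-running operation.
// It becomes finished once pollsLeft reaches zero.
type operation struct{ pollsLeft int }

// poll advances the operation by one step and reports whether it is done.
func (op *operation) poll() bool {
	if op.pollsLeft > 0 {
		op.pollsLeft--
		return false
	}
	return true
}

func (op *operation) finished() bool { return op.pollsLeft == 0 }

// fakeCloud is a hypothetical in-memory stub, not the real API client.
type fakeCloud struct {
	deletePolls int        // how many polls a router deletion takes
	routerOp    *operation // set once a router delete has been requested
}

// deleteRouter starts the deletion and returns the pending operation.
func (c *fakeCloud) deleteRouter() *operation {
	c.routerOp = &operation{pollsLeft: c.deletePolls}
	return c.routerOp
}

// deleteNetwork fails while the router still exists.
func (c *fakeCloud) deleteNetwork() error {
	if c.routerOp == nil || !c.routerOp.finished() {
		return errors.New("network is already being used by router")
	}
	return nil
}

// waitForOperation polls until the operation is done or maxPolls is reached.
func waitForOperation(op *operation, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		if op.poll() {
			return nil
		}
	}
	return errors.New("timed out waiting for operation")
}

// destroy mimics the destroy sequence: delete the router, then the network.
// With wait=false the operation is ignored, as in the reported behavior.
func destroy(c *fakeCloud, wait bool) error {
	op := c.deleteRouter()
	if wait {
		if err := waitForOperation(op, 10); err != nil {
			return err
		}
	}
	return c.deleteNetwork()
}

func main() {
	fmt.Println("ignore operation:", destroy(&fakeCloud{deletePolls: 3}, false))
	fmt.Println("wait for it:     ", destroy(&fakeCloud{deletePolls: 3}, true))
	// A deletion that finishes instantly hides the bug, which matches the flakiness.
	fmt.Println("fast deletion:   ", destroy(&fakeCloud{deletePolls: 0}, false))
}
```

In this sketch the network deletion fails only when the router operation is both slow and not awaited, which mirrors why the real failure is intermittent.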

Collaborator

ggtisc commented May 22, 2024

Confirmed. As the user reports, the longer we wait before deleting the resources, the more likely this behavior is triggered. After waiting more than 12 hours and then running terraform destroy, it returns the reported message:

Error: Error waiting for Deleting Network: The network resource 'projects/xxx/global/networks/my-network' is already being used by 'projects/xxx/regions/us-central1/routers/swg-autogen-router-1234567890'

@ggtisc ggtisc removed their assignment May 22, 2024
@ggtisc ggtisc removed the forward/review label May 22, 2024