hashicorp/tls provider claims it can plan destroy operations but fails when asked to plan destroy of tls_self_signed_cert #31820

Closed
apparentlymart opened this issue Sep 19, 2022 · 4 comments
Labels
bug providers/protocol Potentially affecting the Providers Protocol and SDKs v1.3 Issues (primarily bugs) reported against v1.3 releases
@apparentlymart
Member

Terraform v1.3 intends to introduce a new provider protocol capability where a provider which opts in will be asked to plan the destruction of any of its resource types. Previous versions of Terraform always unilaterally generated a destroy plan on a provider's behalf, which prevented the provider from failing the plan with an error or generating warnings.

To avoid exposing existing provider implementations to a new situation they weren't designed to deal with, we designed this as an opt-in capability where the provider can report as part of its GetProviderSchema response that it supports the plan_destroy capability, in which case Terraform Core will call PlanResourceChange with a null new value in order to ask a provider to produce a destroy plan for any of its resource types.
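The opt-in dispatch described above can be sketched roughly as follows. This is a simplified model, not Terraform Core's actual code: the `Capabilities`, `Object`, `Provider`, and `planChange` names are hypothetical stand-ins, and a nil map models a null proposed object.

```go
package main

import "fmt"

// Capabilities is a hypothetical stand-in for the capabilities a provider
// reports in its GetProviderSchema response.
type Capabilities struct {
	PlanDestroy bool
}

// Object is a hypothetical stand-in for a resource object; nil models null.
type Object map[string]string

// Provider is a hypothetical provider with pluggable planning logic.
type Provider struct {
	Caps Capabilities
	Plan func(prior, proposed Object) (Object, error)
}

// planChange returns the planned object and reports whether the provider
// was actually consulted.
func planChange(p Provider, prior, proposed Object) (Object, bool, error) {
	if proposed == nil && !p.Caps.PlanDestroy {
		// No opt-in: the destroy plan is generated on the provider's behalf.
		return nil, false, nil
	}
	planned, err := p.Plan(prior, proposed)
	return planned, true, err
}

func main() {
	prior := Object{"id": "abc"}

	// A provider that never opted in is not asked to plan the destroy.
	legacy := Provider{Plan: func(_, _ Object) (Object, error) {
		return nil, fmt.Errorf("legacy provider cannot plan destroy")
	}}
	_, asked, err := planChange(legacy, prior, nil)
	fmt.Println(asked, err)

	// A provider that opted in receives the null proposed object.
	optedIn := Provider{
		Caps: Capabilities{PlanDestroy: true},
		Plan: func(_, proposed Object) (Object, error) {
			if proposed == nil {
				return nil, nil // accept the destroy
			}
			return proposed, nil
		},
	}
	_, asked, err = planChange(optedIn, prior, nil)
	fmt.Println(asked, err)
}
```

The point of the guard is that a provider which never announced the capability is shielded from a request shape it was not written to handle.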

Unfortunately it seems that either the hashicorp/tls provider is already opting in to this capability (even though it hasn't appeared in any Terraform CLI release yet) or Terraform Core is asking the provider to plan its destroy despite the capability not being set. This fails for tls_self_signed_cert because its planning function is not equipped to deal with the proposed new object being null, and it fails like this:

2022-09-19T15:56:39.462-0700 [ERROR] provider.terraform-provider-tls_v4.0.2_x5: Response contains error diagnostic: diagnostic_severity=ERROR diagnostic_summary="Config Read Error" tf_req_id=8717c0f7-37c3-7a23-a00a-69385ca86623 tf_resource_type=tls_self_signed_cert tf_rpc=PlanResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.14.0/tfprotov5/internal/diag/diagnostics.go:55 diagnostic_attribute=AttributeName("validity_end_time") diagnostic_detail="An unexpected error was encountered trying to read an attribute from the configuration. This is always an error in the provider. Please report the following to the provider developer:

Missing attribute value, however no error was returned. Preventing the panic from this situation." tf_provider_addr=registry.terraform.io/hashicorp/tls @module=sdk.proto tf_proto_version=5.3 timestamp=2022-09-19T15:56:39.462-0700

From an end-user standpoint that looks something like the following:

╷
│ Error: Config Read Error
│ 
│   with tls_self_signed_cert.user,
│   on tls-example.tf line 18, in resource "tls_self_signed_cert" "user":
│   18: resource "tls_self_signed_cert" "user" {
│ 
│ An unexpected error was encountered trying to read an attribute from the configuration.
│ This is always an error in the provider. Please report the following to the provider
│ developer:
│ 
│ Missing attribute value, however no error was returned. Preventing the panic from this
│ situation.
╵

The following configuration seems to reproduce this with terraform apply followed by terraform destroy:

resource "tls_private_key" "user" {
  algorithm = "RSA"
}

resource "tls_self_signed_cert" "user" {
  private_key_pem = tls_private_key.user.private_key_pem

  subject {
    common_name  = "example.com"
    organization = "ACME Examples, Inc"
  }

  early_renewal_hours   = 4
  validity_period_hours = 8
  allowed_uses = [
    "key_encipherment",
    "digital_signature",
  ]
  is_ca_certificate = true
}

I've not yet diagnosed the root cause of this bug. I have two possible explanations in mind here, and so the first step will be determining which of these is true.

  1. The TLS provider is not announcing that it supports the plan_destroy capability but Terraform Core is asking it to plan destroy anyway. This would be the ideal situation because the fix would be localized in Terraform Core and so we can address it before shipping v1.3.0 final.
  2. The TLS provider is announcing the plan_destroy capability even though it isn't actually capable of planning destroy. If this is true then the situation is messier because there's already at least one TLS provider release out there which announces it and so we'd likely need to change Terraform Core to ignore that incorrect announcement and use a different capability attribute to activate this feature instead. That would mean that any existing provider already shipped would never be asked to plan destroy, but later provider releases could still do so by opting in to the new capability.
@apparentlymart apparentlymart added the bug, providers/protocol, and v1.3 labels Sep 19, 2022
@apparentlymart apparentlymart added this to the v1.3.0 milestone Sep 19, 2022
@apparentlymart
Member Author

apparentlymart commented Sep 19, 2022

I've confirmed that the provider does seem to be opting in to being asked to plan destroy, although it's the plugin framework doing it on the provider's behalf: https://github.com/hashicorp/terraform-plugin-framework/blob/7541ab15654b00837015180ecdb7f439e604c6cf/internal/fwserver/server_getproviderschema.go#L29-L31

I initially thought we were lacking a check in Terraform Core but it turns out it was just one layer deeper than I expected:

if r.ProposedNewState.IsNull() && !capabilities.PlanDestroy {
	resp.PlannedState = r.ProposedNewState
	resp.PlannedPrivate = r.PriorPrivate
	return resp
}

Since hashicorp/tls v4.0.2 is already released with this inconsistency in place (it announces that it supports planning destroy but it doesn't actually support planning destroy) I think we are faced with deciding between the following two options that both have annoying consequences:

  1. We could ship with Terraform v1.3.0 as in the release candidate, accepting that anyone already using hashicorp/tls v4.0.2 will be blocked from destroying their certificates until there's a new provider version available which fixes this problem (either by not announcing that it can plan destroy or by correctly handling the destroy plan request).
  2. We could retroactively renumber the plan_destroy capability to have a protobuf attribute number other than 1 -- and also potentially give it a new name in the schema to reduce confusion -- and thereby nullify the opt-in for any already released providers. This means that no providers already released would ever be asked to plan destroy, but once this inconsistency is addressed somehow providers can then opt in with the new capability flag instead and still get the benefit of this new feature.
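To illustrate why renumbering would nullify the existing opt-in, here is a minimal sketch of the protobuf wire format for a boolean field. It hand-rolls the encoding using the standard library's varint helpers rather than a real protobuf library, it only handles varint-typed fields, and field number 2 is a hypothetical choice made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeBoolField encodes one boolean field. A protobuf tag is
// (fieldNumber << 3) | wireType, and wire type 0 means varint.
func encodeBoolField(fieldNum uint64, v bool) []byte {
	buf := binary.AppendUvarint(nil, fieldNum<<3)
	var b uint64
	if v {
		b = 1
	}
	return binary.AppendUvarint(buf, b)
}

// readBoolField scans a message made only of varint fields and returns the
// value of wantField. Fields with other numbers are skipped, which is how a
// reader ignores a field it no longer recognizes.
func readBoolField(msg []byte, wantField uint64) (bool, error) {
	found := false
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 {
			return false, errors.New("malformed tag")
		}
		msg = msg[n:]
		if tag&7 != 0 {
			return false, errors.New("sketch only handles varint fields")
		}
		val, n := binary.Uvarint(msg)
		if n <= 0 {
			return false, errors.New("malformed value")
		}
		msg = msg[n:]
		if tag>>3 == wantField {
			found = val != 0
		}
	}
	return found, nil
}

func main() {
	// An already-released provider announces the capability as field 1.
	released := encodeBoolField(1, true)

	asField1, _ := readBoolField(released, 1)
	asField2, _ := readBoolField(released, 2) // hypothetical new field number
	fmt.Println("read as field 1:", asField1)
	fmt.Println("read as field 2:", asField2)
}
```

A reader that looks for the new field number sees the capability as unset in messages from already-released providers, while a later provider release that sets the new field would still be recognized.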

Right now I find myself leaning towards option 2 because it's something that can be handled totally within this codebase and avoids creating a hazard where an existing provider release isn't compatible with a new Terraform Core release, even though technically it's the provider that is "incorrect" here. However, we'll need to verify that such a change won't impact an already-released provider that does correctly implement destroy planning and will then have that support retroactively revoked from it. My sense is that the likelihood of this is low because there hasn't yet been any stable release of Terraform Core which supports provider-planned destroy, and that any existing provider relying on it would end up treating the final v1.3.0 release as if it were a v1.2.x release; providers must already be able to handle the situation where older versions of Terraform Core don't ask at all.

@jbardin
Member

jbardin commented Sep 20, 2022

The problem here appears to have been a bug in the provider framework which has since been patched. An invalid value was being passed to the plan modifier during destroy, so any attribute access within that value would result in the above error.
A patch release of the TLS provider is pending now.

At least within the HashiCorp associated organizations, the TLS provider appears to be the only one using this combination of functionality, so the problem is hopefully not widespread enough to warrant pulling the feature altogether.

@apparentlymart
Member Author

The maintainers of the hashicorp/tls provider released version v4.0.3 a few hours ago, which uses plugin framework v0.13.0 instead of v0.11.1. Plugin framework v0.12.0 contained the change which fixed this problem, from hashicorp/terraform-plugin-framework#475.

Because of this provider bug, hashicorp/tls v4.0.2 is known to be incompatible with Terraform v1.3 and later. Anyone who has encountered a message like the one I mentioned in the original issue comment above can upgrade to provider version v4.0.3 or later to fix the problem.

As @jbardin noted, we don't know of any other providers that have this problem, but we cannot see into the source code of privately-maintained providers and so it is possible that such a provider may exhibit a similar problem. If so, the resolution would be to upgrade your provider's dependency to at least Terraform Plugin Framework v0.12.0. If that doesn't resolve the problem, please open an issue in the Terraform Plugin Framework repository where we can investigate further.

As far as we can tell there is no change required in this repository, since Terraform Core seems to be behaving correctly and the framework bug which caused this error has now been resolved. Therefore I'm going to close this issue.

@apparentlymart apparentlymart closed this as not planned Sep 20, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022