Non-empty plans when migrating existing optional+computed lists to Terraform Plugin Framework #883

Open · maastha opened this issue Nov 30, 2023 · 10 comments · Labels: bug

maastha commented Nov 30, 2023

Module version

1.4.2

We are currently migrating existing Plugin-SDK-based resources to the Plugin Framework (plugin protocol v6). The resource below has some optional+computed list block attributes whose nested attributes, even when not configured (or only partially configured) by the user, are still returned by the API/provider. That response is currently persisted in the state.

To migrate these blocks to the Plugin Framework, we have tried using schema.ListNestedBlock, but after upgrading to the framework-migrated resource the returned plan is always non-empty. This would be a breaking change for our users.

What is the recommended non-breaking way to migrate list attributes such as these?

Relevant provider source code

Terraform Plugin SDK-based schema for the "advanced_configuration" block:

func ClusterAdvancedConfigurationSchema() *schema.Schema {
	return &schema.Schema{
		Type:       schema.TypeList,
		Optional:   true,
		Computed:   true,
		ConfigMode: schema.SchemaConfigModeAttr,
		MaxItems:   1,
		Elem: &schema.Resource{
			Schema: map[string]*schema.Schema{
				"default_read_concern": {
					Type:     schema.TypeString,
					Optional: true,
					Computed: true,
				},
				"default_write_concern": {
					Type:     schema.TypeString,
					Optional: true,
					Computed: true,
				},
				"fail_index_key_too_long": {
					Type:     schema.TypeBool,
					Optional: true,
					Computed: true,
				},
				"javascript_enabled": {
					Type:     schema.TypeBool,
					Optional: true,
					Computed: true,
				},
				"minimum_enabled_tls_protocol": {
					Type:     schema.TypeString,
					Optional: true,
					Computed: true,
				},
				"no_table_scan": {
					Type:     schema.TypeBool,
					Optional: true,
					Computed: true,
				},
				"oplog_size_mb": {
					Type:     schema.TypeInt,
					Optional: true,
					Computed: true,
				},
				"oplog_min_retention_hours": {
					Type:     schema.TypeInt,
					Optional: true,
				},
				........
			},
		},
	}
}

Terraform Plugin Framework-migrated schema:

func clusterRSAdvancedConfigurationSchemaBlock() schema.ListNestedBlock {
	return schema.ListNestedBlock{
		NestedObject: schema.NestedBlockObject{
			Attributes: map[string]schema.Attribute{
				"default_read_concern": schema.StringAttribute{
					Optional: true,
					Computed: true,
				},
				"default_write_concern": schema.StringAttribute{
					Optional: true,
					Computed: true,
				},
				"fail_index_key_too_long": schema.BoolAttribute{
					Optional: true,
					Computed: true,
				},
				"javascript_enabled": schema.BoolAttribute{
					Optional: true,
					Computed: true,
				},
				"minimum_enabled_tls_protocol": schema.StringAttribute{
					Optional: true,
					Computed: true,
				},
				"no_table_scan": schema.BoolAttribute{
					Optional: true,
					Computed: true,
				},
				"oplog_min_retention_hours": schema.Int64Attribute{
					Optional: true,
				},
				"oplog_size_mb": schema.Int64Attribute{
					Optional: true,
					Computed: true,
				},
				..........
			},
		},
		Validators: []validator.List{
			listvalidator.SizeAtMost(1),
		},
		PlanModifiers: []planmodifier.List{
			listplanmodifier.UseStateForUnknown(),
		},
	}
}

Terraform Configuration Files

Use-case 1 (no blocks configured):

main.tf:
resource "mongodbatlas_cluster" "cluster-no-blocks" {
  project_id                                      = mongodbatlas_project.project-tf.id
  provider_name                                   = "AWS"
  name                                            = "tfCluster1"
  backing_provider_name                           = "AWS"
  provider_region_name                            = "US_EAST_1"
  provider_instance_size_name                     = "M10"
  auto_scaling_compute_enabled                    = false
  auto_scaling_compute_scale_down_enabled         = false
  provider_auto_scaling_compute_min_instance_size = "M10"
  provider_auto_scaling_compute_max_instance_size = "M20"
}

terraform.tfstate (including only the relevant blocks here due to the large resource config):
 "advanced_configuration": [
              {
                "default_read_concern": "",
                "default_write_concern": "",
                "fail_index_key_too_long": false,
                "javascript_enabled": true,
                "minimum_enabled_tls_protocol": "TLS1_2",
                "no_table_scan": false,
                "oplog_min_retention_hours": 0,
                "oplog_size_mb": 0,
                "sample_refresh_interval_bi_connector": 0,
                "sample_size_bi_connector": 0,
                "transaction_lifetime_limit_seconds": 0
              }
            ],
 "bi_connector_config": [
              {
                "enabled": false,
                "read_preference": "secondary"
              }
            ],

Use-case 2 (all blocks configured):

main.tf:
resource "mongodbatlas_cluster" "cluster-multi-region-all-blocks" {
  project_id   = mongodbatlas_project.project-tf.id
  name         = "cluster-test-multi-region"
  num_shards   = 1
  cloud_backup = true
  cluster_type = "REPLICASET"

  provider_name               = "AWS"
  provider_instance_size_name = "M10"

  advanced_configuration {        # block can be partially configured by user    
    minimum_enabled_tls_protocol = "TLS1_2"
    default_read_concern         = "available"
  }

  bi_connector_config {
    enabled         = false
    read_preference = "secondary"
  }
}

terraform.tfstate (including only the relevant blocks here due to the large resource config):
"advanced_configuration": [
              {
                "default_read_concern": "available",
                "default_write_concern": "",
                "fail_index_key_too_long": false,
                "javascript_enabled": true,
                "minimum_enabled_tls_protocol": "TLS1_2",
                "no_table_scan": false,
                "oplog_min_retention_hours": 0,
                "oplog_size_mb": 0,
                "sample_refresh_interval_bi_connector": 0,
                "sample_size_bi_connector": 0,
                "transaction_lifetime_limit_seconds": 0
              }
            ],
"bi_connector_config": [
              {
                "enabled": false,
                "read_preference": "secondary"
              }
            ],

Expected Behavior

After a user upgrades to the new provider version with the framework-migrated resource, they should not see any planned changes for optional+computed list/set blocks when running terraform plan, and should not receive errors when running terraform apply.

Actual Behavior

On running terraform plan, the following plan was produced by Terraform:
Use-case 1 (no blocks configured):

 ~ update in-place

Terraform will perform the following actions:

  # mongodbatlas_cluster.cluster-no-blocks will be updated in-place
  ~ resource "mongodbatlas_cluster" "cluster-no-blocks" {
        id                                      = "Y2x1c3Rlcl9pZA==:NjU2ODhiMmYzZDI3MWYzZjg2MGI0NjQw-Y2x1c3Rlcl9uYW1l:dGZDbHVzdGVyMQ==-cHJvamVjdF9pZA==:NjRlY2MxNTkyNzVmMjM1OWY0Y2FlODEw-cHJvdmlkZXJfbmFtZQ==:QVdT"
        name                                    = "tfCluster1"
      ~ num_shards                              = 1 -> (known after apply)
        # (34 unchanged attributes hidden)

      - advanced_configuration {
          - javascript_enabled           = true -> null
          - minimum_enabled_tls_protocol = "TLS1_2" -> null
          - no_table_scan                = false -> null
          - oplog_min_retention_hours    = 0 -> null
        }

      - bi_connector_config {
          - enabled         = false -> null
          - read_preference = "secondary" -> null
        }

Use-case 2 (all blocks configured):

 ~ update in-place

Terraform will perform the following actions:

  # mongodbatlas_cluster.cluster-multi-region-all-blocks will be updated in-place
  ~ resource "mongodbatlas_cluster" "cluster-multi-region-all-blocks" {
        id                                      = "Y2x1c3Rlcl9pZA==:NjU2ODhlZTgzZDI3MWYzZjg2MGI0ZTY3-Y2x1c3Rlcl9uYW1l:dGYtbXVsdGktcmVnaW9uLWFsbC1ibG9ja3M=-cHJvamVjdF9pZA==:NjRlY2MxNTkyNzVmMjM1OWY0Y2FlODEw-cHJvdmlkZXJfbmFtZQ==:QVdT"
        name                                    = "tf-multi-region-all-blocks"
        # (32 unchanged attributes hidden)

      ~ advanced_configuration {
          + default_write_concern                = (known after apply)
          + fail_index_key_too_long              = (known after apply)
          ~ no_table_scan                        = false -> (known after apply)
          - oplog_min_retention_hours            = 0 -> null
          + oplog_size_mb                        = (known after apply)
          + sample_refresh_interval_bi_connector = (known after apply)
          + sample_size_bi_connector             = (known after apply)
          + transaction_lifetime_limit_seconds   = (known after apply)
            # (3 unchanged attributes hidden)
        }


Steps to Reproduce

  1. run terraform init for provider v.X (which contains the Plugin-SDK-based implementation of the above resource)
  2. run terraform plan
  3. run terraform apply
  4. run terraform plan again -> plan returns No changes.
  5. update the provider version to v.Y (which contains the Plugin-Framework-based implementation of the above resource), then run terraform init --upgrade
  6. run terraform plan -> Plan is not empty

References

The advanced_configuration schema is included in this issue. Other block schemas mentioned in the example are:


zliang-akamai commented Dec 8, 2023

I think we might have the same issue here. Did you try assigning the block value to the state in the Read function?

We found that the framework doesn't support optional+computed blocks, so the block can't be placed into state in the resource's Create function when it's not configured by the user.

If you put the block value into state in the Read function, then during refresh Terraform considers it a change to managed infrastructure (as if you had changed a Terraform-managed resource outside of Terraform operations, e.g. via a direct API call from cURL), rather than a block being computed, and it gives you a new plan to change the infrastructure back to what's stored in Terraform.

We couldn't find a workaround or a solution here either, and I hope we can get the Terraform team to take a look at this.


bflad commented Dec 11, 2023

Hi @maastha 👋 Thank you for submitting this and sorry you are running into trouble here. Terraform in general is very particular about data consistency and proper value planning once a resource is migrated to the framework, so it may involve teaching the provider code much more about the API behaviors associated with each attribute. Let's see if we cannot troubleshoot this a little further.

For starters, can you run Terraform with warning logs enabled using the SDK version of the resource first? e.g. TF_LOG=WARN terraform apply. What we would be looking for here is whether Terraform thought the SDK-based resource wasn't already following Terraform's data consistency rules. While the migration documentation around checking this type of issue talks about a simpler root-level string attribute, we can hopefully use that information to guide us further in this situation.

Next, we could use some additional information from you about the API responses. Are the left-hand values noted in the "no configuration blocks" plans always returned from the API with these values?

      - advanced_configuration {
          - javascript_enabled           = true -> null
          - minimum_enabled_tls_protocol = "TLS1_2" -> null
          - no_table_scan                = false -> null
          - oplog_min_retention_hours    = 0 -> null
        }

      - bi_connector_config {
          - enabled         = false -> null
          - read_preference = "secondary" -> null
        }

If so, this likely means we need to take a look at resource default attribute value implementations for each. That might help with the "all blocks configured" case. There is the additional caveat here that defaults may need to be added at the list or object level as well, to cover the "no configuration block" case, as defaults of nested object attributes are only consulted if their wrapping object exists already.
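
To illustrate, here is a minimal sketch of what attribute-level defaults might look like, assuming the API always returns these values when they are unset. The specific default values are assumptions inferred from the state excerpts above, and booldefault/stringdefault are terraform-plugin-framework's resource/schema/booldefault and resource/schema/stringdefault packages:

// Hypothetical sketch only: attribute-level defaults mirroring assumed API behavior.
"javascript_enabled": schema.BoolAttribute{
	Optional: true,
	Computed: true,
	Default:  booldefault.StaticBool(true), // assumed API default
},
"minimum_enabled_tls_protocol": schema.StringAttribute{
	Optional: true,
	Computed: true,
	Default:  stringdefault.StaticString("TLS1_2"), // assumed API default
},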

Otherwise, as @zliang-akamai alludes to above, the answer may actually be to omit saving certain information to the state when it is not present in the configuration. It is likely that Terraform would have given practitioners errors if they did happen to directly reference those block attributes elsewhere in their configuration since the value would have changed between plan and apply. (In fact, I'm a little surprised that Terraform isn't already raising an error, as I would have expected a block without configuration to error if it was saved with data.)

Finally, it is worth mentioning that when migrating from the SDK to the framework, Terraform may hide very particular plan differences (e.g. empty string to null) because not doing so would cause very noisy plans for practitioners with the older SDK. If you ever see a planned resource change with all the attributes "hidden", then hashicorp/terraform#31887 is an upstream bug report relating to this. Saving the plan file and inspecting it in JSON format, or implementing a special plan check in terraform-plugin-testing that automatically compares all resource change before/after values for debugging, is handy in those cases. I thought we had a testing feature request to consider implementing this debugging plan check somewhere, but I'm having trouble finding it at the moment -- but can recreate it if needed.
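
As a rough illustration of such a hand-rolled debugging plan check (this is not an existing terraform-plugin-testing feature; it is only a sketch built on the plancheck.PlanCheck interface, and the DebugResourceChanges name is made up):

package plancheckdebug

import (
	"context"
	"encoding/json"
	"fmt"
	"reflect"

	"github.com/hashicorp/terraform-plugin-testing/plancheck"
)

// DebugResourceChanges fails the test with a resource change's raw
// before/after values, so differences hidden by the human-readable plan
// rendering become visible.
type DebugResourceChanges struct{}

func (DebugResourceChanges) CheckPlan(ctx context.Context, req plancheck.CheckPlanRequest, resp *plancheck.CheckPlanResponse) {
	for _, rc := range req.Plan.ResourceChanges {
		if rc.Change == nil || reflect.DeepEqual(rc.Change.Before, rc.Change.After) {
			continue
		}
		before, _ := json.MarshalIndent(rc.Change.Before, "", "  ")
		after, _ := json.MarshalIndent(rc.Change.After, "", "  ")
		// Note: only the last difference found is reported in this simplification.
		resp.Error = fmt.Errorf("%s has a before/after difference:\nbefore: %s\nafter: %s", rc.Address, before, after)
	}
}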


To add a little bit more context about this confusing migration space, I'll also mention the blocks with computed fields migration documentation. What I want to highlight there is that terraform-plugin-sdk did some gymnastics in certain cases to make the single schema.Schema type "work" for both attributes and blocks; in particular, computed-only blocks were automatically converted to object attributes. In Terraform and HCL, blocks are inherently much different from attributes, although the provider development SDKs try to paper over that detail. There actually is/was no such thing as a "computed block" in Terraform. The framework directly exposes this reality, but understandably it can be difficult to unravel the implications of that implementation detail when migrating from the SDK. Hopefully we can get to the bottom of this for you, though, and find a workable path forward.

zliang-akamai commented:

Hi @bflad, thanks for the detailed answer! I am still hoping to migrate a Computed and Optional TypeList in SDKv2 to the framework. What do you think about migrating it to a ListAttribute with Optional and Computed set to true in the framework?

Something like:

map[string]schema.Attribute{
    "example_nested_block": schema.ListAttribute{
        ElementType: types.ObjectType{
            AttrTypes: map[string]attr.Type{
                "example_block_attribute": types.StringType,
            },
        },
        Computed: true,
        Optional: true,
    },
}

But the problem is we can't attach a full schema to the items of AttrTypes; e.g., we can't specify example_block_attribute as a required attribute or assign it a default value for when the object (practically a block in the config file) is configured by the user.

Since SDKv2 already did some gymnastics, I think many providers, including ours, will encounter issues similar to use case 1 in this issue when migrating to the framework. It would be very nice if there were some way Terraform could assist providers in making such a breaking change, if direct support for computed and optional blocks is not possible. Maybe something like a state upgrade, but for user config? I don't know.

Reference:


bflad commented Dec 12, 2023

Thanks so much, @zliang-akamai. Can you confirm whether or not Terraform is raising warning logs in your case for the SDK-based resource and whether it was also using the quirky ConfigMode: schema.SchemaConfigModeAttr like the original issue description? Presumably those logs should be raised in the case of the block not being configured, but the resource setting its state. If that is not happening, then maybe we can figure out something.

I would think the list of objects solution would only work for Computed: true without Optional: true blocks, since that case doesn't need to carry any additional schema information (optional, required, validation, etc.). In the case a practitioner does want to configure it, a list of objects attribute would typically require switching the Terraform configuration from block syntax example_nested_block { ... to attribute syntax example_nested_block = {. There is some special handling in Terraform core to cover attributes as blocks behavior which does muddy that typical configuration requirement though.


maastha commented Dec 12, 2023

Thanks a lot for the response @bflad

defaults may need to be added at the list or object level as well

  1. I believe we are supposed to migrate to blocks (to avoid a change to the "= []" attribute syntax). Is it possible to set defaults at the object level for a ListNestedBlock? If so, can you share an example? Also, assuming this works with drift detection, could it still throw a provider-inconsistency error on apply?

  2. As requested, here are the WARN logs when no advanced_configuration or bi_connector_config block is configured. I see the inconsistency rules called out here; is this what you were referring to?

[WARN]  Provider "registry.terraform.io/mongodb/mongodbatlas" produced an invalid plan for mongodbatlas_advanced_cluster.replicaset-cluster-no-adv-config, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .bi_connector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead

WARN logs when the advanced_configuration block is configured partially, or even as an empty block ({}):

[WARN]  Provider "registry.terraform.io/mongodb/mongodbatlas" produced an invalid plan for mongodbatlas_advanced_cluster.sharded-cluster-empty-adv-config, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .advanced_configuration: planned value cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"default_read_concern":cty.UnknownVal(cty.String), "default_write_concern":cty.UnknownVal(cty.String), "fail_index_key_too_long":cty.UnknownVal(cty.Bool), "javascript_enabled":cty.UnknownVal(cty.Bool), "minimum_enabled_tls_protocol":cty.UnknownVal(cty.String), "no_table_scan":cty.UnknownVal(cty.Bool), "oplog_min_retention_hours":cty.NullVal(cty.Number), "oplog_size_mb":cty.UnknownVal(cty.Number), "sample_refresh_interval_bi_connector":cty.UnknownVal(cty.Number), "sample_size_bi_connector":cty.UnknownVal(cty.Number), "transaction_lifetime_limit_seconds":cty.UnknownVal(cty.Number)})}) does not match config value cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"default_read_concern":cty.NullVal(cty.String), "default_write_concern":cty.NullVal(cty.String), "fail_index_key_too_long":cty.NullVal(cty.Bool), "javascript_enabled":cty.NullVal(cty.Bool), "minimum_enabled_tls_protocol":cty.NullVal(cty.String), "no_table_scan":cty.NullVal(cty.Bool), "oplog_min_retention_hours":cty.NullVal(cty.Number), "oplog_size_mb":cty.NullVal(cty.Number), "sample_refresh_interval_bi_connector":cty.NullVal(cty.Number), "sample_size_bi_connector":cty.NullVal(cty.Number), "transaction_lifetime_limit_seconds":cty.NullVal(cty.Number)})})
  
      - .bi_connector_config: attribute representing nested block must not be unknown itself; set nested attribute values to unknown instead


maastha commented Dec 12, 2023

@bflad
3. I also tried filtering out attributes that are not present in request.State. In this case, I saw the output below on terraform plan; no attribute change is detected, but I'm not sure why this diff is still reported. Could you help me understand it?

      ~ replication_specs {
            id           = "657889449387ff62a76e2cb5"
            # (3 unchanged attributes hidden)

          ~ region_configs {
                # (3 unchanged attributes hidden)

                # (2 unchanged blocks hidden)
            }
          ~ region_configs {
                # (3 unchanged attributes hidden)

              ~ analytics_specs {
                  + disk_iops     = (known after apply)
                    # (2 unchanged attributes hidden)
                }

              ~ electable_specs {
                  + disk_iops     = (known after apply)
                    # (2 unchanged attributes hidden)
                }
            }
        }
      ~ replication_specs {
            id           = "657889449387ff62a76e2cb6"
            # (3 unchanged attributes hidden)

          ~ region_configs {
                # (3 unchanged attributes hidden)

                # (2 unchanged blocks hidden)
            }
        }

        # (2 unchanged blocks hidden)
    }
  4. Also, note that disk_iops above does have a plan modifier added, but it still shows up as (known after apply); how should this be handled? disk_iops is Optional, and the existing state has "disk_iops": 0 even though disk_iops is not configured. I am filtering out everything not present in request.State in the Read() handler.

Reference:

  replication_specs { # zone n2
    zone_name  = "zone n2"
    num_shards = 2 # 2-shard Multi-Cloud Cluster

    region_configs { # shard n1 
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      priority      = 7
      region_name   = "EU_WEST_1"
    }
  }
  • existing .tfstate of second block with id="657889449387ff62a76e2cb6":
   {
                "container_id": {
                  "AWS:EU_WEST_1": "657889449387ff62a76e2cc7"
                },
                "id": "657889449387ff62a76e2cb6",
                "num_shards": 2,
                "region_configs": [
                  {
                    "analytics_auto_scaling": [],
                    "analytics_specs": [
                      {
                        "disk_iops": 3000,
                        "ebs_volume_type": "",
                        "instance_size": "M10",
                        "node_count": 1
                      }
                    ],
                    "auto_scaling": [],
                    "backing_provider_name": "",
                    "electable_specs": [
                      {
                        "disk_iops": 3000,
                        "ebs_volume_type": "",
                        "instance_size": "M10",
                        "node_count": 3
                      }
                    ],
                    "priority": 7,
                    "provider_name": "AWS",
                    "read_only_specs": [],
                    "region_name": "EU_WEST_1"
                  }
                ],
                "zone_name": "zone n2"
              }
           


bflad commented Dec 13, 2023

Thanks so much, @maastha. I really appreciate you working through this with me. I apologize that I also neglected to mention a newer SDK feature which should promote those warning logs to errors in Terraform, to make them a little easier to see during acceptance testing:

schema.Resource{
    // ... other fields as necessary ...
    EnableApplyLegacyTypeSystemErrors: true,
    EnablePlanLegacyTypeSystemErrors:  true,
}

Just please be careful not to release the SDK-based resource with those enabled, since there are known errors. We had added that to the top of the overall framework migration steps, but I'm not sure it's linked deeper in the migration documentation, so I'll take a look at where else it might make sense if it's not already there.

I believe we are supposed to migrate to blocks (to avoid a change to the "= []" attribute syntax). Is it possible to set defaults at the object level for a ListNestedBlock? If so, can you share an example? Also, assuming this works with drift detection, could it still throw a provider-inconsistency error on apply?

What I meant to allude to was that defaults may be applicable for the planned values to match the API behavior. This should work as expected for underlying attributes, but the framework intentionally prevents developers from implementing the "native" default value handling for blocks (e.g. the literal Default field present in attributes). From Terraform's perspective, there should be no such thing as a default for a block; the block either exists in configuration or does not. The prior SDK incorrectly tried to work around that. Technically though, the framework does expose block/object plan modifiers (mainly there for triggering resource replacement) and resource-level plan modification. I've personally never tried and I haven't seen a working example of trying to change plans for blocks themselves. If I had to venture an upfront guess though, I'm guessing that trying to modify the plan to include missing blocks (the list or object) would likely raise errors.

Let's take a step back though with the below new knowledge --

I see the inconsistency rules called out here; is this what you were referring to?

Indeed, which is a good and bad thing. It is good that we are learning that the prior resource implementation was not meeting Terraform's data consistency rules, but of course bad because these need to be resolved in some way as part of this migration as this is an unavoidable problem.

Given that the existing resource is violating those rules, I anticipate the deeper we look into this, the more likely that one of the following will need to happen:

  1. Fixing the SDK implementation, which likely would be very difficult and may not be fully possible given its existing (and mostly frozen) internal details, may require a planned difference to occur. Upfront, honestly, I do not think this is a great path.
  2. Migrating to a framework implementation may require a planned difference to occur so existing states can be fixed to match existing configurations. There may be ways we can try to alleviate this via defaults and/or a state upgrade as part of the framework implementation, but I cannot guarantee it's possible to get a fully empty plan given the complexity of this resource without a lot of effort. I believe this is the desirable effort per the original bug report description, but it may still leave some technical debt within the resource implementation and may or may not need to include somewhat-breaking changes for uncommon use cases.
  3. Migrating to a framework implementation may require a breaking change (of some sort) if attempting to implement exactly the intent of the SDK implementation. I know this is not ideal by any means, but I think it is worth discussing what this might actually look like, in case you might find the benefits worth any potential practitioner/provider developer burden.

There are quite a few things you are asking about, so maybe let's try to start at a high level for the intent of this configuration/data in the resource. Particularly since you mention each of them, let's talk about advanced_configuration, bi_connector_config, and replication_specs. I'm asking these questions because there is a disconnect from what was possible to implement in the prior SDK and Terraform's rules/capabilities. It's very easy to get lost in the sea of implementation details across the SDK/framework, the API, and Terraform itself.

  • What is the desired behavior of these from a practitioner perspective? Are they intended to be configured by practitioners and/or are they intended to be referenced by practitioners in other parts of the configuration? It seems immediately apparent that practitioners need to configure them, but what about referencing the values elsewhere?
  • What is the behavior of these from an API perspective? Does the API require them? Are they optional and the API will set defaults? Does the API change the values over time? If those values do change, but they are not configured, is it important for practitioners to see the drift?
  • What is the data structure of these from an API perspective? Are advanced_configuration and bi_connector_config a single API object and replication_specs an unordered and unique set of API objects?

I'm going to try to summarize my understanding so far, but please correct me where necessary:

  • advanced_configuration is an optional single API object which practitioners can optionally set certain details. Each underlying configuration value may have static (not dependent on other configuration details) defaults provided by the API if not given. Practitioners likely would never reference these values elsewhere in their configuration and I presume that if they were trying to do that today, they would see the "confusing errors from downstream operations" mentioned in the warning logs.
  • bi_connector_config is an optional single API object which practitioners can optionally set certain details. Each underlying configuration value may have static (not dependent on other configuration details) defaults provided by the API if not given. Practitioners likely would never reference these values elsewhere in their configuration and I presume that if they were trying to do that today, they would see the "confusing errors from downstream operations" mentioned in the warning logs.
  • replication_specs (using some details from #884 and the discussion above) is an optional set of API objects for which practitioners must set certain details and can optionally set others. The API will set a computed identifier for each object, and only some underlying configuration values may have defaults provided by the API if not given. I'm not sure if practitioners would ever reference these values elsewhere in their configuration.

So in the prior SDK implementation, you had to use what was available:

  • These configurations were optional according to the API, so Optional: true was used.
  • The resource always set the data for these from the API responses, which likely caused confusing behaviors without Computed: true being also set. Since the data was always refreshed, this enabled the resource to perform a level of drift detection and suggest remediation for practitioner configured values.
  • For advanced_configuration and bi_connector_config: The SDK didn't expose single nested block or object attribute support, so the necessary and ecosystem conventional choice was to use list nested blocks that validated there was only one block. The SDK allowed this even though Computed: true was set, since the SDK tried to paper over Terraform not supporting that implementation detail of blocks, by making it an attribute rather than a block. This was further codified by forcing ConfigMode: schema.SchemaConfigModeAttr. Terraform itself also papered over configurations using block syntax over attribute syntax with the "special logic" mentioned in my prior comment.

Now if this was a brand new resource implementation, I would recommend that all these be implemented by leveraging the protocol version 6 (Terraform 1.0+) nested attribute feature, which essentially gives the best solution everywhere since all the data can be optionally configured or computed by the API, and therefore always safely referenced in configurations. All the data would be refreshable as well for drift detection. In particular:

  • Both advanced_configuration and bi_connector_config being an optional+computed single nested attribute with a default object matching the API. The underlying attributes could also be optional+computed with individual default values matching the API to support partial object configuration. (A minimal sketch follows this list.)
  • replication_specs being an optional set nested attribute with a computed id attribute and other (nested) attributes where applicable.
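
To make that shape concrete, here is a hedged sketch for bi_connector_config, assuming the API defaults shown in the state excerpts above (enabled = false, read_preference = "secondary"). The objectdefault/booldefault/stringdefault packages are terraform-plugin-framework's resource/schema defaults helpers; nothing below is a confirmed implementation:

// Hypothetical protocol v6 shape: an optional+computed single nested attribute
// with an object-level default, so the whole object exists even when it is
// omitted from configuration entirely. Default values are assumptions.
"bi_connector_config": schema.SingleNestedAttribute{
	Optional: true,
	Computed: true,
	Attributes: map[string]schema.Attribute{
		"enabled": schema.BoolAttribute{
			Optional: true,
			Computed: true,
			Default:  booldefault.StaticBool(false),
		},
		"read_preference": schema.StringAttribute{
			Optional: true,
			Computed: true,
			Default:  stringdefault.StaticString("secondary"),
		},
	},
	Default: objectdefault.StaticValue(types.ObjectValueMust(
		map[string]attr.Type{
			"enabled":         types.BoolType,
			"read_preference": types.StringType,
		},
		map[string]attr.Value{
			"enabled":         types.BoolValue(false),
			"read_preference": types.StringValue("secondary"),
		},
	)),
},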

However, that is quite different from what exists today from a practitioner standpoint. Practitioners would be burdened to upgrade to Terraform 1.0+ if they haven't already, and to put an equals sign anywhere a block was being configured, since it's now a (nested) attribute. As a goal of this migration appears to be minimizing practitioner burden, the (more common 😄) path of trying to re-implement the existing schema was taken. Since I do not know all the intricacies of this API and its behaviors, nor every permutation of how the prior SDK might save a zero-value (e.g. "" for string attributes, 0 for integers) into state instead of a null value (a common source of pain for framework migrations causing unexpected plan differences, and likely related to what is happening with replication_specs), nor every permutation of how the prior SDK might inject unknown values during planning, we will likely need to continue discussing this at a high level with hand-wavy troubleshooting and potential ideas/solutions.

I'm having a hard time determining whether you got further with advanced_configuration and bi_connector_config with your "filtering attributes" comment. If the goal was to prevent Terraform from showing differences for the unconfigured blocks, then not refreshing them based on whether or not they are present in the prior state seems like a decent solution.

So, to offer a tangible potential advanced_configuration/bi_connector_config solution, a setup that follows Terraform's rules would be:

  • Using schema.ListNestedBlock (technically you could try schema.SingleNestedBlock, but that has "breaking" configuration considerations in certain situations, so trying to minimize differences)
  • For each individual underlying attribute, setting Optional, Computed, and/or Defaults to match API behaviors, to catch cases where the blocks are partially configured. If the API does not have a default, just use Optional. If the API has an indeterminable default, Optional and Computed.
  • In Create/Update, use the plan values to send requests to API and only store the plan value in state (e.g. do not try to set the state with API values)
  • In Read, if drift detection of the configured values is necessary, check each block's prior state for null, and if it is not null, refresh its underlying values from the API (see the sketch below).

This does mean that Terraform can no longer perform drift detection if the block is unconfigured, but that is a requirement for using blocks. If you don't want that restriction, the implementation will need to change in some way. We can discuss options down that path if you would like.
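
As a rough sketch of the Read suggestion above (the clusterModel type, client.GetCluster call, and flattenAdvancedConfiguration helper are all hypothetical names, not the provider's actual code):

func (r *clusterResource) Read(ctx context.Context, req resource.ReadRequest, resp *resource.ReadResponse) {
	var state clusterModel // hypothetical model type
	resp.Diagnostics.Append(req.State.Get(ctx, &state)...)
	if resp.Diagnostics.HasError() {
		return
	}

	cluster, err := r.client.GetCluster(ctx, state.ProjectID.ValueString(), state.Name.ValueString()) // assumed API client call
	if err != nil {
		resp.Diagnostics.AddError("error reading cluster", err.Error())
		return
	}

	// Only refresh advanced_configuration if the prior state contains the
	// block (i.e. it was configured); otherwise leave it null so unconfigured
	// data never re-enters state and triggers drift.
	if !state.AdvancedConfiguration.IsNull() {
		state.AdvancedConfiguration = flattenAdvancedConfiguration(ctx, cluster.AdvancedConfiguration) // assumed helper
	}

	resp.Diagnostics.Append(resp.State.Set(ctx, &state)...)
}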

Figuring out what to do with existing state values if they were caused by the prior SDK saving the zero-value we can discuss once the general attribute/block handling is stable for new resources.

I think the replication_specs issue(s) might be better suited in a separate thread (e.g. #884) since their causes are likely unrelated to the list block issue(s). Briefly, it might be related to a thing I mentioned in my first comment:

Finally, it is worth mentioning that when migrating from the SDK to the framework, Terraform may hide very particular plan differences (e.g. empty string to null) because not doing so would cause very noisy plans for practitioners with the older SDK. If you ever see a planned resource change with all the attributes "hidden", then hashicorp/terraform#31887 is an upstream bug report relating to this. Saving the plan file and inspecting it in JSON format, or implementing a special plan check in terraform-plugin-testing that automatically compares all resource change before/after values for debugging, is handy in those cases. I thought we had a testing feature request to consider implementing this debugging plan check somewhere, but I'm having trouble finding it at the moment -- but can recreate it if needed.


maastha commented Dec 14, 2023

Thank you so much, @bflad, for such a detailed response; we really appreciate you working with us on this, and we plan to evaluate all your suggestions.

We have some follow-up questions:

  1. Is there any possibility of some mechanism where we continue to allow users to use the existing blocks in configs but translate those to attributes somehow? I believe this would be similar to something along the lines of ConfigMode: schema.SchemaConfigModeAttr.
  2. Are there any plans for new migration features to help with such scenarios? Or even to support migration to Framework in general?
  3. In Create/Update, use the plan values to send requests to API and only store the plan value in state (e.g. do not try to set the state with API values)

In the above point, to make sure we understand correctly, the end result would effectively remove computed state that users may be referencing. Right?
Therefore, as part of the solution, we would have to provide users an alternative way to access these computed values. And since users might have to adjust their configurations for these referenced values, it would be considered a breaking change. Is that a correct understanding?

  4. In the event we introduce a computed-only attribute, is it possible for users to get drift detection if anything changes from the backend (provider/API response)? My understanding is they will only see (known after apply) for that attribute during plan, and if anything changes they wouldn't know what exactly changed in that computed attribute upon apply; is that correct?
    If yes, is there a workaround, so that users can be made aware that some nested attribute in that computed-only attribute has been updated, or will be updated, upon apply?


maastha commented Dec 18, 2023

Update:
@bflad We tried the approach of filtering out unconfigured values and not persisting them in the state, using the suggestions from your last response:

  • For each individual underlying attribute, setting Optional, Computed, and/or Defaults to match API behaviors, to catch cases where the blocks are partially configured. If the API does not have a default, just use Optional. If the API has an indeterminable default, Optional and Computed.
  • In Create/Update, use the plan values to send requests to API and only store the plan value in state (e.g. do not try to set the state with API values)
  • In Read, if drift detection of the configured values is necessary, check each block's prior state for null, and if it is not null, refresh its underlying values from the API.
  1. This creates a one-time drift detection when the user upgrades to the new provider version with the framework-migrated resource. Correct? Or could you suggest alternatives to prevent that?
    The reason we ask is that, unfortunately, the Update API for this resource isn't exactly idempotent and can sometimes return errors when attempting to update some of these lists, so this does not seem to be working for us.

  2. Another approach I considered is introducing additional computed ListNestedAttributes that store the complete API response while keeping the existing block attributes to store only the planned/config data, but my guess is this would also require a one-time update, as it essentially gets us to the same Update() API issue as in the last point.

  3. A third alternative I can think of is to have users re-import the resource upon upgrading the provider. But even in this scenario, users cannot simply do a terraform state rm and then re-import, because during import the provider cannot figure out what the user might have in their config, so I believe this will not work either.

Please let us know if there are any other options we could look into.

We do think StateUpgrader-like functionality, or simply exposing the Config object in resource.UpgradeStateRequest, would be very useful for similar migration scenarios; we could possibly have avoided the one-time drift described in #1 and #2 above, but unfortunately we can't access the user's config in the StateUpgrader's resource.UpgradeStateRequest to filter out unconfigured attributes from the state.


bflad commented Dec 19, 2023

This creates a one-time drift detection when the user upgrades to the new provider version with the framework-migrated resource. Correct? Or could you suggest alternatives to prevent that?
The reason we ask is that, unfortunately, the Update API for this resource isn't exactly idempotent and can sometimes return errors when attempting to update some of these lists, so this does not seem to be working for us.

It's unfortunately difficult for me to assess this without seeing what you're doing/seeing (e.g. prior state, config, and plan) and understanding the exact behaviors of the API. What are "some of these lists"; do you mean the advanced_configuration and bi_connector_config list blocks? I'll admit I'm a little perplexed about how the update API matters in this case, as it seems your goal is to avoid a plan difference, which would avoid the resource update being called, and would therefore avoid any update API calls.

Beyond using the "human-readable" plan output, in the plan JSON output you should be able to compare the before and after data to determine what Terraform thinks is an actual difference. If those updates are differences such as empty strings (caused by the prior SDK) to null, then potentially adding an empty string default value on the attribute can preserve the prior SDK behavior and hide the difference for practitioners that did not configure a value. You'll just need to account for empty string handling in your resource logic in addition to null handling.
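
To make that concrete, a minimal sketch of the empty-string default idea, assuming default_write_concern is one of the attributes the prior SDK stored as "" (the plan JSON itself can be produced with terraform plan -out=tfplan followed by terraform show -json tfplan):

// Hypothetical: preserve the prior SDK's zero-value behavior by defaulting an
// optional+computed string attribute to "", hiding the ""-to-null difference
// for practitioners who never configured a value. Resource logic must then
// treat "" the same as null.
"default_write_concern": schema.StringAttribute{
	Optional: true,
	Computed: true,
	Default:  stringdefault.StaticString(""),
},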

We do think a StateUpgrader-like functionality or simply exposing the Config object in the resource.UpgradeStateRequest would be very useful for similar migration scenarios so we possibly would have been able to avoid the one-time drift detected in # 1 and # 2 above, but unfortunately we can't access the user's Config in the StateUpgrader resource.UpgradeStateRequest to filter out any un-configured attributes from the state.

You can try submitting a feature request upstream in Terraform to consider exposing this across the protocol, since that is not something that could be implemented in the framework without it existing there. Please note, though, that I'm not sure how well that data fits in with the intended design for the RPC (mainly for upgrading existing state to match breaking schema changes), and even if it were implemented today, there would be an expectation that your practitioners are on the Terraform version (1.8+ as of this writing, since 1.7 is considered feature complete as it's entering release candidate very shortly) that passes the additional configuration information across the protocol.
