Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.14.2 Optional HeisenBug #27295

Closed
dolpsdw opened this issue Dec 16, 2020 · 9 comments · Fixed by #28116
Closed

0.14.2 Optional HeisenBug #27295

dolpsdw opened this issue Dec 16, 2020 · 9 comments · Fixed by #28116
Assignees
Labels
bug confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code upstream

Comments

@dolpsdw
Copy link

dolpsdw commented Dec 16, 2020

Terraform Version 0.14.2

Bug only appears the first time you do Terraform Validate, after Terraform Init.
(The second terraform Validate will not thorw error at validation and behaves normal)
Module Definition

terraform {
  experiments = [module_variable_optional_attrs]
}
variable "advanced_config" {
  type = map(object({
    transition = optional(set(object({
      days          = number
      storage_class = string
      date          = optional(string)
    })))
    expiration = optional(object({
      days                         = number
      date                         = optional(string)
      expired_object_delete_marker = optional(bool)
    }))
  }))
  default = {}
}
provider "aws" {
  region = "eu-central-1"
}
resource "aws_s3_bucket" "managed_bucket" {

  dynamic "lifecycle_rule" {
    for_each = var.advanced_config
    content {
      enabled = true
      id      = lifecycle_rule.key //replace(,"/[^\\w=.,@+-]/", "_")
      prefix  = lifecycle_rule.key
      dynamic "transition" {
        for_each = lifecycle_rule.value.transition != null ? lifecycle_rule.value.transition : toset([])
        iterator = t
        content {
          days          = t.value["days"]
          storage_class = t.value["storage_class"]
          date          = t.value["date"]
        }
      }
      dynamic "expiration" {
        for_each = lifecycle_rule.value.expiration != null ? [lifecycle_rule.value.expiration] : []
        iterator = e
        content {
          days                         = e.value["days"]
          date                         = e.value["date"]
          expired_object_delete_marker = e.value["expired_object_delete_marker"]
        }
      }
    }
  }

}

Consumer

# S3 bucket for code distribution
module "s3_code_distro" {
  source = "./module"
  # Enable lifecyle for code distribution
  advanced_config = {
    "lambda/release" = {
      transition = toset([{
        days          = 30
        storage_class = "STANDARD_IA"
      }])
    }
    "glue/releases" = {
      transition = toset([{
        days          = 30
        storage_class = "STANDARD_IA"
      }])
    }
    "infra" = {
      transition = toset([{
        days          = 30
        storage_class = "STANDARD_IA"
      }])
    }
    "lambda/snapshots" = {
      expiration = {
        days = 15
      }
    }
    "glue/snapshots" = {
      expiration = {
        days = 15
      }
    }
  }

}

Our pipeline always do clean docker image 0.14.2 then run terraform init and terraform validate.
To reproduce the bug localy i hae to rename the ROOT folder of the main module: example/main.tf -> example2/main.tf

After renaming runing terraform validate, fail with
The given value is not suitable for child module variable "advanced_config" defined at .terraform\modules\regional-eu-central-1.code_distribution.s3_code_distro\s3_vars.tf:96,1-27: object is required.
And its because the snapshots folders does not have the optional transition set

    "lambda/snapshots" = {
      expiration = {
        days = 15
      }
    }
    "glue/snapshots" = {
      expiration = {
        days = 15
      }
    }

The following code is currently not failing:
Consumer

# S3 bucket for code distribution
module "s3_code_distro" {
  source = ".\moduleDef"

  aws_region  = data.aws_region.current.name
  environment = var.environment
  prefix      = "prefix"
  name        = "code-distribution"


  # Enable lifecyle for code distribution
  advanced_config = {
    "lambda/release" = {
      transition = [{
        days          = 30
        storage_class = "STANDARD_IA"
      }]
    }
    "glue/releases" = {
      transition = [{
        days          = 30
        storage_class = "STANDARD_IA"
      }]
    }
    "infra" = {
      transition = [{
        days          = 30
        storage_class = "STANDARD_IA"
      }]
    }
    "lambda/snapshots" = {
      transition = toset([])
      expiration = {
        days = 15
      }
    }
    "glue/snapshots" = {
      transition = toset([])
      expiration = {
        days = 15
      }
    }
  }

}

Edit: Update Module Def and consumer code.
I tested on my home with terraform 0.14.0 -beta2 the bug is there aswell.

Atacched are the logs in trace Mode.

TerraformInitTrace.txt
TerraformValidate1Trace.txt
TerraformValidate2Trace.txt

Maybe when the graph is done the next validate process dont fail.

@dolpsdw dolpsdw added bug new new issue not yet triaged labels Dec 16, 2020
@mildwonkey
Copy link
Contributor

Hi @dolpsdw ,
Thanks for reporting this, and I'm sorry you're experiencing this unexpected behavior! I'd like to look into this, but I'm not able to reproduce the issue with the example configuration (the "module" and "consumer" files) you've provided. Can you update this issue with a configuration that reproduces the issue?

This is optional and up to you, but if you'd like you can check out our issue reproduction repository and update the configuration in the 27295 branch. Updating the original issue comment here is also just fine.

@mildwonkey mildwonkey added waiting for reproduction unable to reproduce issue without further information waiting-response An issue/pull request is waiting for a response from the community labels Dec 16, 2020
@dolpsdw
Copy link
Author

dolpsdw commented Dec 16, 2020

@mildwonkey
Update the code to be more copy friendly.

Added Trace Logs and also happening on 0.14.0 beta2

@ghost ghost removed the waiting-response An issue/pull request is waiting for a response from the community label Dec 16, 2020
@mildwonkey
Copy link
Contributor

Thank you for updating the example @dolpsdw , I'm able to reproduce the issue now, and confirm that it is still present in the master branch.

@mildwonkey mildwonkey added confirmed a Terraform Core team member has reproduced this issue and removed new new issue not yet triaged labels Dec 17, 2020
@mildwonkey mildwonkey self-assigned this Dec 17, 2020
@dolpsdw
Copy link
Author

dolpsdw commented Dec 17, 2020

Ok i thought that using transition = toset([]) will fix the bug.
Changin the image to 0.14.0-rc1 (from amazon aws public gallery) broke the pipeline.

Changing back to 0.14.2 from docker hub fails first time but now is working?.
HeisenBug...

@mildwonkey
Copy link
Contributor

mildwonkey commented Dec 17, 2020

For what it's worth, the error with terraform validate is flapping for me - one out of every 4(ish) runs fails, I don't need to make any changes to the configuration.

The bug appears to be in the upstream go-cty library. I'm decently confident that terraform is sending the correct type to convert, here:

val, convErr = convert.Convert(val, wantType)
if convErr != nil {
diags = diags.Append(&hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Invalid value for module argument",
Detail: fmt.Sprintf(
"The given value is not suitable for child module variable %q defined at %s: %s.",
name, n.Config.DeclRange.String(), convErr,
),
Subject: expr.Range().Ptr(),
})
// We'll return a placeholder unknown value to avoid producing
// redundant downstream errors.
val = cty.UnknownVal(wantType)
}

But periodically, by the time we got to conversionCheckMapElementTypes, we've lost that one of the Objects is actually supposed to be an ObjectWithOptionalAttrs:

https://github.com/zclconf/go-cty/blob/251ac0b66ee75102f9e2dd88a1ae1743d3dd421f/cty/convert/conversion_collection.go#L517-L535

So on one run, that function receives this as the expected type:

cty.Object(map[string]cty.Type{"expiration":cty.Object(map[string]cty.Type{"date":cty.String, "days":cty.Number, "expired_object_delete_marker":cty.Bool}), "transition":cty.Set(cty.ObjectWithOptionalAttrs(map[string]cty.Type{"date":cty.String, "days":cty.Number, "storage_class":cty.String}, []string{"", "date"}))})

vs this:

cty.Object(map[string]cty.Type{"expiration":cty.Object(map[string]cty.Type{"date":cty.String, "days":cty.Number, "expired_object_delete_marker":cty.Bool}), "transition":cty.Set(cty.Object(map[string]cty.Type{"date":cty.String, "days":cty.Number, "storage_class":cty.String}))})

The difference is in the "transition":cty.Set(...).

I'm not sure where the problem is coming from yet, just where the error occurs. Since this is a bug in the upstream library with an experimental feature, fixing it is a somewhat lower priority than other current bug work, but hopefully this explanation helps the next engineer to look at this.

@mildwonkey mildwonkey added explained a Terraform Core team member has described the root cause of this issue in code upstream and removed waiting for reproduction unable to reproduce issue without further information labels Dec 17, 2020
@mildwonkey mildwonkey removed their assignment Feb 1, 2021
@pst
Copy link

pst commented Feb 26, 2021

I understand module_variable_optional_attrs is an experimental feature. But it is provided as an experiment so that people can use it and provide feedback. Randomly having 3 out of 4 Terraform runs fail makes the feature impossible to try, and hence provide feedback on.

Since this issue is still affecting v0.14.7 and the majority of the contributors to the upstream library seem to be Hashicorp employees, I was wondering if there was any chance this would get attention to be fixed soon?

Please consider this a friendly plea, not a demand in any way.

To hopefully contribute to getting this resolved: My testing suggests the error only happens for a optional(list(object({...}))). See example below. If config_map_generator was not commented out, this configuration would cause the random errors.

configuration = {
    apps = {
      namespace = "test"

      common_annotations = {
        test_annotation = true
      }

      common_labels = {
        test_label = true
      }

      crds = ["test1", "test2"]

      // lists of objects cause the issue
      #config_map_generator = [
      #  {
      #    name      = "test"
      #    namespace = "argocd"
      #    literals = [
      #      "KEY1=local.var1"
      #    ]
      #  }
      #]

    }
    ops = {}
    loc = {}
  }

Here's the variable definition for completeness.

variable "configuration" {
  type = map(object({
    common_annotations = optional(map(string))
    common_labels      = optional(map(string))
    components         = optional(list(string))
    config_map_generator = optional(list(object({
      name      = optional(string)
      namespace = optional(string)
      behavior  = optional(string)
      envs      = optional(list(string))
      files     = optional(list(string))
      literals  = optional(list(string))
      options = optional(object({
        labels                   = optional(map(string))
        annotations              = optional(map(string))
        disable_name_suffix_hash = optional(bool)
      }))
    })))
    crds       = optional(list(string))
    generators = optional(list(string))
    generator_options = optional(object({
      labels                   = optional(map(string))
      annotations              = optional(map(string))
      disable_name_suffix_hash = optional(bool)
    }))
    images = optional(list(object({
      name     = optional(string)
      new_name = optional(string)
      new_tag  = optional(string)
      digest   = optional(string)
    })))
    name_prefix = optional(string)
    namespace   = optional(string)
    name_suffix = optional(string)
    patches = optional(list(object({
      path  = optional(string)
      patch = optional(string)
      target = optional(object({
        group               = optional(string)
        version             = optional(string)
        kind                = optional(string)
        name                = optional(string)
        namespace           = optional(string)
        label_selector      = optional(string)
        annotation_selector = optional(string)
      }))
    })))
    replicas = optional(list(object({
      name  = optional(string)
      count = optional(number)
    })))
    secret_generator = optional(list(object({
      name      = optional(string)
      namespace = optional(string)
      behavior  = optional(string)
      type      = optional(string)
      envs      = optional(list(string))
      files     = optional(list(string))
      literals  = optional(list(string))
      options = optional(object({
        labels                   = optional(map(string))
        annotations              = optional(map(string))
        disable_name_suffix_hash = optional(bool)
      }))
    })))
    transformers = optional(list(string))
    vars = optional(list(object({
      name = optional(string)
      obj_ref = optional(object({
        api_version = optional(string)
        group       = optional(string)
        version     = optional(string)
        kind        = optional(string)
        name        = optional(string)
        namespace   = optional(string)
      }))
      field_ref = optional(object({
        field_path = optional(string)
      }))
    })))
    variant = optional(string)
  }))
  description = "Map with per workspace cluster configuration."
  default     = { apps = {}, ops = {}, loc = {} }
}

@alisdair
Copy link
Member

alisdair commented Mar 4, 2021

Thanks for these reproduction cases! I believe this bug will be fixed if this upstream PR is merged and released: zclconf/go-cty#88.

pst added a commit to kbst/terraform-kubestack that referenced this issue Mar 8, 2021
This change supports both the current `map(map(string)` but
also `map(map(object({...})))` to unlock the ability to move
towards a configuration schema that's not limited to maps of
strings only.

Moving the configuration to ojbects is currently blocked
by an issue with making object attributes optional.

hashicorp/terraform#27295
@pst
Copy link

pst commented Mar 22, 2021

I can confirm 0.15.0-beta2 fixes the issue for my use-case.

@ghost
Copy link

ghost commented Apr 16, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code upstream
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants