
(aws_eks): (EKS cluster_logging property makes cluster unmanageable) #20779

Closed
dborysenko opened this issue Jun 17, 2022 · 10 comments · Fixed by #24688
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@dborysenko

Describe the bug

An EKS cluster created with the cluster_logging property set cannot be modified afterwards. In my particular tests, I can't add or delete CIDRs to/from the cluster's endpoint_access property.

Expected Behavior

cluster_logging should not prevent cluster settings from being modified.

Current Behavior

Setting cluster_logging makes the cluster unmanageable.

Reproduction Steps

Create an EKS cluster using the following CDK code:

from aws_cdk import (Stack, aws_eks as eks, aws_ec2 as ec2)
from constructs import Construct


class PublicCidrBugStack(Stack):

  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)
    vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id="vpc-myvpcid")

    api_access = ["1.2.3.4/32"]
    public_api_access = eks.EndpointAccess.PUBLIC_AND_PRIVATE.only_from(*api_access)
    cluster = eks.Cluster(
        self,
        "publicCidrBug",
        cluster_name="public-cidr-bug",
        version=eks.KubernetesVersion.V1_21,
        vpc=vpc,
        default_capacity=0,
        endpoint_access=public_api_access,
        vpc_subnets=[
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT),
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC)
        ],
        cluster_logging=[eks.ClusterLoggingTypes.AUTHENTICATOR],
        alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_4_1),
        tags={"mytesttag": "tag"})

    cluster.add_auto_scaling_group_capacity(
        "systemASG",
        instance_type=ec2.InstanceType("t3.large"),
        min_capacity=1,
        max_capacity=5,
        vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT))

Now try to add another CIDR to the endpoint_access list:

from aws_cdk import (Stack, aws_eks as eks, aws_ec2 as ec2)
from constructs import Construct


class PublicCidrBugStack(Stack):

  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)
    vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id="vpc-myvpcid")

    api_access = ["1.2.3.4/32", "5.6.7.8/32"]
    public_api_access = eks.EndpointAccess.PUBLIC_AND_PRIVATE.only_from(*api_access)
    cluster = eks.Cluster(
        self,
        "publicCidrBug",
        cluster_name="public-cidr-bug",
        version=eks.KubernetesVersion.V1_21,
        vpc=vpc,
        default_capacity=0,
        endpoint_access=public_api_access,
        vpc_subnets=[
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT),
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC)
        ],
        cluster_logging=[eks.ClusterLoggingTypes.AUTHENTICATOR],
        alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_4_1),
        tags={"mytesttag": "tag"})

    cluster.add_auto_scaling_group_capacity(
        "systemASG",
        instance_type=ec2.InstanceType("t3.large"),
        min_capacity=1,
        max_capacity=5,
        vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT))

The cdk deploy command reports an error:

4:36:56 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | publicCidrBugE3557010
Received response status [FAILED] from custom resource. Message returned: Only one type of update can be allowed.

Logs: /aws/lambda/PublicCidrBugStack-awscdkaw-OnEventHandler42BEBAE0-zc6hdyL2awp8

at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 0264cd3c-50b0-409d-912a-46785c043c81)

Moreover, when you now try to remove the cluster_logging property, you get the error below:

4:48:43 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | publicCidrBugE3557010
Received response status [FAILED] from custom resource. Message returned: The type for cluster update was not provided.

Logs: /aws/lambda/PublicCidrBugStack-awscdkaw-OnEventHandler42BEBAE0-zc6hdyL2awp8

at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 0065d963-a150-4c46-b5cb-a0df59717c58)

The combination of the above bugs effectively makes your cluster unmanageable: you can't make changes to the cluster, nor can you remove the logging setting. I did not find a way out of this loop.

Possible Solution

The only solution is not to use cluster_logging in the first place; once you have, there is no way out.

Additional Information/Context

No response

CDK CLI Version

2.28.0 (build ba233f0)

Framework Version

No response

Node.js Version

v16.15.0

OS

MacOS 12.4

Language

Python

Language Version

Python 3.9.12

Other information

No response

@dborysenko dborysenko added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 17, 2022
@dborysenko dborysenko changed the title from "(aws_eks): (EKS cluster_logging property make cluster unmanageable)" to "(aws_eks): (EKS cluster_logging property makes cluster unmanageable)" Jun 17, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jun 17, 2022
@otaviomacedo
Contributor

This is bad. I'm marking this as a p1. Unfortunately, EKS is not in our current list of prioritized modules, which means we won't be able to address it immediately.

But for anyone interested in picking this up, here's my hypothesis: the UpdateClusterConfig API allows you to pass parameters for VPC configuration and logging, but not both at the same time (even though this is not spelled out in the docs). And that is exactly what the custom resource tries to do when you update endpoint access and a logging configuration is already present.

We should probably split this condition in two:

if (updates.updateLogging || updates.updateAccess) {

one branch for updating logging and one for updating access.
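The suggested split could be sketched as follows. This is a hypothetical illustration, not the actual handler code: UpdatePlan and buildConfigPayloads are invented names, and the payload shapes merely mirror the name, logging, and resourcesVpcConfig parameters of the EKS UpdateClusterConfig API. The idea is to emit one payload per update type so the API never receives logging and VPC configuration in the same call.

```typescript
// Hypothetical sketch: build one UpdateClusterConfig payload per change type
// instead of a single combined payload. Names here are illustrative only.
interface UpdatePlan {
  updateLogging: boolean;
  updateAccess: boolean;
}

function buildConfigPayloads(
  clusterName: string,
  plan: UpdatePlan,
  logging?: object,
  vpcConfig?: object,
): object[] {
  const payloads: object[] = [];
  if (plan.updateLogging) {
    // First call: logging changes only.
    payloads.push({ name: clusterName, logging });
  }
  if (plan.updateAccess) {
    // Separate call: endpoint access (VPC config) changes only.
    payloads.push({ name: clusterName, resourcesVpcConfig: vpcConfig });
  }
  return payloads;
}
```

Each payload would then be sent as its own updateClusterConfig call, waiting for one update to finish before issuing the next, since EKS processes cluster updates one at a time.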

@otaviomacedo otaviomacedo added p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jul 7, 2022
@otaviomacedo otaviomacedo removed their assignment Jul 7, 2022
@otaviomacedo otaviomacedo added the effort/small Small work item – less than a day of effort label Jul 7, 2022
@watany-dev
Contributor

I'll do some research on this.

@dborysenko
Author

Hey @watany-dev, @TheRealAmazonKendra
I saw there was an attempt to implement a fix for this issue, but it was closed and never merged. Could you please take another look?
Thanks.

@pahud
Contributor

pahud commented Jan 26, 2023

With #22957 merged, I think this issue should be fixed. Can you verify again?

@pahud pahud added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed p1 labels Jan 26, 2023
@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jan 29, 2023
@github-actions github-actions bot closed this as completed Feb 3, 2023
@dborysenko
Author

@pahud
The issue is still there. Steps to reproduce:

  1. Create EKS cluster with no logging enabled
  2. Enable AUDIT logging using cdk
  3. Disable AUDIT logging using cdk

Step 3 produces an error similar to the one below:
{
  "Status": "FAILED",
  "Reason": "The type for cluster update was not provided.\n\nLogs: /aws/lambda/MainStage-ClusterName-a-OnEventHandler42BEBAE0-vZTl1Rt7zyYI\n\n at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)\n at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)\n at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\n at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\n at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)\n at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)\n at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)\n at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10\n at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)\n at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)",
  "StackId": "arn:aws:cloudformation:us-west-2:MYACCOUNT:stack/MainStage-ClusterName/eba1fb00-62d9-11ed-b293-02fb7902dcc9",
  "RequestId": "5fccdf75-7e8c-4667-9fff-3363726856f1",
  "PhysicalResourceId": "my-cluster-name",
  "LogicalResourceId": "cdvrdevpe010786640F"
}

cdk version: 2.66.1

@pahud pahud self-assigned this Mar 2, 2023
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Mar 2, 2023
@pahud
Contributor

pahud commented Mar 2, 2023

Reopening this issue, as I can reproduce it with the code below:

import { App, Stack, StackProps,
  aws_eks as eks,
  aws_ec2 as ec2 } from 'aws-cdk-lib';
import { KubectlV24Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v24';

import { Construct } from 'constructs';

export class EksTsStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps = {}) {
    super(scope, id, props);

    const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      version: eks.KubernetesVersion.V1_24,
      kubectlLayer: new KubectlLayer(this, 'KubectlLayer'),
      // clusterLogging: [
      //   eks.ClusterLoggingTypes.AUDIT,
      // ],
    });
  }
}


@pahud pahud reopened this Mar 2, 2023
@pahud
Contributor

pahud commented Mar 2, 2023

I think the bug could be the following:

When logging is changed from the string 'true' to undefined, we should probably set parsed.logging.clusterLogging[0].enabled = false:

if (typeof (parsed.logging?.clusterLogging[0].enabled) === 'string') {
  parsed.logging.clusterLogging[0].enabled = parsed.logging.clusterLogging[0].enabled === 'true';
}
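A minimal sketch of that normalization, assuming the handler works on a parsed object shaped like the EKS Logging type (the interface and function names are illustrative, not the actual custom resource code): string booleans coming from CloudFormation are coerced to real booleans, and a missing logging block becomes an explicit disable entry so that the update type is always provided.

```typescript
// Hypothetical sketch of the normalization described above. Shapes mirror the
// EKS Logging structure; names are illustrative only.
interface ClusterLoggingEntry {
  types?: string[];
  enabled?: boolean | string;
}

interface Logging {
  clusterLogging: ClusterLoggingEntry[];
}

function normalizeLogging(logging?: Logging): Logging {
  // Logging was removed entirely: emit an explicit "disable everything" entry
  // instead of omitting the update type.
  if (!logging || logging.clusterLogging.length === 0) {
    return { clusterLogging: [{ types: [], enabled: false }] };
  }
  for (const entry of logging.clusterLogging) {
    // CloudFormation serializes booleans as strings ('true' / 'false').
    if (typeof entry.enabled === 'string') {
      entry.enabled = entry.enabled === 'true';
    }
  }
  return logging;
}
```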

I am leaving this as a p2 bug. Any PR submission would be highly appreciated!

@pahud pahud removed investigating This issue is being investigated and/or work is in progress to resolve the issue. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Mar 2, 2023
@pahud pahud removed effort/small Small work item – less than a day of effort closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. labels Mar 2, 2023
@pahud pahud removed their assignment Mar 2, 2023
@pahud pahud added effort/medium Medium work item – several days of effort effort/small Small work item – less than a day of effort and removed effort/medium Medium work item – several days of effort labels Mar 2, 2023
@cgarvis cgarvis added p1 and removed p2 labels Mar 13, 2023
@pahud
Contributor

pahud commented Mar 19, 2023

I have created a PR #24688 for this.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
