
(aws_eks): (EKS cluster_logging property makes cluster unmanageable) #20779

Closed
dborysenko opened this issue Jun 17, 2022 · 10 comments · Fixed by #24688
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@dborysenko

Describe the bug

An EKS cluster created with the cluster_logging property set cannot be modified afterwards. In my particular tests, I can't add or delete CIDRs to/from the cluster's endpoint_access property.

Expected Behavior

cluster_logging should not prevent cluster settings from being modified.

Current Behavior

Setting cluster_logging makes the cluster unmanageable.

Reproduction Steps

Create an EKS cluster using the following CDK code:

from aws_cdk import (Stack, aws_eks as eks, aws_ec2 as ec2)
from constructs import Construct


class PublicCidrBugStack(Stack):

  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)
    vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id="vpc-myvpcid")

    api_access = ["1.2.3.4/32"]
    public_api_access = eks.EndpointAccess.PUBLIC_AND_PRIVATE.only_from(*api_access)
    cluster = eks.Cluster(
        self,
        "publicCidrBug",
        cluster_name="public-cidr-bug",
        version=eks.KubernetesVersion.V1_21,
        vpc=vpc,
        default_capacity=0,
        endpoint_access=public_api_access,
        vpc_subnets=[
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT),
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC)
        ],
        cluster_logging=[eks.ClusterLoggingTypes.AUTHENTICATOR],
        alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_4_1),
        tags={"mytesttag": "tag"})

    cluster.add_auto_scaling_group_capacity(
        "systemASG",
        instance_type=ec2.InstanceType("t3.large"),
        min_capacity=1,
        max_capacity=5,
        vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT))

Now try to add another CIDR to the endpoint_access list:

from aws_cdk import (Stack, aws_eks as eks, aws_ec2 as ec2)
from constructs import Construct


class PublicCidrBugStack(Stack):

  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)
    vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id="vpc-myvpcid")

    api_access = ["1.2.3.4/32", "5.6.7.8/32"]
    public_api_access = eks.EndpointAccess.PUBLIC_AND_PRIVATE.only_from(*api_access)
    cluster = eks.Cluster(
        self,
        "publicCidrBug",
        cluster_name="public-cidr-bug",
        version=eks.KubernetesVersion.V1_21,
        vpc=vpc,
        default_capacity=0,
        endpoint_access=public_api_access,
        vpc_subnets=[
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT),
            ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC)
        ],
        cluster_logging=[eks.ClusterLoggingTypes.AUTHENTICATOR],
        alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_4_1),
        tags={"mytesttag": "tag"})

    cluster.add_auto_scaling_group_capacity(
        "systemASG",
        instance_type=ec2.InstanceType("t3.large"),
        min_capacity=1,
        max_capacity=5,
        vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT))

The cdk deploy command reports an error:

4:36:56 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | publicCidrBugE3557010
Received response status [FAILED] from custom resource. Message returned: Only one type of update can be allowed.

Logs: /aws/lambda/PublicCidrBugStack-awscdkaw-OnEventHandler42BEBAE0-zc6hdyL2awp8

at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 0264cd3c-50b0-409d-912a-46785c043c81)

Moreover, when you now try to remove the cluster_logging property, you get the error below:

4:48:43 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | publicCidrBugE3557010
Received response status [FAILED] from custom resource. Message returned: The type for cluster update was not provided.

Logs: /aws/lambda/PublicCidrBugStack-awscdkaw-OnEventHandler42BEBAE0-zc6hdyL2awp8

at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 0065d963-a150-4c46-b5cb-a0df59717c58)

The combination of the above bugs effectively makes your cluster unmanageable: you can't make changes to the cluster, nor can you remove the logging setting. I did not find a way out of this loop.

Possible Solution

The only solution is not to use cluster_logging in the first place; once you have, there is no way out.

Additional Information/Context

No response

CDK CLI Version

2.28.0 (build ba233f0)

Framework Version

No response

Node.js Version

v16.15.0

OS

MacOS 12.4

Language

Python

Language Version

Python 3.9.12

Other information

No response

@dborysenko dborysenko added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 17, 2022
@dborysenko dborysenko changed the title from "(aws_eks): (EKS cluster_logging property make cluster unmanageable)" to "(aws_eks): (EKS cluster_logging property makes cluster unmanageable)" Jun 17, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jun 17, 2022
@otaviomacedo
Contributor

This is bad. I'm marking this as a p1. Unfortunately, EKS is not in our current list of prioritized modules, which means we won't be able to address it immediately.

But for anyone interested in picking this up, here's my hypothesis: the UpdateClusterConfig API allows you to pass parameters for VPC configuration and logging, but not both at the same time (even though this is not spelled out in the docs). And that is exactly what the custom resource tries to do when you update endpoint access and a logging configuration is already present.

We should probably split this condition in two:

if (updates.updateLogging || updates.updateAccess) {

one branch for updating logging and one for updating access.
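The suggested split could be sketched as follows. This is a hypothetical illustration, not the actual handler code: UpdatePlan and buildConfigPayloads are invented names, and the payload shapes merely mirror the name, logging, and resourcesVpcConfig parameters of the EKS UpdateClusterConfig API. The idea is to emit one payload per update type so the API never receives logging and VPC configuration in the same call.

```typescript
// Hypothetical sketch: build one UpdateClusterConfig payload per change type
// instead of a single combined payload. Names here are illustrative only.
interface UpdatePlan {
  updateLogging: boolean;
  updateAccess: boolean;
}

function buildConfigPayloads(
  clusterName: string,
  plan: UpdatePlan,
  logging?: object,
  vpcConfig?: object,
): object[] {
  const payloads: object[] = [];
  if (plan.updateLogging) {
    // First call: logging changes only.
    payloads.push({ name: clusterName, logging });
  }
  if (plan.updateAccess) {
    // Separate call: endpoint access (VPC config) changes only.
    payloads.push({ name: clusterName, resourcesVpcConfig: vpcConfig });
  }
  return payloads;
}
```

Each payload would then be sent as its own updateClusterConfig call, waiting for one update to finish before issuing the next, since EKS processes cluster updates one at a time.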

@otaviomacedo otaviomacedo added p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jul 7, 2022
@otaviomacedo otaviomacedo removed their assignment Jul 7, 2022
@otaviomacedo otaviomacedo added the effort/small Small work item – less than a day of effort label Jul 7, 2022
@watany-dev
Contributor

I'll do some research on this.

@dborysenko
Author

Hey @watany-dev, @TheRealAmazonKendra
I saw there was an attempt to implement a fix for this issue, but it was closed and never merged. Could you please take another look?
Thanks.

@pahud
Contributor

pahud commented Jan 26, 2023

With #22957 merged, I think this issue should be fixed. Can you verify again?

@pahud pahud added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed p1 labels Jan 26, 2023
@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jan 29, 2023
@github-actions github-actions bot closed this as completed Feb 3, 2023
@dborysenko
Author

@pahud
The issue is still there. Steps to reproduce:

  1. Create EKS cluster with no logging enabled
  2. Enable AUDIT logging using cdk
  3. Disable AUDIT logging using cdk

Step 3 produces an error similar to the one below:
{
  "Status": "FAILED",
  "Reason": "The type for cluster update was not provided.\n\nLogs: /aws/lambda/MainStage-ClusterName-a-OnEventHandler42BEBAE0-vZTl1Rt7zyYI\n\n at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)\n at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)\n at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\n at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\n at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)\n at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)\n at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)\n at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10\n at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)\n at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)",
  "StackId": "arn:aws:cloudformation:us-west-2:MYACCOUNT:stack/MainStage-ClusterName/eba1fb00-62d9-11ed-b293-02fb7902dcc9",
  "RequestId": "5fccdf75-7e8c-4667-9fff-3363726856f1",
  "PhysicalResourceId": "my-cluster-name",
  "LogicalResourceId": "cdvrdevpe010786640F"
}

cdk version: 2.66.1

@pahud pahud self-assigned this Mar 2, 2023
@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Mar 2, 2023
@pahud
Contributor

pahud commented Mar 2, 2023

Reopening this issue, as I can reproduce it with the code below:

import { App, Stack, StackProps,
  aws_eks as eks,
  aws_ec2 as ec2 } from 'aws-cdk-lib';
import { KubectlV24Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v24';

import { Construct } from 'constructs';

export class EksTsStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps = {}) {
    super(scope, id, props);

    const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      version: eks.KubernetesVersion.V1_24,
      kubectlLayer: new KubectlLayer(this, 'KubectlLayer'),
      // clusterLogging: [
      //   eks.ClusterLoggingTypes.AUDIT,
      // ],
    });
  }
}


@pahud pahud reopened this Mar 2, 2023
@pahud
Contributor

pahud commented Mar 2, 2023

I think the bug could be the following:

When logging is changed from the string 'true' to undefined, we should probably set parsed.logging.clusterLogging[0].enabled = false:

if (typeof (parsed.logging?.clusterLogging[0].enabled) === 'string') {
  parsed.logging.clusterLogging[0].enabled = parsed.logging.clusterLogging[0].enabled === 'true';
}
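A minimal sketch of that normalization, assuming the handler works on a parsed object shaped like the EKS Logging type (the interface and function names are illustrative, not the actual custom resource code): string booleans coming from CloudFormation are coerced to real booleans, and a missing logging block becomes an explicit disable entry so that the update type is always provided.

```typescript
// Hypothetical sketch of the normalization described above. Shapes mirror the
// EKS Logging structure; names are illustrative only.
interface ClusterLoggingEntry {
  types?: string[];
  enabled?: boolean | string;
}

interface Logging {
  clusterLogging: ClusterLoggingEntry[];
}

function normalizeLogging(logging?: Logging): Logging {
  // Logging was removed entirely: emit an explicit "disable everything" entry
  // instead of omitting the update type.
  if (!logging || logging.clusterLogging.length === 0) {
    return { clusterLogging: [{ types: [], enabled: false }] };
  }
  for (const entry of logging.clusterLogging) {
    // CloudFormation serializes booleans as strings ('true' / 'false').
    if (typeof entry.enabled === 'string') {
      entry.enabled = entry.enabled === 'true';
    }
  }
  return logging;
}
```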

I am leaving this as a p2 bug. Any PR submission would be highly appreciated!

@pahud pahud removed investigating This issue is being investigated and/or work is in progress to resolve the issue. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Mar 2, 2023
@pahud pahud removed effort/small Small work item – less than a day of effort closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. labels Mar 2, 2023
@pahud pahud removed their assignment Mar 2, 2023
@pahud pahud added effort/medium Medium work item – several days of effort effort/small Small work item – less than a day of effort and removed effort/medium Medium work item – several days of effort labels Mar 2, 2023
@cgarvis cgarvis added p1 and removed p2 labels Mar 13, 2023
@pahud
Contributor

pahud commented Mar 19, 2023

I have created a PR #24688 for this.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
