Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(aws-eks): proxy support and allow assigning a security group to all cluster handler functions #17200

Merged
merged 22 commits into from Nov 5, 2021

Conversation

ryparker
Copy link
Contributor

@ryparker ryparker commented Oct 27, 2021

Summary

This PR is intended for CDK EKS users who require all traffic to be routed through a proxy. Currently if a user does not allow internet connections to the VPC without going through a proxy, then deploying an EKS cluster will result in a timeout error:

Received response status [FAILED] from custom resource. Message returned: Error: 2021-10-20T14:20:47.028Z d86e3ef4-45ce-4130-988f-c4663f7f8c80 Task timed out after 60.06 seconds

Fixes: #12469, SIM D29159517
Related to but does not resolve: https://github.com/aws/aws-cdk/issues/12171

⚙️ Changes

Expand each list item for additional details.

Corrected "Cluster Handler" docs to clarify that 2 lambdas are created (onEventHandler, isCompleteHandler)

Our docs currently describe the "Cluster Handler" as one Lambda function that interacts with the EKS API. However this is not accurate. The "Cluster Handler" actually creates two Lambdas for the Custom Resource, onEventHandler and isCompleteHandler, both interact with the AWS API.

Passes the clusterHandlerEnvironment to both Cluster Handler Lambdas

The clusterHandlerEnvironment is the recommended method of passing a proxy url (i.g. http_proxy: 'http://my-proxy.com:3128') to the Cluster Handler.

Currently the clusterHandlerEnvironment is only passed to the Cluster Handler's onEventHandler Lambda. The onEventHandler was believed to be the only Cluster Handler Lambda that interacts with the AWS EKS API, however this is not entirely true. Both the onEventHandler and isCompleteHandler call the AWS EKS API.

Following the execution process of isCompleteHandler when creating an EKS cluster:

  1. index.isComplete() (this is the Lambda handler)
  2. common.isComplete()
  3. cluster.isCreateComplete()
  4. cluster.isActive()
  5. Request to EKS API (results in timeout because proxy is not used)

This change allows the user to pass proxy urls as environment variables to both Lambdas using clusterHandlerEnvironment.

Renames the prop onEventLayer -> proxyAgentLayer, and provides the layer to both Cluster Handler Lambdas

The proxy-agent layer is now used in both onEventHandler and isCompleteHandler lambdas in order to support proxy configurations. Because of this change, i've deprecated the original onEventLayer and created a new prop proxyAgentLayer since we will now be passing this prop into more than just the onEventHandler Lambda.

The onEventLayer prop was introduced a few weeks ago (sept 24) so it should not impact many users (if any). The prop would only be used if the user wishes to bundle the layer themselves with a custom proxy agent.

This prop follows the same user customization we allow with the kubectl handler.

Another suitable name for this prop could have been clusterHandlerLayer but I chose proxyAgentLayer because it represents what the layer is used for, instead of describing where it's used. This also follows the convention of the pre-existing kubectlLayer prop.

Adds the EKS cluster prop clusterHandlerSecurityGroup

If a proxy address is provided to the Cluster Handler Lambdas, but the proxy instance is not open to the world, then the dynamic IPs of the Cluster Handler Lambdas will be denied access. To solve this, i've implemented a new Cluster prop clusterHandlerSecurityGroup. This clusterHandlerSecurityGroup prop will allow the user to pass a Security Group to both Lambda functions and the Custom Resource provider.

This is very similar to how we already allow users to pass Security Groups to the Kubectl Handler


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Oct 27, 2021

@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Oct 27, 2021
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Oct 27, 2021
@ryparker ryparker changed the title fix(aws-eks): add proxy support and allow assigning a security group to all cluster handler functions fix(aws-eks): proxy support and allow assigning a security group to all cluster handler functions Oct 28, 2021
@ryparker

This comment has been minimized.

@Nr18
Copy link

Nr18 commented Nov 3, 2021

@ryparker I am looking at the kubectl-handler lambda function. And what I noticed that it uses:

    # "log in" to the cluster
    subprocess.check_call([ 'aws', 'eks', 'update-kubeconfig',
        '--role-arn', role_arn,
        '--name', cluster_name,
        '--kubeconfig', kubeconfig
    ])

To get the cluster configuration file that will later be used by the kubectl commands. When the kubectl-handler lambda function is running in a VPC without internet egress the same proxy should be applied to this function right?

Or the kubeconfig needs to be pulled in in a different way then through the AWS EKS API?

@ryparker ryparker added the pr/do-not-merge This PR should not be merged at this time. label Nov 3, 2021
@ryparker
Copy link
Contributor Author

ryparker commented Nov 4, 2021

Attempted to deploy an isolated EKS cluster with clusterHandlerEnvironment defined with an https_proxy and a clusterHandlerSecurityGroup defined with a Security Group that allows traffic to the proxy instance.

Snippet of cluster initialization (full code here):

 const cluster = new Cluster(stack, 'HelloEks', {
    vpc: props.vpc,
    vpcSubnets: [{ subnetType: SubnetType.PRIVATE_ISOLATED }],
    version: KubernetesVersion.V1_21,
    securityGroup: props.clusterSecurityGroup,
    placeClusterHandlerInVpc: true,
    clusterHandlerSecurityGroup: props.clusterSecurityGroup,
    clusterHandlerEnvironment: {
      // Set the https_proxy environment variable to the proxy server's URL.
      https_proxy: props.proxyUrl,
    },
    kubectlEnvironment: {
      https_proxy: props.proxyUrl,
    },
  });

Shortly after creating the 5 Cluster Handler Lambdas, and waiting for the cluster to become Active, then the deployment failed with:

CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [eab14edf-98a8-4e45-afd3-4e8d888a63c8]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

Comparing the logs to a successful EKS deployment, the only difference I noticed was in the ProviderframeworkisComplete Lambda. I noticed a 2nd CloudFormation response being sent, first successful, then failed. See here:

CleanShot 2021-11-04 at 03 01 44@2x

On a successful EKS deploy (that does not use clusterHandlerSecurityGroup ) the same Lambda would finish without the failure:

CleanShot 2021-11-04 at 03 05 30@2x

The VPC flow logs showed many REJECT logs:

2021-11-03T21:48:26.000-07:00 | 2 495634646531 eni-0169c5dced029b689 45.134.26.232 10.0.161.48 48428 44234 6 1 40 1636001306 1636001366 REJECT OK
-- | --

CleanShot 2021-11-04 at 02 54 36@2x

Note that 10.0.161.48 is the private IP of my proxy instance (EC2)

It seems like the EKS custom resource fails when attempting to submit a response back to cloudformation (after the cluster's status is Active). This behavior makes it seem as if the ProviderFramework does not have access to the CloudFormation VPC endpoint even though i've created a VPC interface endpoint for the CloudFormation service.

This can be reproduced by deploying this sample project

@ryparker ryparker removed the pr/do-not-merge This PR should not be merged at this time. label Nov 4, 2021
@ryparker
Copy link
Contributor Author

ryparker commented Nov 4, 2021

Merging as is. The PR should not change default behavior. The failed test results I mentioned above are due to an error in configuring connections between VPC resources and are not specific to EKS or the changes made in this PR.

Will continue to test and possibly submit a follow up PR if changes are necessary.

@ryparker
Copy link
Contributor Author

ryparker commented Nov 4, 2021

@ryparker I am looking at the kubectl-handler lambda function. And what I noticed that it uses:

    # "log in" to the cluster
    subprocess.check_call([ 'aws', 'eks', 'update-kubeconfig',
        '--role-arn', role_arn,
        '--name', cluster_name,
        '--kubeconfig', kubeconfig
    ])

To get the cluster configuration file that will later be used by the kubectl commands. When the kubectl-handler lambda function is running in a VPC without internet egress the same proxy should be applied to this function right?

Or the kubeconfig needs to be pulled in in a different way then through the AWS EKS API?

The kubectl-handler uses Python. The AWS SDK for Python already has built-in support for detecting and using a proxy address if it is provided as an environment variable such as http_proxy. As long as you pass the proxy address variable into the Cluster's kubectlEnvironment prop, it should be used.

packages/@aws-cdk/aws-eks/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-eks/README.md Outdated Show resolved Hide resolved
@@ -40,7 +40,14 @@ export interface ClusterResourceProviderProps {
*
* If not defined, a default layer will be used.
*/
readonly onEventLayer?: lambda.ILayerVersion;
readonly proxyAgentLayer?: lambda.ILayerVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the name change...?

Copy link
Contributor Author

@ryparker ryparker Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the name change is because the proxy-agent layer is now being used in both onEventHandler and isCompleteHandler Lambdas. Because of this change i've deprecated the original onEventLayer and created a new prop proxyAgentLayer since we will now be passing this prop into more than just the onEventHandler Lambda.

The onEventLayer prop was introduced a few weeks ago (sept 24) so it should not impact many users (if any). The prop would only be used if the user wishes to bundle the layer themselves with a custom proxy agent.

This prop follows the same user customization we allow with the kubectl handler. However if we'd rather not allow users to override that layer then we may still want to deprecate the onEventLayer prop since it is no longer being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're over optimizing here. Can we just leave everything as is as far as layers go and simply add the NodeProxyAgentLayer to the lambda's that needed and thats it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. I'll remove the new prop and leave the onEventLayer prop. I'll also reconnect the onEventLayer to the onEventLambda.

@@ -66,6 +73,9 @@ export class ClusterResourceProvider extends NestedStack {
private constructor(scope: Construct, id: string, props: ClusterResourceProviderProps) {
super(scope as CoreConstruct, id);

// Allow user to override the layer. Layer must contain `proxy-agent` node_module which is required to proxy AWS SDK requests.
const proxyAgentLayer = props.proxyAgentLayer ? props.proxyAgentLayer : new NodeProxyAgentLayer(this, 'NodeProxyAgentLayer');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this should be configurable. The lambda function code now must have this layer in order to work, so we should simply be adding it in addition to the externally provided layer.

packages/@aws-cdk/aws-eks/lib/cluster.ts Outdated Show resolved Hide resolved
@mergify
Copy link
Contributor

mergify bot commented Nov 4, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 991bb0d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 7bbd10d into master Nov 5, 2021
@mergify mergify bot deleted the eks-cluster-handler-security-group-prop branch November 5, 2021 00:42
@mergify
Copy link
Contributor

mergify bot commented Nov 5, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this pull request Feb 21, 2022
…ll cluster handler functions (aws#17200)

## Summary

This PR is intended for CDK EKS users who require all traffic to be routed through a proxy. Currently if a user does not allow internet connections to the VPC without going through a proxy, then deploying an EKS cluster will result in a timeout error:

```sh
Received response status [FAILED] from custom resource. Message returned: Error: 2021-10-20T14:20:47.028Z d86e3ef4-45ce-4130-988f-c4663f7f8c80 Task timed out after 60.06 seconds
```
Fixes: aws#12469, SIM D29159517
Related to but does not resolve: `aws#12171

## ⚙️ Changes

_Expand each list item for additional details._

<details>
<summary><strong>Corrected "Cluster Handler" docs to clarify that 2 lambdas are created (<code>onEventHandler</code>, <code>isCompleteHandler</code>)</strong></summary>
<br />

Our docs [currently describe the "Cluster Handler" as one Lambda function that interacts with the EKS API](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler). However this is not accurate. The "Cluster Handler" actually creates [two Lambdas](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-provider.ts#L69-L96) for the Custom Resource, `onEventHandler` and `isCompleteHandler`, both interact with the AWS API.

</details>

<details>
<summary><strong>Passes the <code>clusterHandlerEnvironment</code> to both Cluster Handler Lambdas</strong></summary>
<br />

The `clusterHandlerEnvironment` is the [recommended method](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler) of passing a proxy url (i.g. `http_proxy: 'http://my-proxy.com:3128'`) to the Cluster Handler. 

Currently the `clusterHandlerEnvironment` is only passed to the Cluster Handler's `onEventHandler` Lambda. [The `onEventHandler` was believed to be the only Cluster Handler Lambda that interacts with the AWS EKS API](aws#12469 (comment)), however this is not entirely true. Both the `onEventHandler` and `isCompleteHandler` call the AWS EKS API.

Following the execution process of `isCompleteHandler` when creating an EKS cluster:

1. [`index.isComplete()` (this is the Lambda handler)](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/index.ts#L48)
2. [`common.isComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/common.ts#L59)
3. [`cluster.isCreateComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L56)
4. [`cluster.isActive()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L196)
5. [Request to EKS API](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L198) (results in timeout because proxy is not used)

This change allows the user to pass proxy urls as environment variables to **both** Lambdas using `clusterHandlerEnvironment`.

</details>

<details>
<summary><strong>Renames the prop <code>onEventLayer</code> -> <code>proxyAgentLayer</code>, and provides the layer to both Cluster Handler Lambdas</strong></summary>
<br />

The proxy-agent layer is now used in both `onEventHandler` and `isCompleteHandler` lambdas in order to support proxy configurations. Because of this change, i've deprecated the original `onEventLayer` and created a new prop `proxyAgentLayer` since we will now be passing this prop into more than just the `onEventHandler` Lambda.

The `onEventLayer` prop was introduced [a few weeks ago (sept 24)](aws#16657) so it should not impact many users (if any). The prop would only be used if the user wishes to bundle the layer themselves with a custom proxy agent. 

This prop follows the [same user customization we allow with the kubectl handler](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer). 

Another suitable name for this prop could have been `clusterHandlerLayer` but I chose `proxyAgentLayer` because it represents **what** the layer is used for, instead of describing **where** it's used. This also follows the convention of the pre-existing [`kubectlLayer` prop](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer).

</details>

<details>
<summary><strong>Adds the EKS cluster prop <code>clusterHandlerSecurityGroup</code></strong></summary>
<br />

If a proxy address is provided to the Cluster Handler Lambdas, but the proxy instance is not open to the world, then the dynamic IPs of the Cluster Handler Lambdas will be denied access. To solve this, i've implemented a new Cluster prop `clusterHandlerSecurityGroup`. This `clusterHandlerSecurityGroup` prop will allow the user to pass a Security Group to both Lambda functions and the Custom Resource provider. 

This is very similar to how we [already allow users to pass Security Groups to the Kubectl Handler](https://github.com/aws/aws-cdk/blob/7f194000697b85deb410ae0d7f7d4ac3c2654bcc/packages/%40aws-cdk/aws-eks/lib/kubectl-provider.ts#L83)

</details>

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(aws-eks): Construct Library custom resources doesn't use proxy properly
5 participants