(aws-eks): Construct Library custom resources doesn't use proxy properly #12469

Closed
oleksii-boiko-ua opened this issue Jan 12, 2021 · 12 comments · Fixed by #16609 or #17200
Labels: @aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), bug (This issue is a bug.), p1

Comments

@oleksii-boiko-ua

oleksii-boiko-ua commented Jan 12, 2021

I'm trying to test the new feature that provisions the EKS-related Lambda functions (such as ClusterHandler) inside the VPC. The Lambdas are placed in the VPC, which is great, but I get an error when connecting to the EKS API via a proxy.

Reproduction Steps

self.cluster = aws_eks.Cluster(
    scope=self,
    id='cluster',
    cluster_name="cluster-" + environment,
    endpoint_access=aws_eks.EndpointAccess.PUBLIC_AND_PRIVATE,
    default_capacity=0,
    vpc=vpc,
    vpc_subnets=[aws_ec2.SubnetSelection(subnets=[subnet_a_eks, subnet_b_eks, subnet_c_eks])],
    # issue with 3 subnet
    place_cluster_handler_in_vpc=True,
    version=cluster_version,
    cluster_handler_environment={
        "http_proxy": "http://login:pass@proxy.cloud.local:8080/"
    },
    kubectl_environment={
        "http_proxy": "http://login:pass@proxy.cloud.local:8080/"
    },
    security_group=eks_control_plane_sg,
    role=eks_control_plane_role,
)

What did you expect to happen?

successful cluster creation

What actually happened?

CloudWatch log of the ProviderframeworkonEvent function:
2021-01-09T14:37:27.905Z e785fa78-c5f8-471c-a495-59d75389a6c6 INFO [provider-framework] submit response to cloudformation { "Status": "FAILED", "Reason": "Error: connect ETIMEDOUT 63.32.73.253:443\n at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1107:14)", "StackId": "arn:aws:cloudformation:eu-west-1:accountid:stack/eks-stack-develop-cdk/276db000-5285-11eb-ab35-0615947f7f49", "RequestId": "ce8a03d7-fdf3-4def-acb4-6219fb352732", "PhysicalResourceId": "AWSCDK::CustomResourceProviderFramework::CREATE_FAILED", "LogicalResourceId": "clusterC5B25D0D" }

The proxy is fine; I tested it against the same endpoint (strange that it calls EC2 and not the EKS API):

[ec2-user@ip-10-60-233-255 ~]$ curl -vk https://ec2-63-32-73-253.eu-west-1.compute.amazonaws.com.
* Rebuilt URL to: https://ec2-63-32-73-253.eu-west-1.compute.amazonaws.com./
* Uses proxy env variable https_proxy == 'http://user:password@proxy.cloud.local:8080/'
*   Trying 10.60.249.170...
* TCP_NODELAY set
* Connected to proxy.cloud.local (10.60.249.170) port 8080 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to ec2-63-32-73-253.eu-west-1.compute.amazonaws.com:443
* Proxy auth using Basic with user 'user'
> CONNECT ec2-63-32-73-253.eu-west-1.compute.amazonaws.com:443 HTTP/1.1
> Host: ec2-63-32-73-253.eu-west-1.compute.amazonaws.com:443
> Proxy-Authorization: Basic token
> User-Agent: curl/7.61.1
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=lambda.eu-west-1.amazonaws.com
*  start date: Dec 23 00:00:00 2020 GMT
*  expire date: Jan 21 23:59:59 2022 GMT
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x17f8190)
> GET / HTTP/2
> Host: ec2-63-32-73-253.eu-west-1.compute.amazonaws.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 403 
< date: Tue, 12 Jan 2021 13:31:59 GMT
< content-length: 127
< x-amzn-requestid: c6c6badc-56de-4e6a-8266-0d7971505c84
< 
<MissingAuthenticationTokenException>
  <Message>Missing Authentication Token</Message>
</MissingAuthenticationTokenException>
* Connection #0 to host proxy.cloud.local left intact
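A note on the hostname used in this test: a name of the form `ec2-63-32-73-253.eu-west-1.compute.amazonaws.com` appears to be just the generic public DNS name derived from the IP address, so it does not by itself mean the EC2 API was being called. The certificate subject in the output above (`CN=lambda.eu-west-1.amazonaws.com`) suggests the address belongs to the Lambda API endpoint. The sketch below shows how such a name is derived from the IP in the `ETIMEDOUT` error; the helper name is hypothetical, and it assumes the regional `<region>.compute.amazonaws.com` pattern seen in this issue.

```python
import ipaddress


def ec2_style_hostname(ip: str, region: str) -> str:
    """Build the ec2-<a>-<b>-<c>-<d>.<region>.compute.amazonaws.com name for an IPv4 address."""
    # Parsing first makes malformed addresses fail early with a ValueError.
    address = ipaddress.IPv4Address(ip)
    dashed = str(address).replace(".", "-")
    return f"ec2-{dashed}.{region}.compute.amazonaws.com"


if __name__ == "__main__":
    # IP taken from the ETIMEDOUT error in the CloudWatch log above.
    print(ec2_style_hostname("63.32.73.253", "eu-west-1"))
```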

Environment

  • CDK CLI Version: 1.83
  • Framework Version:
  • Node.js Version: v14.13.0
  • OS: macOS
  • Language (Version): Python (3.9)

Other

I noticed that 5 Lambda functions are created, but only one of them, "OnEventHandler", receives the proxy configuration. It looks like it is the only one that interacts with the API.


This is 🐛 Bug Report

@oleksii-boiko-ua oleksii-boiko-ua added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 12, 2021
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jan 12, 2021
@oleksii-boiko-ua
Author

oleksii-boiko-ua commented Jan 12, 2021

@iliapolo could you take a look when you have time, please? Could it be related to the TLS check? Can we somehow disable it?

@iliapolo
Contributor

@alexey-boyko

I noticed that 5 Lambda functions are created, but only one of them, "OnEventHandler", receives the proxy configuration. It looks like it is the only one that interacts with the API.

Yes, all functions should be connected to the VPC but only the one interacting with the EKS API receives the cluster_handler_environment. This is the intended behavior.

Could it be related to the TLS check? Can we somehow disable it?

Not sure what you mean by that, disable what?

Are you able to create an EKS cluster via the proxy using the SDK? We use the createCluster API call. I think this would be the ultimate test to isolate the problem; I'm not sure yet that this is a CDK issue.

@iliapolo iliapolo added guidance Question that needs advice or information. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed bug This issue is a bug. labels Jan 12, 2021
@oleksii-boiko-ua
Author

Thanks, I will test it via the API call right now. But I noticed that the call is made from the ProviderframeworkonEvent Lambda function, which doesn't have the proxy set during deploy; only the OnEventHandler Lambda function does.

@iliapolo
Contributor

@alexey-boyko The ProviderframeworkonEvent only invokes the OnEventHandler and responds to CFN with the status. You should look at the CloudWatch logs of the OnEventHandler function.
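The division of labour described here can be sketched roughly as follows. This is not the CDK provider framework's actual code: the function names and the overall response shape are hypothetical, and only the `Status`, `Reason`, and `LogicalResourceId` fields mirror the log shown earlier in this issue.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def framework_on_event(
    event: Event,
    invoke_user_handler: Callable[[Event], Dict[str, Any]],
) -> Dict[str, Any]:
    """Invoke the user handler and build the status report for CloudFormation."""
    try:
        result = invoke_user_handler(event)
    except Exception as error:
        # Any failure, including one raised while trying to reach the handler,
        # is reported back as FAILED with the error text as the reason.
        return {
            "Status": "FAILED",
            "Reason": f"{type(error).__name__}: {error}",
            "LogicalResourceId": event["LogicalResourceId"],
        }
    return {
        "Status": "SUCCESS",
        "Data": result,
        "LogicalResourceId": event["LogicalResourceId"],
    }


def unreachable_handler(event: Event) -> Dict[str, Any]:
    """Stand-in for an invocation that never reaches the handler."""
    raise TimeoutError("connect ETIMEDOUT")
```

In this sketch, a failure that happens before the handler runs leaves only the framework side with anything to log.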

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 13, 2021
@oleksii-boiko-ua
Author

oleksii-boiko-ua commented Jan 14, 2021

@iliapolo unfortunately I can't make it work. I tried to create a Lambda function in the VPC with the code below, but I got the same timeout as with OnEventHandler: "Task timed out after 3.00 seconds". I saw many issues about aws-sdk-javascript and proxies (people experiencing the same thing); it looks like it bypasses the proxy. But when I use python3 and boto3, I can successfully create or list clusters.

const AWS = require('aws-sdk');
// Set the region
AWS.config.update({ region: 'eu-west-1' });
const eks = new AWS.EKS();

const proxy = require('proxy-agent');
AWS.config.update({
  httpOptions: { agent: proxy('http://login:password@proxy.cloud.local:8080/') }
});

exports.handler = async function (event) {
  const params = {
    maxResults: '5',
  };

  console.log('Checkpoint 1');

  let cluster;

  try {
    cluster = await eks.listClusters(params).promise();
    console.log(cluster);
  } catch (e) {
    console.log(e);
  }

  console.log('Checkpoint 2');

  return {
    statusCode: 200,
    body: JSON.stringify(cluster || { message: 'Nothing' })
  };
};

@oleksii-boiko-ua
Author

Is there anything else I can try? Can I increase the log level? Also, I wonder why the timeout is to ec2-63-32-73-253.eu-west-1.compute.amazonaws.com and not eks.eu-west-1.compute.amazonaws.com, and why CloudWatch only has logs from ProviderframeworkonEvent.

@iliapolo
Contributor

@alexey-boyko The ProviderframeworkonEvent function invokes the OnEventHandler. If you only see logs from the ProviderframeworkonEvent function, it means that the framework couldn't invoke the function.

Do the subnets you are using have outgoing internet access? If not, you will need to configure VPC endpoints.

See #12171 for more details. This will allow the function calling the EKS APIs to be invoked; from that point on, your proxy configuration should be applied.

As for the proxy configuration itself, I'm afraid I don't have any insight on this...I suggest we take it step by step by first making sure that the OnEventHandler function actually gets invoked.

Thanks

@iliapolo iliapolo added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 17, 2021
@oleksii-boiko-ua
Author

Hey @iliapolo, thanks a lot for the answer. Regarding "Do the subnets you are using have outgoing internet access? If not, you will need to configure VPC endpoints.": we don't have internet access. We use a proxy to access the AWS APIs that don't have VPC endpoints, and for the rest we use VPC endpoints. I have several Lambda functions that use the proxy to call the EKS API, but they use python3 and boto3.
I don't know what else I can test to debug this issue. I can't even make it work with the Lambda function above. Has anyone tested it via a proxy? Also, why are Step Functions logs disabled by default? Anyway, maybe I need to investigate more. Thanks.

@iliapolo
Contributor

for the rest we use VPC endpoints

Makes sense, so just make sure you configure all the necessary endpoints as mentioned here.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 18, 2021
@oleksii-boiko-ua
Author

oleksii-boiko-ua commented Jan 18, 2021

@iliapolo I was missing the Lambda and Step Functions endpoints. Now the OnEventHandler Lambda gets invoked, but I still receive a non-verbose error, so it's hard to understand what is happening.
OnEventHandler lambda log:

2021-01-18T16:32:48.849Z	cbee5717-0921-4d2c-acbc-08b5e857e3d0	INFO	onCreate: creating cluster with options: 
{
    "resourcesVpcConfig": {
        "endpointPrivateAccess": true,
        "securityGroupIds": [
            "sg-0fccfa9d6ds9"
        ],
        "endpointPublicAccess": true,
        "subnetIds": [
            "subnet-086ds54e9073",
            "subnet-0d3c5as3c0869f91"
        ]
    },
    "roleArn": "arn:aws:iam::accountId:role/test-develop-cdk-control-plane",
    "name": "cluster-develop-cdk",
    "version": "1.18"
}

2021-01-18T16:33:48.795Z cbee5717-0921-4d2c-acbc-08b5e857e3d0 Task timed out after 60.06 seconds


ProviderframeworkonEvent log:

2021-01-18T16:33:48.817Z 71f60ed5-face-4ef4-842a-8f02e491d969 INFO [provider-framework] user function response: { "StatusCode": 200, "FunctionError": "Unhandled", "ExecutedVersion": "$LATEST", "Payload": "{\"errorMessage\":\"2021-01-18T16:33:48.795Z cbee5717-0921-4d2c-acbc-08b5e857e3d0 Task timed out after 60.06 seconds\"}" } object


It happens whether or not I add the proxy variable.
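For readers hitting the same wall, the two endpoints reported as missing at the top of this comment correspond to interface VPC endpoint service names that follow the usual `com.amazonaws.<region>.<service>` pattern. The sketch below only builds those two names; the helper name is hypothetical, `states` is the service identifier for Step Functions, and whether other endpoints are also needed in a given VPC is not something this thread settles.

```python
from typing import List, Tuple

# Services named in the comment above: Lambda and Step Functions ("states").
MISSING_SERVICES: Tuple[str, ...] = ("lambda", "states")


def interface_endpoint_service_name(region: str, service: str) -> str:
    """Return the interface endpoint service name for a service in a region."""
    return f"com.amazonaws.{region}.{service}"


def missing_endpoint_names(region: str) -> List[str]:
    """List the endpoint service names for the services reported as missing."""
    return [interface_endpoint_service_name(region, s) for s in MISSING_SERVICES]


if __name__ == "__main__":
    for name in missing_endpoint_names("eu-west-1"):
        print(name)
```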

@iliapolo
Contributor

@alexey-boyko

It happens whether or not I add the proxy variable.

I guess that makes sense if the proxy isn't being applied correctly. I did a little digging, and it seems that, unlike Python, Node.js doesn't use any global environment variables to configure proxies. It looks like every HTTP client implements this independently.
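To illustrate the Python side of that comparison: the standard library's `urllib.request.getproxies()` picks up `http_proxy`-style environment variables without any extra code. The sketch below uses a hypothetical helper name and the proxy host from this issue without credentials; it says nothing about how boto3 is implemented internally, only that the environment-variable convention is built into Python's standard tooling.

```python
import os
import urllib.request
from typing import Dict


def proxies_seen_by_urllib(proxy_url: str) -> Dict[str, str]:
    """Set http_proxy for this process and return the proxies urllib detects."""
    # Note: this mutates the environment of the current process.
    os.environ["http_proxy"] = proxy_url
    return urllib.request.getproxies()


if __name__ == "__main__":
    detected = proxies_seen_by_urllib("http://proxy.cloud.local:8080/")
    print(detected.get("http"))
```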

We might need to add explicit support for this in the library. I'll try adding some more information soon.

@iliapolo iliapolo added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jan 20, 2021
@peterwoodworth peterwoodworth added bug This issue is a bug. and removed needs-triage This issue or PR still needs to be triaged. guidance Question that needs advice or information. labels Jun 25, 2021
@iliapolo iliapolo removed their assignment Jun 27, 2021
@peterwoodworth peterwoodworth added p2 and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jun 28, 2021
@kellertk kellertk added p1 and removed p2 labels Sep 17, 2021
@ryparker ryparker linked a pull request Sep 22, 2021 that will close this issue
@mergify mergify bot closed this as completed in #16609 Sep 23, 2021
mergify bot pushed a commit that referenced this issue Sep 23, 2021
## Summary

Currently when a user wants to route all of the EKS lambda's SDK requests through a proxy then they are [instructed to configure an env var named `HTTP_PROXY` or `http_proxy`](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler).

e.g.
```ts
const cluster = new eks.Cluster(this, 'hello-eks', {
  version: eks.KubernetesVersion.V1_21,
  clusterHandlerEnvironment: {
    'http_proxy': 'http://proxy.myproxy.com'
  }
});
```

However the JS SDK [requires further configuration to enable proxy support](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-proxies.html).

This PR:
- Adds a `package.json` with the dependency 'proxy-agent' to the `cluster-resource-handler/` lambda bundle
- Uses `NodejsFunction` to install lambda dependencies and bundle.
- Adds a condition that checks the environment for `HTTP_PROXY` or `http_proxy` values. If present then configures the aws-sdk to use that proxy (using `proxy-agent`).
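The environment check described in the last bullet can be sketched as below. The handler itself is not written in Python, so this is only an illustration of the logic; the function name is hypothetical, and the order in which the two variable names are consulted is an assumption, since the PR description does not state a precedence.

```python
from typing import Mapping, Optional


def resolve_proxy_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the proxy URL from HTTP_PROXY or http_proxy, or None if neither is set."""
    # Assumption: the uppercase name is consulted first. Empty strings count as unset.
    return env.get("HTTP_PROXY") or env.get("http_proxy") or None


if __name__ == "__main__":
    # Only configure a proxy agent when a URL is actually present.
    url = resolve_proxy_url({"http_proxy": "http://proxy.myproxy.com"})
    if url is not None:
        print(f"would route SDK requests through {url}")
```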

Note: I placed the `proxy-agent` in the `devDependencies` of `package.json`. If the dependency is placed in the `dependencies` section then the CDK builder [throws an error: `NPM Package cluster-resources-handler inside jsii package '@aws-cdk/aws-eks', can only have devDependencies`](https://github.com/aws/aws-cdk/blob/7dae114b7aac46321b8d8572e6837428b4c633b2/tools/pkglint/lib/rules.ts#L1332)

Fixes: SIM D29159517, #12469

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

mergify bot pushed a commit that referenced this issue Sep 28, 2021
## Summary

Currently when a user wants to route all of the EKS lambda's `aws-sdk-js` requests through a proxy then they are [instructed to configure an env var named `HTTP_PROXY` or `http_proxy`](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler).

e.g.
```ts
const cluster = new eks.Cluster(this, 'hello-eks', {
  version: eks.KubernetesVersion.V1_21,
  clusterHandlerEnvironment: {
    'http_proxy': 'http://proxy.myproxy.com'
  }
});
```

However the JS SDK [requires further configuration to enable proxy support](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-proxies.html).

This PR:

**The below changes have been refactored to avoid use of `NodejsFunction`. See the PR comments below for [reasoning](#16657 (comment)) and [updated changes](#16657 (comment)).**
- ~~Adds a `package.json` with the dependency ['http-proxy-agent'](https://github.com/TooTallNate/node-http-proxy-agent) to the `cluster-resource-handler/` lambda bundle~~
- ~~Uses `NodejsFunction` to install lambda dependencies and bundle.~~
- Adds a condition that checks the environment for `HTTP_PROXY` or `http_proxy` values. If present then configures the aws-sdk to use that proxy (using `http-proxy-agent`).

~~Note: I placed the `http-proxy-agent` in the `devDependencies` of `package.json`. If the dependency is placed in the `dependencies` section then the CDK builder [throws an error: `NPM Package cluster-resources-handler inside jsii package '@aws-cdk/aws-eks', can only have devDependencies`](https://github.com/aws/aws-cdk/blob/7dae114b7aac46321b8d8572e6837428b4c633b2/tools/pkglint/lib/rules.ts#L1332)~~

Fixes: SIM D29159517, #12469

Tested this using a Squid proxy on an EC2 instance within the same VPC as the EKS cluster.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
mergify bot pushed a commit that referenced this issue Nov 5, 2021
…ll cluster handler functions (#17200)

## Summary

This PR is intended for CDK EKS users who require all traffic to be routed through a proxy. Currently if a user does not allow internet connections to the VPC without going through a proxy, then deploying an EKS cluster will result in a timeout error:

```sh
Received response status [FAILED] from custom resource. Message returned: Error: 2021-10-20T14:20:47.028Z d86e3ef4-45ce-4130-988f-c4663f7f8c80 Task timed out after 60.06 seconds
```
Fixes: #12469, SIM D29159517
Related to but does not resolve: #12171

## ⚙️ Changes

_Expand each list item for additional details._

<details>
<summary><strong>Corrected "Cluster Handler" docs to clarify that 2 lambdas are created (<code>onEventHandler</code>, <code>isCompleteHandler</code>)</strong></summary>
<br />

Our docs [currently describe the "Cluster Handler" as one Lambda function that interacts with the EKS API](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler). However this is not accurate. The "Cluster Handler" actually creates [two Lambdas](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-provider.ts#L69-L96) for the Custom Resource, `onEventHandler` and `isCompleteHandler`, both interact with the AWS API.

</details>

<details>
<summary><strong>Passes the <code>clusterHandlerEnvironment</code> to both Cluster Handler Lambdas</strong></summary>
<br />

The `clusterHandlerEnvironment` is the [recommended method](https://docs.aws.amazon.com/cdk/api/latest/docs/aws-eks-readme.html#cluster-handler) of passing a proxy url (e.g. `http_proxy: 'http://my-proxy.com:3128'`) to the Cluster Handler.

Currently the `clusterHandlerEnvironment` is only passed to the Cluster Handler's `onEventHandler` Lambda. [The `onEventHandler` was believed to be the only Cluster Handler Lambda that interacts with the AWS EKS API](#12469 (comment)), however this is not entirely true. Both the `onEventHandler` and `isCompleteHandler` call the AWS EKS API.

Following the execution process of `isCompleteHandler` when creating an EKS cluster:

1. [`index.isComplete()` (this is the Lambda handler)](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/index.ts#L48)
2. [`common.isComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/common.ts#L59)
3. [`cluster.isCreateComplete()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L56)
4. [`cluster.isActive()`](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L196)
5. [Request to EKS API](https://github.com/aws/aws-cdk/blob/0cabb9f2d2f50c03337cd6f35bf47fc54ada3a21/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/cluster.ts#L198) (results in timeout because proxy is not used)

This change allows the user to pass proxy urls as environment variables to **both** Lambdas using `clusterHandlerEnvironment`.

</details>
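The effect of the change above can be pictured with a small sketch: the same user-supplied environment ends up on both handler functions. This is an illustration only, not the library's code; the function name is hypothetical, and the handler names are the two given in this PR description.

```python
from typing import Dict, Tuple

# The two Cluster Handler Lambdas named in the PR description.
CLUSTER_HANDLERS: Tuple[str, ...] = ("onEventHandler", "isCompleteHandler")


def build_handler_environments(
    cluster_handler_environment: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Give every cluster handler function its own copy of the user-supplied environment."""
    return {name: dict(cluster_handler_environment) for name in CLUSTER_HANDLERS}


if __name__ == "__main__":
    envs = build_handler_environments({"http_proxy": "http://my-proxy.com:3128"})
    for handler, env in envs.items():
        print(handler, env)
```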

<details>
<summary><strong>Renames the prop <code>onEventLayer</code> -> <code>proxyAgentLayer</code>, and provides the layer to both Cluster Handler Lambdas</strong></summary>
<br />

The proxy-agent layer is now used in both `onEventHandler` and `isCompleteHandler` lambdas in order to support proxy configurations. Because of this change, I've deprecated the original `onEventLayer` and created a new prop `proxyAgentLayer` since we will now be passing this prop into more than just the `onEventHandler` Lambda.

The `onEventLayer` prop was introduced [a few weeks ago (sept 24)](#16657) so it should not impact many users (if any). The prop would only be used if the user wishes to bundle the layer themselves with a custom proxy agent. 

This prop follows the [same user customization we allow with the kubectl handler](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer). 

Another suitable name for this prop could have been `clusterHandlerLayer` but I chose `proxyAgentLayer` because it represents **what** the layer is used for, instead of describing **where** it's used. This also follows the convention of the pre-existing [`kubectlLayer` prop](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-eks.Cluster.html#kubectllayer).

</details>

<details>
<summary><strong>Adds the EKS cluster prop <code>clusterHandlerSecurityGroup</code></strong></summary>
<br />

If a proxy address is provided to the Cluster Handler Lambdas, but the proxy instance is not open to the world, then the dynamic IPs of the Cluster Handler Lambdas will be denied access. To solve this, I've implemented a new Cluster prop `clusterHandlerSecurityGroup`. This `clusterHandlerSecurityGroup` prop will allow the user to pass a Security Group to both Lambda functions and the Custom Resource provider.

This is very similar to how we [already allow users to pass Security Groups to the Kubectl Handler](https://github.com/aws/aws-cdk/blob/7f194000697b85deb410ae0d7f7d4ac3c2654bcc/packages/%40aws-cdk/aws-eks/lib/kubectl-provider.ts#L83)

</details>

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
eladb pushed a commit to cdklabs/decdk that referenced this issue Jan 18, 2022
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
…ll cluster handler functions (aws#17200)

5 participants