CrashLoopBackOff due to EC2MetadataError #573

janavenkat · 2022-12-14T16:04:32Z

Related issue #455

For security reasons I changed the EC2 instance metadata hop limit from 2 to 1. This causes the ingress controller to crash because it gets access denied from the AWS instance metadata endpoint.

The repo documentation says:

This ingress controller uses the EC2 instance metadata of the worker node where it's currently running to find the additional details about the cluster provisioned by Kubernetes on top of AWS. This information is used to manage AWS resources for each ingress objects of the cluster.

I am using an EKS cluster, and the ingress controller is set up with an IAM role for service accounts. Is there any way to stop the ingress controller from requesting the EC2 instance metadata?
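As background for the hop limit question: the instance metadata service applies the configured hop limit to the response of the token `PUT` request, and each forwarding hop between the instance and the caller uses part of it up. The sketch below is a deliberately simplified model of that behaviour, not AWS or controller code. `tokenResponseReaches` and the hop counts are hypothetical, and the assumption that a pod sits one forwarding hop behind its node depends on the cluster's networking setup.

```go
package main

import "fmt"

// tokenResponseReaches models whether an IMDSv2 token response survives the
// trip to the caller. hopLimit is the instance's configured PUT response hop
// limit; forwardingHops is how many times the packet must be forwarded after
// it arrives at the instance (0 for a process running on the host itself).
// Simplified assumption: each forwarding hop consumes one unit of the limit,
// and the packet is dropped once the limit is exhausted.
func tokenResponseReaches(hopLimit, forwardingHops int) bool {
	return hopLimit > forwardingHops
}

func main() {
	for _, limit := range []int{1, 2} {
		fmt.Printf("hop limit %d: host process=%v, pod behind one hop=%v\n",
			limit,
			tokenResponseReaches(limit, 0),
			tokenResponseReaches(limit, 1))
	}
}
```

Under this model, lowering the limit from 2 to 1 only affects callers that sit behind an extra hop, which would be consistent with a controller pod losing access while processes on the node itself keep working.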

szuecs · 2022-12-19T21:39:31Z

@janavenkat what is the security impact of having the hop limit at 2?
To me this doesn't sound really relevant, and it is likely a won't fix.

janavenkat · 2022-12-27T10:56:06Z

@szuecs thank you for the response.

https://youtu.be/_VcmdlV6xaY?t=875 this is the recommendation from AWS to set the hop limit to 1.
Does the controller need to connect to the instance metadata? If I understand the docs correctly:

This ingress controller uses the EC2 instance metadata of the worker node where it's currently running to find the additional details about the cluster provisioned by Kubernetes on top of AWS. This information is used to manage AWS resources for each ingress objects of the cluster.

I didn't provision the cluster using Kubernetes on top of AWS.

jbilliau-rcd · 2022-12-30T17:28:53Z

We are having the exact same issue, but on only one cluster out of 80+; not sure why. Debug logs:

time="2022-12-30T17:23:58Z" level=info msg="starting /kube-ingress-aws-controller v0.14.0"
--
Fri, Dec 30 2022 12:23:58 pm | time="2022-12-30T17:23:58Z" level=debug msg=aws.NewAdapter
Fri, Dec 30 2022 12:23:58 pm | time="2022-12-30T17:23:58Z" level=debug msg=aws.ec2metadata.GetMetadata
Fri, Dec 30 2022 12:23:58 pm | 2022/12/30 17:23:58 DEBUG: Request ec2metadata/GetToken Details:
Fri, Dec 30 2022 12:23:58 pm | ---[ REQUEST POST-SIGN ]-----------------------------
Fri, Dec 30 2022 12:23:58 pm | PUT /latest/api/token HTTP/1.1
Fri, Dec 30 2022 12:23:58 pm | Host: 169.254.169.254
Fri, Dec 30 2022 12:23:58 pm | User-Agent: aws-sdk-go/1.44.102 (go1.19.3; linux; amd64)
Fri, Dec 30 2022 12:23:58 pm | Content-Length: 0
Fri, Dec 30 2022 12:23:58 pm | X-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600
Fri, Dec 30 2022 12:23:58 pm | Accept-Encoding: gzip
Fri, Dec 30 2022 12:23:58 pm |  
Fri, Dec 30 2022 12:23:58 pm |  
Fri, Dec 30 2022 12:23:58 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed
Fri, Dec 30 2022 12:26:02 pm | caused by: Put "http://169.254.169.254/latest/api/token": read tcp 10.150.25.8:42566->169.254.169.254:80: read: connection reset by peer
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Request ec2metadata/GetMetadata Details:
Fri, Dec 30 2022 12:26:02 pm | ---[ REQUEST POST-SIGN ]-----------------------------
Fri, Dec 30 2022 12:26:02 pm | GET /latest/meta-data/instance-id HTTP/1.1
Fri, Dec 30 2022 12:26:02 pm | Host: 169.254.169.254
Fri, Dec 30 2022 12:26:02 pm | User-Agent: aws-sdk-go/1.44.102 (go1.19.3; linux; amd64)
Fri, Dec 30 2022 12:26:02 pm | Accept-Encoding: gzip
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Response ec2metadata/GetMetadata Details:
Fri, Dec 30 2022 12:26:02 pm | ---[ RESPONSE ]--------------------------------------
Fri, Dec 30 2022 12:26:02 pm | HTTP/1.1 401 Unauthorized
Fri, Dec 30 2022 12:26:02 pm | Connection: close
Fri, Dec 30 2022 12:26:02 pm | Content-Type: text/plain
Fri, Dec 30 2022 12:26:02 pm | Date: Fri, 30 Dec 2022 17:26:02 GMT
Fri, Dec 30 2022 12:26:02 pm | Server: EC2ws
Fri, Dec 30 2022 12:26:02 pm | Content-Length: 0
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | -----------------------------------------------------
Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Validate Response ec2metadata/GetMetadata failed, attempt 0/3, error EC2MetadataError: failed to make EC2Metadata request
Fri, Dec 30 2022 12:26:02 pm |  
Fri, Dec 30 2022 12:26:02 pm | status code: 401, request id:
Fri, Dec 30 2022 12:26:02 pm | time="2022-12-30T17:26:02Z" level=fatal msg="EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id: "

We are using an explicit IAM role though, so I'm not sure why it needs to connect to the EC2 instance metadata. Doesn't it only need to do that when using the worker node IAM role, in cases where you AREN'T using an explicit, controller-only role via OIDC?

szuecs · 2023-01-05T20:00:23Z

@jbilliau-rcd we use the metadata to auto-detect the vpcId and clusterId. Call stack:

https://github.com/zalando-incubator/kube-ingress-aws-controller/blob/master/aws/adapter.go#L812

kube-ingress-aws-controller/aws/adapter.go, line 243 in 1718cd1:

    adapter.manifest, err = buildManifest(adapter, clusterID, vpcID)

kube-ingress-aws-controller/controller.go, line 300 in 1718cd1:

    awsAdapter, err = aws.NewAdapter(clusterID, controllerID, vpcID, debugFlag, disableInstrumentedHttpClient)

kube-ingress-aws-controller/controller.go, line 284 in 1718cd1:

    if err = loadSettings(); err != nil {

kube-ingress-aws-controller/controller.go, line 96 in 1718cd1:

    func loadSettings() error {

You can pass these flags to skip auto-detection:

kube-ingress-aws-controller/controller.go, lines 149 to 151 in 1718cd1:

    kingpin.Flag("cluster-id", "ID of the Kubernetes cluster used to lookup cluster related resources tagged with `kubernetes.io/cluster/<cluster-id>` tags. Auto discovered from the EC2 instance where the controller is running if not specified.").
        StringVar(&clusterID)
    kingpin.Flag("vpc-id", "VPC ID for where the cluster is running. Used to lookup relevant subnets. Auto discovered from the EC2 instance where the controller is running if not specified.").

szuecs · 2024-05-07T16:06:07Z

@jbilliau-rcd did this happen again to you?

You showed a connection reset by peer, which likely means some AWS internal issue happened at the time.

Fri, Dec 30 2022 12:26:02 pm | 2022/12/30 17:26:02 DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed
Fri, Dec 30 2022 12:26:02 pm | caused by: Put "http://169.254.169.254/latest/api/token": read tcp 10.150.25.8:42566->169.254.169.254:80: read: connection reset by peer

jbilliau-rcd · 2024-05-07T17:37:02Z

Hmmm, nope, must've been transient; we run this controller on 170 clusters and all seem healthy.

szuecs · 2024-05-07T19:29:36Z

Most likely we cannot do anything here.

szuecs closed this as completed May 7, 2024
