Kops on a disconnected environment #16453

Open
dormullor opened this issue Apr 5, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dormullor

dormullor commented Apr 5, 2024

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.26.3

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.26.4

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Manage your own security groups and allow egress traffic only for internal communication (block 0.0.0.0/0 and allow the VPC CIDR).

 kops update cluster **** --yes --lifecycle-overrides SecurityGroup=Ignore,SecurityGroupRule=Ignore

5. What happened after the commands executed?
The command exceeded its timeout.

6. What did you expect to happen?
When I SSH into the master node, the nodeup process exits with the following error:

Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.305209    1035 s3context.go:192] unable to get bucket location from region "us-east-1"; scanning all regions: RequestError: send request failed
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: caused by: Get "https://s3.dualstack.us-east-1.amazonaws.com/r*****?location=": dial tcp 52.217.230.168:443: i/o timeout
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374846    1035 s3context.go:298] Querying S3 for bucket location for ****
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374904    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374911    1035 s3context.go:303] Doing GetBucketLocation in "us-west-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.374930    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.375066    1035 s3context.go:303] Doing GetBucketLocation in "ca-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378346    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-3"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378520    1035 s3context.go:303] Doing GetBucketLocation in "us-east-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378718    1035 s3context.go:303] Doing GetBucketLocation in "eu-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378767    1035 s3context.go:303] Doing GetBucketLocation in "us-west-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378885    1035 s3context.go:303] Doing GetBucketLocation in "eu-central-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378406    1035 s3context.go:303] Doing GetBucketLocation in "ap-south-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378418    1035 s3context.go:303] Doing GetBucketLocation in "eu-north-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378439    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378454    1035 s3context.go:303] Doing GetBucketLocation in "ap-northeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378472    1035 s3context.go:303] Doing GetBucketLocation in "us-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378481    1035 s3context.go:303] Doing GetBucketLocation in "sa-east-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378490    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-1"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.378498    1035 s3context.go:303] Doing GetBucketLocation in "ap-southeast-2"
Apr  5 08:03:24 ip-172-20-10-182 nodeup[1035]: I0405 08:03:24.379255    1035 s3context.go:303] Doing GetBucketLocation in "eu-west-2"
Apr  5 08:03:29 ip-172-20-10-182 nodeup[1035]: W0405 08:03:29.375004    1035 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://****/******/cluster-completed.spec": Could not retrieve location for AWS bucket *****
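The i/o timeout on 52.217.230.168:443 in the log is consistent with the egress rule from step 4: that address (what s3.dualstack.us-east-1.amazonaws.com resolved to) lies outside the cluster's networkCIDR of 172.20.0.0/16, so a security group that only allows egress to the VPC CIDR drops the packets and the dial waits until it times out. The minimal Go sketch below only checks CIDR membership; inCIDR is a hypothetical helper, and it does not model security group evaluation in general.

```go
package main

import (
	"fmt"
	"net/netip"
)

// inCIDR reports whether addr lies inside cidr. With egress restricted to the
// VPC CIDR, traffic to an address outside it is silently dropped, so the TCP
// dial does not complete and typically ends in an i/o timeout.
func inCIDR(cidr, addr string) (bool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	return prefix.Contains(ip), nil
}

func main() {
	const vpcCIDR = "172.20.0.0/16" // networkCIDR from the cluster manifest

	for _, addr := range []string{
		"52.217.230.168", // S3 address from the timed-out dial in the log
		"172.20.10.182",  // the control-plane node (ip-172-20-10-182)
	} {
		ok, err := inCIDR(vpcCIDR, addr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s allowed by egress rule: %v\n", addr, ok)
	}
	// Prints:
	// 52.217.230.168 allowed by egress rule: false
	// 172.20.10.182 allowed by egress rule: true
}
```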

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-04-05T07:23:05Z"
  name: ********
spec:
  additionalPolicies: {}
  api:
    loadBalancer:
      class: Classic
      securityGroupOverride: sg-*****
      type: Public
  assets:
    containerRegistry: *******.dkr.ecr.us-east-1.amazonaws.com/kops
    fileRepository: https://s3.us-east-1.amazonaws.com/******
  authorization:
    rbac: {}
  cloudProvider: aws
  configBase: s3://*****/******
  containerd:
    configOverride: |2
            version = 2
            [plugins]
              [plugins."io.containerd.grpc.v1.cri"]
                sandbox_image = "*****.dkr.ecr.us-east-1.amazonaws.com/kops/pause:3.9@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097"
              [plugins."io.containerd.grpc.v1.cri".registry.mirrors."*******.dkr.ecr.us-east-1.amazonaws.com"]
                endpoint = ["https://******.dkr.ecr.us-east-1.amazonaws.com"]
                [plugins."io.containerd.grpc.v1.cri".registry.configs."******.dkr.ecr.us-east-1.amazonaws.com".auth]
                  username = "AWS"
                  password = "******"
                [plugins."io.containerd.grpc.v1.cri".containerd]
                  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
                    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                      runtime_type = "io.containerd.runc.v2"
                      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                        SystemdCgroup = true
  dnsZone: *****
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-1
      name: master-1
    name: main
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: true
  kubelet:
    anonymousAuth: false
  kubernetesVersion: 1.26.4
  masterPublicName: api.*****
  networkCIDR: 172.20.0.0/16
  networkID: vpc-*****
  networking:
    calico: {}
  nodeTerminationHandler:
    enableSpotInterruptionDraining: false
    enabled: false
  nonMasqueradeCIDR: 100.64.0.0/10
  sshKeyName: *****
  subnets:
  - cidr: 172.20.10.0/24
    id: subnet-*****
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: master-1
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      kops.k8s.io/kops-controller-pki: ""
      node-role.kubernetes.io/control-plane: ""
      node.kubernetes.io/exclude-from-external-load-balancers: ""
    taints:
    - node-role.kubernetes.io/control-plane=:NoSchedule
  machineType: m5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  role: Master
  securityGroupOverride: ******
  subnets:
  - us-east-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-05T07:23:08Z"
  labels:
    kops.k8s.io/cluster: *****
  name: node
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240126
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: ""
  machineType: c6i.2xlarge
  manager: CloudGroup
  maxSize: 2
  minSize: 2
  nodeLabels:
    nvidia.com/gpu.deploy.dcgm-exporter: "true"
    nvidia.com/gpu.deploy.device-plugin: "true"
  packages:
  - nfs-common
  role: Node
  securityGroupOverride: sg-*****
  subnets:
  - us-east-1b

I have created an Interface-type VPC endpoint for S3, but none of its DNS records include the dualstack name (s3.dualstack.us-east-1.amazonaws.com) that nodeup tries to reach:

*.vpce-*****.s3.us-east-1.vpce.amazonaws.com
*.vpce-*****-us-east-1b.s3.us-east-1.vpce.amazonaws.com
s3.us-east-1.amazonaws.com
*.s3.us-east-1.amazonaws.com
*.s3-accesspoint.us-east-1.amazonaws.com
*.s3-control.us-east-1.amazonaws.com
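To make the gap concrete, here is a minimal Go sketch (not kops or AWS code) that checks which hostnames fall under the endpoint DNS records listed above. It assumes a simplified matching rule: a "*.suffix" record covers any name ending in ".suffix", and any other record must match exactly. The names endpointRecords, coveredBy, and endpointCovers are hypothetical; the record strings are copied from the issue with the IDs still redacted.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointRecords are the private DNS names of the S3 Interface endpoint,
// copied from the issue (endpoint IDs still redacted as in the original).
var endpointRecords = []string{
	"*.vpce-*****.s3.us-east-1.vpce.amazonaws.com",
	"*.vpce-*****-us-east-1b.s3.us-east-1.vpce.amazonaws.com",
	"s3.us-east-1.amazonaws.com",
	"*.s3.us-east-1.amazonaws.com",
	"*.s3-accesspoint.us-east-1.amazonaws.com",
	"*.s3-control.us-east-1.amazonaws.com",
}

// coveredBy reports whether host falls under a single DNS record name.
// Simplified model: "*.suffix" covers any name ending in ".suffix"; any
// other record must match exactly. Case and a trailing root dot are ignored.
func coveredBy(record, host string) bool {
	record = strings.ToLower(strings.TrimSuffix(record, "."))
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	if strings.HasPrefix(record, "*.") {
		suffix := record[1:] // keeps the leading dot, e.g. ".s3.us-east-1.amazonaws.com"
		return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
	}
	return record == host
}

// endpointCovers reports whether any record in records covers host.
func endpointCovers(records []string, host string) bool {
	for _, r := range records {
		if coveredBy(r, host) {
			return true
		}
	}
	return false
}

func main() {
	for _, host := range []string{
		"s3.dualstack.us-east-1.amazonaws.com",   // host nodeup dialed in the log
		"bucket-name.s3.us-east-1.amazonaws.com", // virtual-hosted-style name
	} {
		fmt.Printf("%s covered: %v\n", host, endpointCovers(endpointRecords, host))
	}
	// Prints:
	// s3.dualstack.us-east-1.amazonaws.com covered: false
	// bucket-name.s3.us-east-1.amazonaws.com covered: true
}
```

Under this model the dualstack hostname is not covered by any endpoint record, which is consistent with the log showing it resolving to a public S3 address, while a virtual-hosted name under s3.us-east-1.amazonaws.com would be.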
@k8s-ci-robot k8s-ci-robot added the kind/bug label Apr 5, 2024
@zetaab
Member

zetaab commented Apr 6, 2024

It's not clear to me how this is a kops bug.

@dormullor
Author

There is no way to set up kops for a disconnected environment. I can open a feature request if you want.

@zetaab
Member

zetaab commented Apr 7, 2024

There is a way to install kops in a disconnected environment. However, you must copy all assets first. It can be installed without any internet connectivity; you only need connectivity to a single object storage service.

https://kops.sigs.k8s.io/operations/asset-repository/

You also need to use the kops channel none (channel: none). I cannot see this in your spec at all, so in your case it is not none; the default value is stable.

@zetaab
Member

zetaab commented Apr 7, 2024

@dormullor
Author

dormullor commented Apr 12, 2024

@zetaab Although I have copied all asset files and container images into S3 and ECR and configured kops to use them, the nodeup logs still show an error when retrieving cluster-completed.spec from S3, even with an S3 VPC endpoint configured.

That's because kops uses the s3://bucket-name scheme, while the S3 VPC endpoint uses the full S3 DNS name (bucket-name.s3.us-east-1.amazonaws.com).

As a result, kops cannot be used in a disconnected environment on AWS.

W0412 06:49:07.558115    1040 main.go:133] got error running nodeup (will retry in 30s): error loading Cluster "s3://kops-state-****/*****/cluster-completed.spec": file does not exist
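The mapping described above, from an s3://bucket/key location to the virtual-hosted name that the endpoint's DNS records cover, can be sketched in Go. This is an illustration only, not how kops or the AWS SDK actually build S3 requests; virtualHostedURL is a hypothetical helper, the region is passed in rather than discovered, and cluster-name is a placeholder for the redacted path.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// virtualHostedURL rewrites an s3://bucket/key location into the
// virtual-hosted-style HTTPS URL for the given region, e.g.
// s3://bucket-name/key -> https://bucket-name.s3.us-east-1.amazonaws.com/key.
func virtualHostedURL(s3URL, region string) (string, error) {
	u, err := url.Parse(s3URL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "s3" || u.Host == "" {
		return "", fmt.Errorf("not an s3://bucket/key URL: %q", s3URL)
	}
	out := url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("%s.s3.%s.amazonaws.com", u.Host, region),
		Path:   "/" + strings.TrimPrefix(u.Path, "/"),
	}
	return out.String(), nil
}

func main() {
	// "cluster-name" is a hypothetical stand-in for the redacted path.
	got, err := virtualHostedURL("s3://bucket-name/cluster-name/cluster-completed.spec", "us-east-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
	// Prints:
	// https://bucket-name.s3.us-east-1.amazonaws.com/cluster-name/cluster-completed.spec
}
```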
