Pod crashes when setting HCLOUD_NETWORK and network: false #630

Open
redimp opened this issue Apr 4, 2024 · 9 comments
Labels
bug Something isn't working

Comments

redimp commented Apr 4, 2024

TL;DR

Despite network: false, the hcloud-cloud-controller-manager tries to start the node-route-controller, which then fails due to the missing CIDR.

Expected behavior

hcloud-cloud-controller-manager starts up and configures the nodes' metadata.

Observed behavior

The hcloud-cloud-controller-manager pod crashes with:

E0404 08:31:13.192689       1 controllermanager.go:321] Error starting "node-route-controller"
F0404 08:31:13.192717       1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )

Minimal working example

command:

helm upgrade --install hccm \
    --version 1.19.0 \
    -n kube-system \
    -f hccm-values.yaml \
    hcloud/hcloud-cloud-controller-manager

hccm-values.yaml:

networking:
  enabled: false
  network:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

Remark: The same happens when configuring

env:
# ...
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

as described in the README.md.

Log output

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0404 08:31:09.676489       1 serving.go:348] Generated self-signed cert in-memory
W0404 08:31:09.676594       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0404 08:31:10.690398       1 metrics.go:69] Starting metrics server at :8233
I0404 08:31:13.018003       1 cloud.go:123] Hetzner Cloud k8s cloud controller v1.19.0 started
W0404 08:31:13.018036       1 main.go:75] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0404 08:31:13.018060       1 controllermanager.go:168] Version: v0.0.0-master+$Format:%H$
I0404 08:31:13.024573       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0404 08:31:13.024619       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0404 08:31:13.024657       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0404 08:31:13.024681       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 08:31:13.024898       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0404 08:31:13.025064       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0404 08:31:13.025905       1 secure_serving.go:213] Serving securely on [::]:10258
I0404 08:31:13.027380       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0404 08:31:13.051293       1 controllermanager.go:524] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1
I0404 08:31:13.051766       1 controllermanager.go:337] Started "cloud-node-controller"
I0404 08:31:13.051958       1 controllermanager.go:337] Started "cloud-node-lifecycle-controller"
I0404 08:31:13.052000       1 node_controller.go:165] Sending events to api server.
I0404 08:31:13.052081       1 node_controller.go:174] Waiting for informer caches to sync
I0404 08:31:13.052165       1 node_lifecycle_controller.go:113] Sending events to api server
I0404 08:31:13.052269       1 controllermanager.go:337] Started "service-lb-controller"
I0404 08:31:13.052355       1 controller.go:231] Starting service controller
I0404 08:31:13.052382       1 shared_informer.go:311] Waiting for caches to sync for service
I0404 08:31:13.125572       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 08:31:13.125587       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0404 08:31:13.125821       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0404 08:31:13.192689       1 controllermanager.go:321] Error starting "node-route-controller"
F0404 08:31:13.192717       1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )

Additional information

  • HelmChart version 1.19.0
  • k3s version v1.29.2+k3s1 running with --kubelet-arg="cloud-provider=external"
redimp added the bug label Apr 4, 2024
redimp commented Apr 4, 2024

Without setting HCLOUD_NETWORK, the hcloud-cloud-controller-manager is unable to retrieve the node addresses:

I0404 08:26:48.044310       1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 08:26:48.247486       1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 08:26:48.247561       1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 08:26:48.688221       1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 08:26:48.688270       1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 08:26:48.954460       1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing

And for the sake of completeness, with this hccm-values.yaml:

---
networking:
  enabled: true
  clusterCIDR: 10.42.0.0/16
  network:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

the hcloud-cloud-controller-manager starts and adds the metadata as expected.

This is not a solution for us, since
a) we don't want the hccm to manage the routes and
b) we want to use robots: true.

apricote (Member) commented Apr 4, 2024

Just to clarify: you mentioned "HelmChart version 3.3.0" in the original issue. We do not have a Helm chart with that version; the current version is 1.19.0.

redimp commented Apr 4, 2024

Sorry, that was a copy-and-paste error. I'm using 1.19.0, as in the helm command line.

apricote (Member) commented Apr 4, 2024

I am unable to reproduce this with hccm 1.19.0 and the values file you provided.

While trying to reproduce, I noticed that you also need to provide the k3s flag --disable-cloud-controller, as otherwise k3s starts its own cloud-controller-manager that conflicts with hccm. You will then see error messages like:

Error getting instance metadata for node addresses: hcloud/instancesv2.InstanceMetadata: failed to convert provider id to server id: providerID does not have one of the the expected prefixes (hcloud://, hrobot://, hcloud://bm-): k3s://hetzner-k3s

I installed k3s with:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=cloud-provider=external --disable-cloud-controller" INSTALL_K3S_VERSION="v1.29.2+k3s1" sh -

Then created a secret for hccm:

kubectl create secret generic -n kube-system hcloud --from-literal=token=$HCLOUD_TOKEN --from-literal=network=hetzner-k3s

And installed the chart the same way you did with the first hccm-values.yaml in the original description.

Could you post the output of the following two commands here?

  • kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml
  • kubectl get node k3s-controlplane1 -o yaml

redimp commented Apr 4, 2024

My bad. I must have gotten lost in the values.

The described behaviour

E0404 11:02:01.187593       1 controllermanager.go:321] Error starting "node-route-controller"
F0404 11:02:01.187624       1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )

happens with this values.yaml:

env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

networking:
  enabled: false

robot:
  enabled: false

Note: k3s is running with --disable-cloud-controller.

kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: hccm
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-04T11:19:10Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hcloud-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "3440"
  uid: 62e7b715-e99d-4878-8133-d01cd17a95be
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: hccm
      app.kubernetes.io/name: hcloud-cloud-controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: hccm
        app.kubernetes.io/name: hcloud-cloud-controller-manager
    spec:
      containers:
      - command:
        - /bin/hcloud-cloud-controller-manager
        - --allow-untagged-cloud
        - --cloud-provider=hcloud
        - --route-reconciliation-period=30s
        - --webhook-secure-port=0
        - --leader-elect=false
        env:
        - name: HCLOUD_NETWORK
          valueFrom:
            secretKeyRef:
              key: network
              name: hcloud
        - name: HCLOUD_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hcloud
        - name: ROBOT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: robot-password
              name: hcloud
              optional: true
        - name: ROBOT_USER
          valueFrom:
            secretKeyRef:
              key: robot-user
              name: hcloud
              optional: true
        image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
        imagePullPolicy: IfNotPresent
        name: hcloud-cloud-controller-manager
        ports:
        - containerPort: 8233
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: hcloud-cloud-controller-manager
      serviceAccountName: hcloud-cloud-controller-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
status:
  conditions:
  - lastTransitionTime: "2024-04-04T11:19:10Z"
    lastUpdateTime: "2024-04-04T11:19:11Z"
    message: ReplicaSet "hcloud-cloud-controller-manager-6f454fcfbf" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-04-04T11:19:19Z"
    lastUpdateTime: "2024-04-04T11:19:19Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

kubectl get node k3s-controlplane1 -o yaml

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.0.0.2
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-04T11:08:33Z"
    etcd.k3s.cattle.io/node-address: 10.0.0.2
    etcd.k3s.cattle.io/node-name: k3s-controlplane1-ba0bd5a4
    k3s.io/node-args: '["server","--data-dir","/var/lib/rancher/k3s","--disable","traefik","--disable","servicelb","--flannel-backend","none","--disable-network-policy","--embedded-registry","true","--write-kubeconfig-mode","0600","--tls-san","lbctrl.iquestria.cso.ninja","--disable-cloud-controller","--token","********","--tls-san","k3s-controlplane1","--tls-san","10.0.0.2","--node-ip","10.0.0.2","--node-external-ip","x.x.x.x","--kubelet-arg","cloud-provider=external"]'
    k3s.io/node-config-hash: QNU4YAKJZSOORINBMHYXXYIO754HSV5OGAWEWZC56NJR74RX56AQ====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/4344eae0657f7fc0c99af34fc51358389f500f18c9bb80f5a55c130de07565d2"}'
    node.alpha.kubernetes.io/ttl: "0"
    p2p.k3s.cattle.io/node-address: /ip4/10.0.0.2/tcp/5001/p2p/QmWjS45ca9RZuoMnavYUhNHH4wD7V4SXVHRhzcn1tCWNdi
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-04T11:07:10Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: k3s-controlplane1
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    p2p.k3s.cattle.io/enabled: "true"
  name: k3s-controlplane1
  resourceVersion: "4135"
  uid: c1b6d78b-55dc-47f8-9ba0-557b81a452a7
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:
  - address: 10.0.0.2
    type: InternalIP
  - address: k3s-controlplane1
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: "55192664021"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7934528Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 56735880Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7934528Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-04T11:10:25Z"
    lastTransitionTime: "2024-04-04T11:10:25Z"
    message: Cilium is running on this node
    reason: CiliumIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2024-04-04T11:22:30Z"
    lastTransitionTime: "2024-04-04T11:07:22Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:10:20Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/cilium/cilium@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746
    sizeBytes: 195832613
  - names:
    - quay.io/cilium/operator-generic@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088
    sizeBytes: 24175419
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: b44ffa8e-82e2-4740-b6ab-bf53631f8310
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.1.0-18-arm64
    kubeProxyVersion: v1.29.2+k3s1
    kubeletVersion: v1.29.2+k3s1
    machineID: e7c1065f9ccd42ce8d0c10c61a494f91
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    systemUUID: 2376c8c9-a1c5-4485-8bea-efcfa76fb865

In contrast, with these values:

networking:
  enabled: false
  network:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

robot:
  enabled: false

no HCLOUD_NETWORK env is set:

kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: hccm
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-04T11:10:32Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hcloud-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "2171"
  uid: e97fe5ed-db35-4eaf-a290-371b87780a2c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: hccm
      app.kubernetes.io/name: hcloud-cloud-controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: hccm
        app.kubernetes.io/name: hcloud-cloud-controller-manager
    spec:
      containers:
      - command:
        - /bin/hcloud-cloud-controller-manager
        - --allow-untagged-cloud
        - --cloud-provider=hcloud
        - --route-reconciliation-period=30s
        - --webhook-secure-port=0
        - --leader-elect=false
        env:
        - name: HCLOUD_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hcloud
        - name: ROBOT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: robot-password
              name: hcloud
              optional: true
        - name: ROBOT_USER
          valueFrom:
            secretKeyRef:
              key: robot-user
              name: hcloud
              optional: true
        image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
        imagePullPolicy: IfNotPresent
        name: hcloud-cloud-controller-manager
        ports:
        - containerPort: 8233
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: hcloud-cloud-controller-manager
      serviceAccountName: hcloud-cloud-controller-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-04-04T11:10:33Z"
    lastUpdateTime: "2024-04-04T11:10:37Z"
    message: ReplicaSet "hcloud-cloud-controller-manager-584f6fc4f4" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-04-04T11:13:22Z"
    lastUpdateTime: "2024-04-04T11:13:22Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

redimp commented Apr 4, 2024

I appreciate the help.

For the sake of completeness: without HCLOUD_NETWORK being set, hccm is not able to fetch the metadata.

[...]
I0404 11:13:24.431083       1 controllermanager.go:337] Started "cloud-node-lifecycle-controller"
I0404 11:13:24.431122       1 node_lifecycle_controller.go:113] Sending events to api server
I0404 11:13:24.512098       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0404 11:13:24.512144       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 11:13:24.512166       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0404 11:13:24.531534       1 shared_informer.go:318] Caches are synced for service
I0404 11:13:24.531581       1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 11:13:24.964475       1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 11:13:24.964549       1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 11:13:25.149436       1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 11:13:25.149485       1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 11:13:25.317226       1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing

apricote (Member) commented Apr 5, 2024

Thanks for the detailed responses :)

I can reproduce the issue with these values from your comment yesterday:

env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

networking:
  enabled: false

robot:
  enabled: false

The core issue is that hccm & the Helm chart always assume that users with Networks also want to use the routing functionality. This is not always true; there are cases where you want the InternalIP on the Node but no routes. As you have discovered, this is not natively supported in the Helm chart right now.

You can set the env variable HCLOUD_NETWORK_ROUTES_ENABLED=false to disable just the routes controller.

These values should work (or just yours with the env variable added):

env:
  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

networking:
  enabled: true
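
For reference, "just yours with the env variable added" would look like this (an untested sketch that merges the values file from your earlier comment with the new variable):

env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
  # disables only the routes controller; name taken from the comment above
  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

networking:
  enabled: false

robot:
  enabled: false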

redimp commented Apr 5, 2024

Thank you. Will test that.

With HCLOUD_NETWORK_ROUTES_ENABLED=false, can we configure ROBOT_ENABLED=true so that the dedicated nodes are handled by the hcloud-cloud-controller-manager, too?

apricote (Member) commented Apr 5, 2024

Yes, that should work 👍 You will have to do some magic to get the private IPs for the Robot Servers in, as that is not automatically supported in HCCM right now.
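
A values sketch for that combination could look like the following. This is untested and only an assumption based on the robot.enabled flag and the robot-user/robot-password secret keys visible in the deployment manifests above; the "magic" for the Robot private IPs is not covered here:

env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
  # keep the routes controller disabled, as discussed above
  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

networking:
  enabled: false

robot:
  enabled: true

The hcloud secret would then also need the robot-user and robot-password keys that the deployment references, e.g.:

kubectl create secret generic -n kube-system hcloud \
  --from-literal=token=$HCLOUD_TOKEN \
  --from-literal=network=hetzner-k3s \
  --from-literal=robot-user=$ROBOT_USER \
  --from-literal=robot-password=$ROBOT_PASSWORD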
