
Calico and HCC #641

Open
medicol69 opened this issue May 8, 2024 · 13 comments
Labels: enhancement (New feature or request)

Comments

@medicol69

TL;DR

This is more of an inquiry, since it's not clear from the documentation: does the Hetzner cloud controller work with the Calico CNI when using the private interfaces on Hetzner? Thanks

Expected behavior

This is an inquiry about the documentation.

medicol69 added the enhancement label May 8, 2024
@apricote
Member

If you use the private networks from Hetzner Cloud with hcloud-cloud-controller-manager and enable the routes-controller (the default), you should be able to use Calico without any additional overlay network. You can configure this in Calico with CALICO_NETWORKING_BACKEND=none.

I have never personally tested this configuration though.
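For reference, with a manifest-based Calico install this setting is an environment variable on the calico-node container; a minimal sketch (image tag is a placeholder, and only the relevant env var is shown):

# Excerpt from the calico-node DaemonSet spec
containers:
  - name: calico-node
    image: docker.io/calico/node:v3.28.0   # placeholder version
    env:
      # Disable Calico's own overlay/BGP networking; pod routes are
      # programmed by hcloud-cloud-controller-manager's routes-controller.
      - name: CALICO_NETWORKING_BACKEND
        value: "none"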

@simonostendorf
Contributor

I am also interested in this topic. If you have any knowledge @medicol69, please let me know :)

@DeprecatedLuke

DeprecatedLuke commented Jun 2, 2024

Yes, it works fine with Calico. To run a quick test, use hetzner-k3s.

An important warning when running cloud servers together with bare-metal servers on private networking: Calico requires a /24 address block per node, so when you create the subnet, make sure the vLAN subnet is at minimum a /23 (1 node max per half) or ideally a /17 (127 nodes max; a /17 holds 2^(24-17) = 128 /24 blocks), allocating the first half to cloud instances and the second half to bare-metal instances.

@medicol69
Author

Thanks, but I don't think the Hetzner private network interfaces are stable enough to use in production. If anyone has gotten them to work and can share an example of how to use them in prod, I'm all ears.

@DeprecatedLuke

I am currently running it just fine with Calico and even have Ceph working over the vLAN with pretty good performance. You cannot advertise the node IP on the internal interface, so define a HostEndpoint instead so that metrics and etcd are protected. Load balancers also require you to use the public network in this case.
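A minimal HostEndpoint sketch for that setup (node name, interface name, IP, and label are placeholders):

apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: node1-enp7s0
  labels:
    host-endpoint: "true"      # example label to select in (Global)NetworkPolicy
spec:
  node: node1                  # placeholder node name
  interfaceName: enp7s0        # placeholder private interface
  expectedIPs:
    - 10.0.0.2                 # placeholder private IP of the node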

@simonostendorf
Contributor

simonostendorf commented Jun 3, 2024

I am using Calico without encapsulation and HCCM with routes enabled. Calico uses the BPF dataplane and replaces kube-proxy.

I think this works well, but I haven't tested it enough to be 100% sure.

If you have any feedback on this configuration, I would love to discuss it :)

calico-tigera-operator-values.yaml

installation:
  cni:
    type: Calico
    ipam:
      type: HostLocal # use podCIDR assigned by kube-controller-manager, that is also used by route-controller in hcloud-cloud-controller-manager
  calicoNetwork:
    bgp: Enabled
    linuxDataplane: BPF
    hostPorts: Disabled
    ipPools:
      - name: default-ipv4
        cidr: 10.0.0.0/16
        encapsulation: None
        blockSize: 24
        natOutgoing: Enabled
        nodeSelector: all()
defaultFelixConfiguration:
  enabled: true
  bpfEnabled: true
  bpfExternalServiceMode: DSR
  bpfKubeProxyIptablesCleanupEnabled: true
kubernetesServiceEndpoint:
  host: api.my-cluster.domain.tld
  port: 6443
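
Assuming the values above are saved as calico-tigera-operator-values.yaml, installing the chart would look something like:

helm repo add projectcalico https://docs.tigera.io/calico/charts
helm install calico projectcalico/tigera-operator \
  --namespace tigera-operator --create-namespace \
  --values calico-tigera-operator-values.yaml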

@DeprecatedLuke

I am not sure why, but when using hetzner-k3s the internal network works just fine; however, a manually bootstrapped cluster has an issue where the cloud controller does not recognize the internal IP address, so the node never gets its taint removed or its labels added.

I spent a few hours trying to figure out why, without being able to find any difference between the two configurations. My only guess is that it is some internal ordering issue where the metadata/private-network endpoints are not parsed in order.

So to recap: allocate at least a /16 vLAN range and do not use the hcloud controller (you will not be able to use the load balancer or have labels resolved automatically).

@simonostendorf
Contributor

simonostendorf commented Jun 4, 2024

> I am not sure why, but when using hetzner-k3s the internal network works just fine; however, a manually bootstrapped cluster has an issue where the cloud controller does not recognize the internal IP address, so the node never gets its taint removed or its labels added.

What Kubernetes version do you use? Kubernetes 1.29 introduced a change where the node IP is left empty if the cloud provider is set to external and --node-ip is not set manually. Maybe that is the case here.

From CHANGELOG-1.29: "kubelet, when using --cloud-provider=external, will now initialize the node addresses with the value of --node-ip, if it exists, or wait for the cloud provider to assign the addresses." (https://github.com/kubernetes/kubernetes/pull/121028, @aojea)
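
If that change is the cause, pinning the address explicitly should work around it. For k3s this is the --node-ip flag shown later in this thread; on Debian/Ubuntu kubeadm installs, the kubelet reads extra flags from /etc/default/kubelet. A sketch (IP is a placeholder):

# /etc/default/kubelet (placeholder IP)
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.2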

@medicol69
Author

> I am currently running it just fine with Calico and even have Ceph working over the vLAN with pretty good performance. You cannot advertise the node IP on the internal interface, so define a HostEndpoint instead so that metrics and etcd are protected. Load balancers also require you to use the public network in this case.

I was thinking of private networking on Hetzner. If anyone is doing that in production, please share your config and your experiences.

@simonostendorf
Contributor

> I was thinking of private networking on Hetzner. If anyone is doing that in production, please share your config and your experiences.

I am currently testing this. You can see my Calico values above. The HCCM configuration is the standard one with networking enabled.
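
For reference, a minimal sketch of such HCCM Helm values; the networking.enabled key is from the chart's values.yaml at the time of writing, while the network itself is supplied via the hcloud secret (see the secret step later in this thread):

networking:
  enabled: true   # enables private-network support and the routes-controller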

@DeprecatedLuke

DeprecatedLuke commented Jun 4, 2024

> I am not sure why, but when using hetzner-k3s the internal network works just fine; however, a manually bootstrapped cluster has an issue where the cloud controller does not recognize the internal IP address, so the node never gets its taint removed or its labels added.

> What Kubernetes version do you use? Kubernetes 1.29 introduced a change where the node IP is left empty if the cloud provider is set to external and --node-ip is not set manually. Maybe that is the case here.
>
> From CHANGELOG-1.29: "kubelet, when using --cloud-provider=external, will now initialize the node addresses with the value of --node-ip, if it exists, or wait for the cloud provider to assign the addresses." (https://github.com/kubernetes/kubernetes/pull/121028, @aojea)

I tried both 1.29 and 1.30, here's my init script:

k3sup install --host $SERVER_HOST --ip $PUBLIC_IP --user root --ssh-key=~/.ssh/id_ed25519 --cluster --local-path ~/.kube/config --merge --context $CLUSTER --no-extras --k3s-channel latest --k3s-extra-args "\
--disable local-storage \
--disable metrics-server \
--disable-cloud-controller \
--kubelet-arg='provider-id=hcloud://$PROVIDER_ID' \
--kubelet-arg='cloud-provider=external' \
--flannel-backend=none \
--disable-network-policy \
--write-kubeconfig-mode=644 \
--cluster-domain=$CLUSTER_DOMAIN \
--cluster-cidr=$CLUSTER_CIDR \
--service-cidr=$CLUSTER_SERVICE_CIDR \
--cluster-dns=$CLUSTER_DNS \
--node-name=$SERVER_HOSTNAME \
--node-ip=$PRIVATE_IP \
--node-external-ip=$PUBLIC_IP \
--tls-san=$CLUSTER_LB \
--tls-san=$PRIVATE_IP \
--tls-san=$PUBLIC_IP \
--tls-san=$CLUSTER_DOMAIN \
--node-taint=CriticalAddonsOnly=true:NoExecute \
--etcd-expose-metrics='true' \
--kube-controller-manager-arg='bind-address=0.0.0.0' \
--kube-proxy-arg='metrics-bind-address=0.0.0.0' \
--kube-scheduler-arg='bind-address=0.0.0.0' \
" --print-command

EDIT: added --node-ip=$PRIVATE_IP; the configuration without it is what I am currently using to get around the issue.

> I am currently running it just fine with Calico and even have Ceph working over the vLAN with pretty good performance. You cannot advertise the node IP on the internal interface, so define a HostEndpoint instead so that metrics and etcd are protected. Load balancers also require you to use the public network in this case.

> I was thinking of private networking on Hetzner. If anyone is doing that in production, please share your config and your experiences.

Yes, it does work, including networking and routes, out of the box when using the hetzner-k3s tool. But I had issues getting HCCM to recognize the nodes when defining an internal IP as the node address while bootstrapping the cluster manually. Using the public IP works fine, however (and routes are still created for internal communication). Robot does not support networking from HCCM.

@simonostendorf
Contributor

> Yes, it does work, including networking and routes, out of the box when using the hetzner-k3s tool. But I had issues getting HCCM to recognize the nodes when defining an internal IP as the node address while bootstrapping the cluster manually. Using the public IP works fine, however (and routes are still created for internal communication). Robot does not support networking from HCCM.

I am using kubeadm, only on hcloud nodes (currently no dedicated/Robot nodes; maybe I will add them later), and this works fine.
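
With kubeadm the node IP can be pinned the same way at init/join time; a sketch using the v1beta3 config API (IP is a placeholder):

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external   # required for hcloud-cloud-controller-manager
    node-ip: 10.0.0.2          # placeholder private IP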

@DeprecatedLuke

DeprecatedLuke commented Jun 4, 2024

Alright, here's the full guide to replicate the issue:
init_master.sh

#!/bin/bash

CLUSTER=$1
CLUSTER_DOMAIN=$2
SERVER_HOST=$3
CLUSTER_PRIVATE_NET=$4
CLUSTER_CIDR=$5
CLUSTER_SERVICE_CIDR=$6
CLUSTER_DNS=$7
CLUSTER_LB=$8

PUBLIC_IP=$(ssh $SERVER_HOST "curl checkip.amazonaws.com")
PRIVATE_IP=$(ssh $SERVER_HOST "ip route get $CLUSTER_PRIVATE_NET | awk '{print \$7}'")
PROVIDER_ID=$(ssh $SERVER_HOST "curl http://169.254.169.254/hetzner/v1/metadata/instance-id")

echo "Public IP: $PUBLIC_IP Private IP: $PRIVATE_IP"

kubectl config delete-cluster $CLUSTER
kubectl config delete-user $CLUSTER

SERVER_HOSTNAME=$(echo $SERVER_HOST | cut -d'.' -f1)

ssh -y $SERVER_HOST "curl https://packages.hetzner.com/hcloud/deb/hc-utils_0.0.4-1_all.deb -o /tmp/hc-utils_0.0.4-1_all.deb -s && apt -y install /tmp/hc-utils_0.0.4-1_all.deb"

k3sup install --host $SERVER_HOST --ip $PUBLIC_IP --user root --ssh-key=~/.ssh/id_ed25519 --cluster --local-path ~/.kube/config --merge --context $CLUSTER --no-extras --k3s-channel latest --k3s-extra-args "\
--disable local-storage \
--disable metrics-server \
--disable-cloud-controller \
--kubelet-arg='provider-id=hcloud://$PROVIDER_ID' \
--kubelet-arg='cloud-provider=external' \
--flannel-backend=none \
--disable-network-policy \
--write-kubeconfig-mode=644 \
--cluster-domain=$CLUSTER_DOMAIN \
--cluster-cidr=$CLUSTER_CIDR \
--service-cidr=$CLUSTER_SERVICE_CIDR \
--cluster-dns=$CLUSTER_DNS \
--node-name=$SERVER_HOSTNAME \
--node-ip=$PRIVATE_IP \
--node-external-ip=$PUBLIC_IP \
--tls-san=$CLUSTER_LB \
--tls-san=$PRIVATE_IP \
--tls-san=$PUBLIC_IP \
--tls-san=$CLUSTER_DOMAIN \
--node-taint=CriticalAddonsOnly=true:NoExecute \
--etcd-expose-metrics='true' \
--kube-controller-manager-arg='bind-address=0.0.0.0' \
--kube-proxy-arg='metrics-bind-address=0.0.0.0' \
--kube-scheduler-arg='bind-address=0.0.0.0' \
" --print-command

kubectl config set-cluster $CLUSTER --server=https://$CLUSTER_LB:6443
k3sup ready --context $CLUSTER # will fail since no CNI is installed yet

bash init_master.sh test-cluster cluster.local IP_ADDRESS 10.224.0.0 10.222.0.0/16 10.223.0.0/16 10.223.0.10 IP_ADDRESS

kubectl config use-context test-cluster

Install Calico:
helm repo add projectcalico https://docs.tigera.io/calico/charts
helm repo update projectcalico
helm install cni projectcalico/tigera-operator -n tigera-operator --create-namespace

Create the HCCM secret with the network name/ID and the hcloud token.
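
Per the HCCM docs, this is a secret named hcloud in kube-system (token and network values are placeholders):

kubectl -n kube-system create secret generic hcloud \
  --from-literal=token=<hcloud-api-token> \
  --from-literal=network=<network-name-or-id>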

Install HCCM:
helm repo add hcloud https://charts.hetzner.cloud
helm repo update hcloud
helm install hccm hcloud/hcloud-cloud-controller-manager -n kube-system --values values.yaml

nodeSelector:
  node-role.kubernetes.io/control-plane: "true"

Observe the following error:

error syncing '*node*': failed to get node modifiers from cloud provider: provided node ip for node "*node*" is not valid: failed to get node address from cloud provider that matches ip: 10.224.0.2, requeuing
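
To compare what actually got registered against the server's private IP, inspect the node object (node name is a placeholder):

kubectl get node <node> -o jsonpath='{.status.addresses}'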

Edit: the actual hostname doesn't matter since the provider ID is specified; usually the hostname would be a domain matching the name of the node. The Calico step is optional.
