New node was not registered in LB targets #521

Open
nmuntyanov opened this issue Sep 25, 2023 · 6 comments
nmuntyanov commented Sep 25, 2023

Hello guys!
I have an issue with the hcloud controller: a newly added node was not added to the LB targets.

I0925 08:40:41.339617       1 node_controller.go:431] Initializing node pool1-44283f2f892250c3 with cloud provider
I0925 08:40:41.339753       1 load_balancers.go:283] "update Load Balancer" op="hcloud/loadBalancers.UpdateLoadBalancer" service="ingress-nginx-controller" nodes=["pool1-1bebb963a31e0dcc","pool1-224cc2e7d3d7112","pool1-70ba3df099972c15","pool1-44283f2f892250c3"]
I0925 08:40:41.506786       1 node_controller.go:502] Successfully initialized node pool1-44283f2f892250c3 with cloud provider
I0925 08:40:41.506851       1 node_controller.go:431] Initializing node pool1-44283f2f892250c3 with cloud provider
I0925 08:40:41.509142       1 event.go:307] "Event occurred" object="pool1-44283f2f892250c3" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
I0925 08:40:41.647178       1 node_controller.go:502] Successfully initialized node pool1-44283f2f892250c3 with cloud provider
I0925 08:40:41.647316       1 event.go:307] "Event occurred" object="pool1-44283f2f892250c3" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
I0925 08:40:50.603418       1 route_controller.go:216] action for Node "fluix-sand-control-plane" with CIDR "10.244.0.0/24": "keep"
I0925 08:40:50.603468       1 route_controller.go:216] action for Node "pool1-1bebb963a31e0dcc" with CIDR "10.244.12.0/24": "keep"
I0925 08:40:50.603491       1 route_controller.go:216] action for Node "pool1-224cc2e7d3d7112" with CIDR "10.244.18.0/24": "keep"
I0925 08:40:50.603509       1 route_controller.go:216] action for Node "pool1-70ba3df099972c15" with CIDR "10.244.7.0/24": "keep"
I0925 08:40:50.603527       1 route_controller.go:216] action for Node "pool1-44283f2f892250c3" with CIDR "10.244.19.0/24": "add"
I0925 08:40:50.603552       1 route_controller.go:290] route spec to be created: &{ pool1-44283f2f892250c3 false [{InternalIP 10.0.4.3} {Hostname pool1-44283f2f892250c3}] 10.244.19.0/24 false}
I0925 08:40:50.603642       1 route_controller.go:304] Creating route for node pool1-44283f2f892250c3 10.244.19.0/24 with hint 0f76b027-4be6-4144-af9d-67566c6a5e3c, throttled 20µs
I0925 08:40:52.468300       1 route_controller.go:325] Created route for node pool1-44283f2f892250c3 10.244.19.0/24 with hint 0f76b027-4be6-4144-af9d-67566c6a5e3c after 1.864673235s
I0925 08:40:52.468348       1 route_controller.go:387] Patching node status pool1-44283f2f892250c3 with true previous condition was:nil

The new node (pool1-44283f2f892250c3) was successfully initialized, but the LB wasn't updated.

Once I restarted the controller's pod, it updated the LB and added the new node.

Controller version: v1.18.0
Controller arguments:

--allow-untagged-cloud
--cloud-provider=hcloud
--route-reconciliation-period=30s
--webhook-secure-port=0
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
--leader-elect=false

Nginx service annotations:

load-balancer.hetzner.cloud/hostname: {{ LB DNS }}
load-balancer.hetzner.cloud/ipv6-disabled: true 
load-balancer.hetzner.cloud/name: {{LB NAME }}
load-balancer.hetzner.cloud/protocol: tcp 
load-balancer.hetzner.cloud/use-private-ip: true
load-balancer.hetzner.cloud/uses-proxyprotocol: true
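
For context, rendered onto the Service these annotations would look roughly like the sketch below. The hostname and name values are placeholders standing in for the templated {{ LB DNS }} / {{ LB NAME }} values above, and Kubernetes requires annotation values to be strings, hence the quotes:

# Sketch only: ingress-nginx Service carrying the hcloud LB annotations from this report.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    load-balancer.hetzner.cloud/hostname: "lb.example.com"   # placeholder for {{ LB DNS }}
    load-balancer.hetzner.cloud/ipv6-disabled: "true"
    load-balancer.hetzner.cloud/name: "my-lb"                # placeholder for {{ LB NAME }}
    load-balancer.hetzner.cloud/protocol: "tcp"
    load-balancer.hetzner.cloud/use-private-ip: "true"
    load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
spec:
  type: LoadBalancer
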
nmuntyanov (Author) commented:

Version v1.17.2 works well.

apricote (Member) commented Oct 4, 2023

Hey @nmuntyanov,

Adding Nodes to the Load Balancer happens asynchronously to the initialization. By default it adds any Nodes that are initialized and marked as Ready. Maybe your Node was not marked as Ready yet, for example because the CNI was not initialized?

The logs do not contain anything about the Service controller, so the cause is hard to figure out. Can you reproduce the issue? If so, it would be great if you could run the ccm with the flag --v=4 to get some more logs from the controller.
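
For reference, bumping the verbosity usually just means adding --v=4 to the controller's container args; a minimal sketch of the relevant Deployment fragment (deployment and container names are assumptions based on the default manifests):

# Sketch only: raising hcloud-cloud-controller-manager log verbosity to --v=4.
spec:
  template:
    spec:
      containers:
        - name: hcloud-cloud-controller-manager   # assumed container name
          args:
            - --cloud-provider=hcloud
            - --allow-untagged-cloud               # ...existing flags kept as-is
            - --v=4                                # extra verbosity requested above

Checking kubectl get nodes at the moment the node joins would also show whether it was already Ready when the sync ran.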

nmuntyanov (Author) commented Oct 17, 2023

@apricote

I1017 16:46:26.870848       1 reflector.go:378] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: forcing resync
I1017 16:46:27.151016       1 controller.go:716] Syncing backends for all LB services.
I1017 16:46:27.151047       1 node_controller.go:431] Initializing node pool1-7e9ea74aec95871f with cloud provider
I1017 16:46:27.151055       1 controller.go:755] Running updateLoadBalancerHosts(len(services)==1, workers==1)
I1017 16:46:27.151141       1 controller.go:741] nodeSyncService started for service ingress-nginx/ingress-nginx-controller
I1017 16:46:27.151155       1 controller.go:795] Updating backends for load balancer ingress-nginx/ingress-nginx-controller with node set: map[pool1-187c0d729458c3b4:{} pool1-1b3b8f8fc14e8cc:{} pool1-2959c973ec8cab12:{} pool1-545a7f4c14e9eb04:{} pool1-5a3db78e45a2cbcf:{} pool1-76770d9e480a80cb:{} pool1-7a6cd031d6849d91:{} pool1-7e9ea74aec95871f:{}]
I1017 16:46:27.151254       1 load_balancers.go:283] "update Load Balancer" op="hcloud/loadBalancers.UpdateLoadBalancer" service="ingress-nginx-controller" nodes=["pool1-1b3b8f8fc14e8cc","pool1-545a7f4c14e9eb04","pool1-7a6cd031d6849d91","pool1-187c0d729458c3b4","pool1-2959c973ec8cab12","pool1-5a3db78e45a2cbcf","pool1-76770d9e480a80cb","pool1-7e9ea74aec95871f"]
I1017 16:46:27.614234       1 route_controller.go:216] action for Node "pool1-187c0d729458c3b4" with CIDR "10.244.41.0/24": "keep"
I1017 16:46:27.614277       1 route_controller.go:216] action for Node "pool1-2959c973ec8cab12" with CIDR "10.244.39.0/24": "keep"
I1017 16:46:27.614293       1 route_controller.go:216] action for Node "pool1-5a3db78e45a2cbcf" with CIDR "10.244.36.0/24": "keep"
I1017 16:46:27.614306       1 route_controller.go:216] action for Node "pool1-76770d9e480a80cb" with CIDR "10.244.38.0/24": "keep"
I1017 16:46:27.614319       1 route_controller.go:216] action for Node "pool1-7e9ea74aec95871f" with CIDR "10.244.43.0/24": "add"
I1017 16:46:27.614331       1 route_controller.go:216] action for Node "control-plane" with CIDR "10.244.0.0/24": "keep"
I1017 16:46:27.614349       1 route_controller.go:216] action for Node "pool1-1b3b8f8fc14e8cc" with CIDR "10.244.40.0/24": "keep"
I1017 16:46:27.614363       1 route_controller.go:216] action for Node "pool1-545a7f4c14e9eb04" with CIDR "10.244.37.0/24": "keep"
I1017 16:46:27.614382       1 route_controller.go:216] action for Node "pool1-7a6cd031d6849d91" with CIDR "10.244.42.0/24": "keep"
I1017 16:46:27.614404       1 route_controller.go:290] route spec to be created: &{ pool1-7e9ea74aec95871f false [{InternalIP 10.0.8.11} {Hostname pool1-7e9ea74aec95871f}] 10.244.43.0/24 false}
I1017 16:46:27.614488       1 route_controller.go:304] Creating route for node pool1-7e9ea74aec95871f 10.244.43.0/24 with hint fad9a939-eb7c-444c-87be-19b6b2c4db4b, throttled 15.8µs
I1017 16:46:27.954789       1 node_controller.go:543] Adding node label from cloud provider: beta.kubernetes.io/instance-type=cax31
I1017 16:46:27.954817       1 node_controller.go:544] Adding node label from cloud provider: node.kubernetes.io/instance-type=cax31
I1017 16:46:27.954825       1 node_controller.go:555] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=fsn1-dc14
I1017 16:46:27.954832       1 node_controller.go:556] Adding node label from cloud provider: topology.kubernetes.io/zone=fsn1-dc14
I1017 16:46:27.954839       1 node_controller.go:566] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=fsn1
I1017 16:46:27.954846       1 node_controller.go:567] Adding node label from cloud provider: topology.kubernetes.io/region=fsn1
I1017 16:46:27.964645       1 node_controller.go:502] Successfully initialized node pool1-7e9ea74aec95871f with cloud provider
I1017 16:46:27.964711       1 node_controller.go:431] Initializing node pool1-7e9ea74aec95871f with cloud provider
I1017 16:46:27.964889       1 event.go:307] "Event occurred" object="pool1-7e9ea74aec95871f" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
I1017 16:46:28.140425       1 node_controller.go:543] Adding node label from cloud provider: beta.kubernetes.io/instance-type=cax31
I1017 16:46:28.140452       1 node_controller.go:544] Adding node label from cloud provider: node.kubernetes.io/instance-type=cax31
I1017 16:46:28.140458       1 node_controller.go:555] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=fsn1-dc14
I1017 16:46:28.140463       1 node_controller.go:556] Adding node label from cloud provider: topology.kubernetes.io/zone=fsn1-dc14
I1017 16:46:28.140469       1 node_controller.go:566] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=fsn1
I1017 16:46:28.140475       1 node_controller.go:567] Adding node label from cloud provider: topology.kubernetes.io/region=fsn1
I1017 16:46:28.147629       1 node_controller.go:502] Successfully initialized node pool1-7e9ea74aec95871f with cloud provider
I1017 16:46:28.147768       1 event.go:307] "Event occurred" object="pool1-7e9ea74aec95871f" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
I1017 16:46:28.970406       1 route_controller.go:325] Created route for node pool1-7e9ea74aec95871f 10.244.43.0/24 with hint fad9a939-eb7c-444c-87be-19b6b2c4db4b after 1.355916554s
I1017 16:46:28.970490       1 route_controller.go:378] set node pool1-7a6cd031d6849d91 with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970501       1 route_controller.go:378] set node pool1-187c0d729458c3b4 with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970521       1 route_controller.go:378] set node pool1-5a3db78e45a2cbcf with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970507       1 route_controller.go:378] set node pool1-2959c973ec8cab12 with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970537       1 route_controller.go:378] set node pool1-76770d9e480a80cb with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970547       1 route_controller.go:378] set node control-plane with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970550       1 route_controller.go:378] set node pool1-1b3b8f8fc14e8cc with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970564       1 route_controller.go:378] set node pool1-545a7f4c14e9eb04 with NodeNetworkUnavailable=false was canceled because it is already set
I1017 16:46:28.970573       1 route_controller.go:387] Patching node status pool1-7e9ea74aec95871f with true previous condition was:nil
I1017 16:46:30.195644       1 controller.go:791] It took 3.044484004 seconds to update load balancer hosts for service ingress-nginx/ingress-nginx-controller
I1017 16:46:30.195671       1 controller.go:747] nodeSyncService finished successfully for service ingress-nginx/ingress-nginx-controller
I1017 16:46:30.195686       1 controller.go:780] Finished updateLoadBalancerHosts
I1017 16:46:30.195692       1 controller.go:720] Successfully updated 1 out of 1 load balancers to direct traffic to the updated set of nodes
I1017 16:46:30.195696       1 controller.go:712] It took 3.044698765 seconds to finish syncNodes
I1017 16:46:30.195709       1 controller.go:716] Syncing backends for all LB services.
I1017 16:46:30.195713       1 controller.go:755] Running updateLoadBalancerHosts(len(services)==1, workers==1)
I1017 16:46:30.195759       1 controller.go:780] Finished updateLoadBalancerHosts
I1017 16:46:30.195765       1 controller.go:720] Successfully updated 1 out of 1 load balancers to direct traffic to the updated set of nodes
I1017 16:46:30.195767       1 controller.go:712] It took 5.852e-05 seconds to finish syncNodes

It says "Successfully updated 1 out of 1 load balancers..." but in reality the new node was not added to the LB targets.

github-actions bot commented:

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions bot added the stale label Jan 16, 2024
apricote (Member) commented:
You did not mention which Node you expected to be added in the latest logs; I will assume pool1-7e9ea74aec95871f, as that is the one being initialized there.

As we can see in the following line, the Node is passed to our method and passed all of our filters. After that, nothing should stop the node from being assigned to the load balancer, except if it is already assigned, but then we should see an error.

I1017 16:46:27.151254       1 load_balancers.go:283] "update Load Balancer" op="hcloud/loadBalancers.UpdateLoadBalancer" service="ingress-nginx-controller" nodes=["pool1-1b3b8f8fc14e8cc","pool1-545a7f4c14e9eb04","pool1-7a6cd031d6849d91","pool1-187c0d729458c3b4","pool1-2959c973ec8cab12","pool1-5a3db78e45a2cbcf","pool1-76770d9e480a80cb","pool1-7e9ea74aec95871f"]

If this is still happening, could you add the environment variable HCLOUD_DEBUG=true to the HCCM container and send the resulting logs to julian.toelle <at> hetzner-cloud.de? This will include any calls made to the Hetzner Cloud API (with your token redacted).
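
For reference, enabling that debug output typically comes down to adding the environment variable to the HCCM container spec; a minimal sketch (container name assumed from the default manifests):

# Sketch only: turning on Hetzner Cloud API call logging in the HCCM container.
spec:
  template:
    spec:
      containers:
        - name: hcloud-cloud-controller-manager   # assumed container name
          env:
            - name: HCLOUD_DEBUG
              value: "true"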

github-actions bot removed the stale label Jan 25, 2024
github-actions bot commented:

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions bot added stale and removed stale labels Apr 24, 2024