Delayed NLRI withdrawal for ingress-nginx endpoint removal #32487
Labels
area/bgp
kind/bug
kind/community-report
What happened?
The NLRI for the LoadBalancer IP of the ingress-nginx-controller Service, which uses externalTrafficPolicy: Local, is not withdrawn when the Endpoint disappears. The withdrawal appears to be delayed until the pod is completely gone.
Cilium Version
v1.15.4
Kernel Version
5.15.0-56-generic
Kubernetes Version
v1.29.2
Regression
No response
Sysdump
cilium-sysdump-20240511-191215.zip
Relevant log output
No response
Anything else?
To reproduce:
Files for this can be viewed at https://github.com/bewing/cilium-issue-32487
Clone repo with config files:
Create single-node k8s cluster with kind, and deploy Cilium
Once the operator is Ready, examine the IP assigned to the node, and edit frr.conf and peer.yaml to match, apply the BGP-specific config, and start an FRR instance. Confirm the peering gets established.
Install ingress-nginx, confirm BGP announcement of the assigned LoadBalancer IP:
Scale the controller to 0 replicas. Confirm that the endpoint is deleted. Note that the route has not been withdrawn and persists for multiple seconds:
Once the ingress-nginx-controller pod finishes Terminating and goes away, the route is finally withdrawn:
While the controller is in the Terminating state and the route has not yet been withdrawn, clients who are directed to the controller will get connection refused.
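To make the expected behavior concrete, here is a minimal Python sketch of the advertisement decision this report assumes for externalTrafficPolicy: Local. It is not Cilium's implementation; the Endpoint class, the should_advertise function, and the node name kind-control-plane are hypothetical names used only for illustration. The ready and terminating fields loosely mirror the endpoint conditions in a Kubernetes EndpointSlice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Endpoint:
    """Illustrative subset of an EndpointSlice endpoint (hypothetical model)."""
    node_name: str
    ready: bool
    terminating: bool = False


def should_advertise(endpoints: Iterable[Endpoint], node_name: str) -> bool:
    """Return True if this node should announce the LoadBalancer IP.

    Assumption: with externalTrafficPolicy: Local, a node announces the
    route only while it hosts at least one ready, non-terminating endpoint.
    """
    return any(
        ep.node_name == node_name and ep.ready and not ep.terminating
        for ep in endpoints
    )


NODE = "kind-control-plane"  # hypothetical node name

# Controller pod running normally on the node: the route is announced.
running = [Endpoint(node_name=NODE, ready=True)]
# After scaling to 0: the pod is Terminating and no longer ready.
draining = [Endpoint(node_name=NODE, ready=False, terminating=True)]

print(should_advertise(running, NODE))
print(should_advertise(draining, NODE))
print(should_advertise([], NODE))
```

Under this model, the route would be withdrawn as soon as the endpoint is removed or marked not ready, rather than when the pod object is finally deleted, which would close the window in which clients see connection refused.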