Selector-less service with secondary IPs not working properly on Rocky 8/9 with latest kube-proxy #124587
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig network
This seems to be a routing setup problem (which is not a K8s responsibility).
So, are packets transmitted from the node on NIC-storage with dst set to one of your configured endpoint addresses, but with the sourceIP taken from NIC-pod? Are you trying from a node (i.e. the main netns) or from within a POD? Can you please provide the routing config from that environment (pod or node), like the output from `ip route`?
Yes, that's what I'm seeing from tcpdump
From main netns
I've destroyed that cluster, but I've checked:
Does this mean that all PODs, and the NICs on the nodes, have addresses from this range? That is an unusual setup, at least for IPv4. There's nothing wrong with it, in fact I would encourage it for IPv6, but I expect it to not be very well tested. The common way is to have a private CIDR for PODs, and node addresses (on the NIC) from a more "official" range. E.g. in KinD, PODs have 10.244.0.x addresses, while the nodes have addresses from the Docker network.
In your setup (if I got it right), you have not set up egress masquerading, I suppose? That would explain how packets can be sent on one NIC while having the source address of another. BTW, this is usually possible out-of-the-box for IPv4, but for IPv6 you must set some sysctls:
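The sysctl list itself is elided above; as a guess (an assumption, not from the original comment), the usual prerequisite for this kind of IPv6 routing is forwarding:

```sh
# Assumption: the elided sysctls likely include IPv6 forwarding,
# which is off by default on most distros.
sudo sysctl -w net.ipv6.conf.all.forwarding=1
```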
No. This is just the CIDR for the primary NIC of the node. Pods (not using hostNetwork) will get an address from the default Pod CIDR, which is non-overlapping with both NICs' CIDRs. In fact, the primary network and the default pod network work perfectly. I've been using the same setup for years, and I double checked that normal services (those with pod selectors) work fine. The service in question is headless with some secondary NIC addresses, and has only recently been introduced into my setup for some tests.
From https://kubernetes.io/docs/concepts/services-networking/service/#headless-services:
> For headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them.
So, for a headless service, kube-proxy sets up no forwarding at all. This problem can't be fixed by any update in K8s.
/remove-kind bug
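One way to see this for yourself (a sketch, assuming proxy-mode=iptables, and using the `mconnect` service from the next comment as an example): kube-proxy writes no NAT rules at all for a headless service:

```sh
kubectl get svc mconnect                   # CLUSTER-IP column shows "None"
sudo iptables-save -t nat | grep mconnect  # expect no output at all
```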
Um, that raises another question: how can you use a SERVICE-IP? A headless service has no SERVICE-IP. For the test I use:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mconnect
spec:
  clusterIP: None
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 5001
    name: mconnect
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: mconnect-4
  labels:
    kubernetes.io/service-name: mconnect
addressType: IPv4
ports:
- name: mconnect
  protocol: TCP
  port: 5001
endpoints:
- addresses:
  - "192.168.3.201"
  conditions:
    ready: true
  nodeName: vm-201
- addresses:
  - "192.168.3.202"
  conditions:
    ready: true
  nodeName: vm-202
```
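A quick way to check that the headless service resolves to the EndpointSlice addresses (a sketch; the busybox test pod is only illustrative):

```sh
# DNS for a headless service returns the endpoint addresses
# (192.168.3.201/202) instead of a single ClusterIP.
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup mconnect.default.svc.cluster.local
```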
I also tested with a service without a selector, but also without `clusterIP: None` (it's `clusterIP: None` that makes a service headless). Now there is a SERVICE-IP, or rather a ClusterIP.
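Presumably the same manifest with the `clusterIP: None` line dropped, something like this sketch (reconstructed, not the original):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mconnect
spec:
  # no "clusterIP: None" here, so the apiserver allocates a ClusterIP
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 5001
    name: mconnect
```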
Now I can test:
This works in my env, which is not "Rocky" nor "Ubuntu". I can't install Rocky just for this test, but I downgraded my kernel to rule out a kernel issue.
I will close this issue as it's not a bug in K8s.
/close
But if you only have one external endpoint for storage, I suggest you try `clusterIP: None` and use a symbolic address (a domain name).
@uablrek: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I've re-created a cluster to reproduce the problem; please allow me to clarify it further. I have 2 hosts with cleanly installed Rocky 9.3 minimal.
Then I set up the Kubernetes cluster with my Ansible playbook and installed the Calico CNI. Next, I installed nmstate so that I can configure the secondary IP via YAML:
```sh
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/nmstate.io_nmstates.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/namespace.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/service_account.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role_binding.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/operator.yaml

cat <<EOF | kubectl create -f -
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: kube-1-secondary-ip
spec:
  nodeSelector:
    kubernetes.io/hostname: kube-1
  desiredState:
    interfaces:
    - name: ens5
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: "10.87.87.201"
          prefix-length: 24
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: kube-2-secondary-ip
spec:
  nodeSelector:
    kubernetes.io/hostname: kube-2
  desiredState:
    interfaces:
    - name: ens5
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: "10.87.87.202"
          prefix-length: 24
EOF
```
Finally, create an nginx pod and service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    name: http
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: nginx-1
  labels:
    kubernetes.io/service-name: nginx
addressType: IPv4
ports:
- name: http
  port: 80
endpoints:
- addresses:
  - "10.87.87.202"
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeSelector:
    kubernetes.io/hostname: kube-2
  hostNetwork: true
  containers:
  - name: nginx
    image: nginx
    securityContext:
      privileged: true
    ports:
    - containerPort: 80
```
Here you can notice that:
Now get the nginx service ClusterIP:
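The command behind the output shown here is presumably:

```sh
kubectl get svc nginx
# The CLUSTER-IP column shows the allocated address, 10.103.166.117 below.
```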
Then if I access the service IP (10.103.166.117) from kube-2, it works with no problem since the nginx pod is on the same node:
But if I access the service IP from kube-1, it fails (while accessing the secondary IP directly works):
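As commands, the three tests described above look roughly like this (a sketch):

```sh
# On kube-2 (the node running the hostNetwork nginx pod): works
nc -vz 10.103.166.117 80
# On kube-1: the ClusterIP fails...
nc -vz 10.103.166.117 80
# ...but the endpoint's secondary IP works directly
nc -vz 10.87.87.202 80
```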
tcpdump output:
ip route output:
I'll keep this env and am happy to provide further info for diagnosis. @uablrek
Thanks, good info. That looks weird indeed. K8s (kube-proxy) only NATs the dest address; the rest is delegated to the CNI plugin and the OS. That said, it would be very interesting to figure out how this can happen. One possibility is ip "rules". Can you please check:
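The exact commands are elided above, but presumably something like:

```sh
ip rule show             # policy-routing rules that may divert traffic
ip route show table main # the main table, for comparison
```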
Another possibility is that the packet is routed twice, but I really don't understand how that can happen. What proxy-mode are you using? If you use proxy-mode=ipvs, please check:
There is another mystery: even though the src is wrong, the connect should succeed, since the nodes have connectivity on both networks. That would be asymmetric routing, but it should work nevertheless. Can you please check with `tcpdump`? What I am aiming at is to see if the packets go to the default gw, rather than directly to the storage network.
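A sketch of such a capture (`ens4` as the primary NIC name is an assumption; `ens5` is the storage NIC from the nmstate config):

```sh
# Watch which NIC the SYNs toward the endpoint actually leave on:
sudo tcpdump -ni ens4 'host 10.87.87.202 and tcp port 80'  # primary NIC (assumed name)
sudo tcpdump -ni ens5 'host 10.87.87.202 and tcp port 80'  # storage NIC
```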
@uablrek Sorry for my delayed reply, I was on holiday. IP rules:
I'm using the default iptables mode of kube-proxy.
So I created another cluster with a similar setup, only this time the host OS is Debian 12. Like I said, the above scenario works in this setup. Below is some output:
So we can see both IN and OUT packets in tcpdump. But on the Rocky 9 setup, there are packets in only one direction:
So I think the original problem can be divided into two parts:
Yep, it's an OS config thing. Turns out Rocky ships a stricter kernel networking default than Debian here:
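The specific setting is elided above; a common culprit for exactly this symptom (an assumption, not confirmed in the thread) is strict reverse-path filtering:

```sh
# EL-family distros default to strict rp_filter, which drops packets
# whose reply would leave via a different interface (asymmetric routing).
sysctl net.ipv4.conf.all.rp_filter    # 1 = strict, 2 = loose, 0 = off
# Loose mode lets asymmetrically routed traffic through:
sudo sysctl -w net.ipv4.conf.all.rp_filter=2
```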
As I said, asymmetric routing should work nevertheless, but OS config like this can drop it.
OK, I see. Thanks a lot for your help @uablrek, it really clarified the problem for me. And would you mind giving me a pointer on how to fix this, so that packets of this service go in and out through the secondary NIC only?
I can't reproduce the asymmetric routing in my env. The routing tables you included last in #124587 (comment) should direct packets on kube-1 to dest 10.87.87.202 via `ens5`:
Not directed via the default gw on the primary NIC.
If I can attend the sig/network meeting on May 9, I will ask if anybody can explain how this can happen. But in any case, I think all will agree that this is not a K8s bug.
Yes, that's the strange part of it. I'm not very good at iptables, but is it possible that iptables has chosen a wrong source IP?
That would be great then, thanks again! |
I succeeded in getting asymmetric routing 😄, but I had to trash the iptables rules set up by kube-proxy:
You see one hit on the masquerade rule. Now, if I deliberately remove that rule from the KUBE-POSTROUTING chain:
Then I get asymmetric routing!
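The manipulation itself is elided above; a sketch of what it may have looked like:

```sh
# Inspect kube-proxy's postrouting chain, with hit counters
sudo iptables -t nat -L KUBE-POSTROUTING -v -n --line-numbers
# Deliberately delete the MASQUERADE rule (rule number 3 is illustrative)
sudo iptables -t nat -D KUBE-POSTROUTING 3
```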
So, this can still be a bug in kube-proxy?
Hmm, that's interesting, because my chains look like this:
The packet count of the masquerade rule is always 0 (on both the Debian and Rocky clusters).
Hm, please check:
If this is 0, weird things happen (though I tested setting it to 0 in my env and didn't get asymmetric routing...).
The call to the KUBE-POSTROUTING chain in my env:
Please check the addresses and counters in your env.
The value is already 1:
I can see the packet count incremented in other rules, but not in the masquerade rule of chain KUBE-POSTROUTING. See the diff below. And the diff is almost identical if I change the nginx pod back to the pod network and the service back to a selector-based one.
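A sketch of how such a counter diff can be produced:

```sh
sudo iptables -t nat -L -v -n > /tmp/before.txt
nc -vz 10.103.166.117 80 || true      # the failing test
sudo iptables -t nat -L -v -n > /tmp/after.txt
diff /tmp/before.txt /tmp/after.txt   # only counters that moved will differ
```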
The problem seems to be that your "KUBE-SVC-*" chains don't have a KUBE-MARK-MASQ rule. I don't know why, but please provide your kube-proxy configmap, and the service manifest.
I found it 😄 You MUST provide the `clusterCIDR` in the kube-proxy config. This allows kube-proxy to identify traffic coming from outside the cluster, which must be masqueraded. My kube-proxy conf has this item:
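For reference, a sketch of the relevant KubeProxyConfiguration item (the CIDR value here is illustrative, not from the thread):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Traffic NOT sourced from this CIDR is treated as external and masqueraded
clusterCIDR: "172.16.0.0/16"
```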
If I remove it, the KUBE-MARK-MASQ rule does not appear in the "KUBE-SVC-*" chains, and I get asymmetric routing. So, this is not a bug in K8s; I was right about that. But it was a hard problem to find.
To sum up:
(there are other possibilities than setting `clusterCIDR`, e.g. kube-proxy's `masqueradeAll` option)
Yes, I can confirm that you're right. After setting a proper clusterCIDR, the asymmetric routing problem went away.
Yes, I've misused "headless" here. I'll change the issue title. Thank you for your patience; it really helped me out of this problem.
What happened?
I have a 4-node (1 control plane and 3 workers) cluster. Each node has two NICs: one for the Pod network (inside 192.168.16.0/20), the other for storage (inside 10.87.87.0/24).
I have a headless service; an operator dynamically updates its endpoints with storage NIC IPs. I cannot properly access this headless service from the nodes (e.g. with `nc -vz <SERVICE-IP> <PORT>`), but I can access other services from the nodes.
What did you expect to happen?
I should be able to access this headless service with secondary IPs just like other services from any node.
How can we reproduce it (as minimally and precisely as possible)?
Run `nc -vz <SERVICE-IP> 80` from any node. It will or will not succeed, depending on whether the service resolves to the current node or another node.
Anything else we need to know?
I've tried several combinations to rule out possible causes:
So it seems to me that it's not a CNI issue, but rather related to kube-proxy (or its combination with the OS).
I also dug a little with tcpdump. It seems the source node did transmit packets, but the source IP was the primary NIC's, not the secondary one's. This could be the real cause, but I don't know what caused it.
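A sketch of the kind of capture that shows this (filter values are illustrative):

```sh
# SYNs toward the storage network should carry a 10.87.87.x source,
# not an address from the primary 192.168.16.0/20 range.
sudo tcpdump -ni any 'net 10.87.87.0/24 and tcp port 80'
```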
And I can confirm that:
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)