ipsec: Fix unencrypted traffic when IPsec is used with L7 egress proxy #31955
/ci-ipsec-upgrade result: https://github.com/cilium/cilium/actions/runs/8683894291/job/23810571749
07d7033 to fbf5048
fbf5048 to 65989ba
ci-ipsec-upgrade is green: https://github.com/cilium/cilium/actions/runs/9061173282/job/24892420196
6eddac1 to d2e31c3
Okay, with the last patch d2e31c3, ci-ipsec-e2e tests 4 and 5 passed: https://github.com/cilium/cilium/actions/runs/9093401424
6c90196 to 98c70a5
Signed-off-by: gray <gray.liang@isovalent.com>
Otherwise, the later patch for L7 proxy + IPsec fails the pod-to-world test cases. To fix #31984, we are going to change the routing for traffic from the egress proxy: when tunneling is disabled, the traffic will be routed to cilium_host instead of eth0. As a consequence, the packets' source IP changes from eth0's IP to cilium_host's IP. Pod-to-world traffic therefore needs additional masquerading, because these packets now carry cilium_host's IP as their source rather than eth0's. This patch ensures that iptables masquerade rules are installed for pod-to-world traffic, even when KPR is enabled.
Signed-off-by: gray <gray.liang@isovalent.com>
98c70a5 to f51e4cb
Current status: all checks are green except ci-clustermesh, ci-e2e and ci-gke. I'll restrict the installation of the from-egress routing to IPsec only and see if it works.
With 6fc76b6, ci-clustermesh, ci-e2e and ci-gke are back to normal. Let's make it a formal patch.
1d98e7d to 6ead26c
This commit installs the "0xb00/0xf00 lookup 2005" routing rule when IPsec is enabled with native routing and Envoy. This is a necessary step towards fixing encryption leaks; without it, the egress proxy's return traffic never gets a chance to have the IPsec mark set. The new routing rule ensures these packets are routed to cilium_host, where bpf_host handles the encryption datapath.
Signed-off-by: gray <gray.liang@isovalent.com>
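To make the "0xb00/0xf00" notation concrete, here is a minimal Go sketch of how a value/mask pair selects packets by mark: a packet matches when the bits of its mark under the mask equal the value. Only 0xb00, 0xf00 and table 2005 come from the commit message; the function name and the sample marks are made up for illustration, and this is not Cilium's code.

```go
package main

import "fmt"

// ruleMatches reports whether a packet mark selects a policy-routing rule
// written as "value/mask": the bits of the mark under the mask must equal
// the corresponding bits of value.
func ruleMatches(mark, value, mask uint32) bool {
	return (mark^value)&mask == 0
}

func main() {
	const (
		value = 0xb00 // from the commit message
		mask  = 0xf00 // from the commit message
	)
	// Hypothetical marks, chosen only to exercise the mask.
	for _, mark := range []uint32{0x0b00, 0x1b00, 0x0a00, 0x0e00} {
		fmt.Printf("mark %#06x -> lookup 2005: %v\n", mark, ruleMatches(mark, value, mask))
	}
}
```

Bits outside the mask are ignored, so a mark that carries extra information in other bits still selects the rule as long as the masked nibble is 0xb.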
…ckets

To ensure IPsec encryption for proxy forward packets, we added a routing rule to push them to cilium_host. This change caused side effects for to-world traffic, which this patch fixes.

The proxy performs "SNAT" for to-world packets, but the new source address is decided by the routing rules. Previously, to-world packets were routed to eth0, so the proxy used eth0's address for SNAT. Now that the new routing rule pushes them to cilium_host instead of eth0, the proxy uses cilium_host's address for SNAT as a side effect.

This change makes to-world packets rely on "external" SNAT, which wasn't required before because the proxy's SNAT worked perfectly. We need "external" SNAT to change the source address of to-world packets from cilium_host's IP to eth0's IP. As IPsec doesn't work with KPR, the "external" SNAT mechanism is iptables.

However, due to the kernel's implementation details, an skb won't be processed by nat POSTROUTING twice. The first time is when the packet is routed to cilium_host; the second time is supposed to be when it is forwarded from cilium_net to eth0. After the first POSTROUTING traversal, the skb's ct, (struct nf_conn *)(skb->_nfct & ~7), has the status IPS_SRC_NAT_DONE, which causes the second traversal to be skipped entirely.

To avoid setting the IPS_SRC_NAT_DONE flag, this patch adds an iptables rule `-j CT --notrack` to skip conntrack in the first round.

Signed-off-by: gray <gray.liang@isovalent.com>
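The double-traversal problem described in this commit message can be illustrated with a toy model in Go. It is a deliberately simplified sketch, not kernel code: `conn.srcNATDone` stands in for the IPS_SRC_NAT_DONE status, the addresses are hypothetical, and the model assumes that a packet left untracked on the first hop gets a fresh conntrack entry on the second hop.

```go
package main

import "fmt"

// Hypothetical addresses, for illustration only.
const (
	ciliumHostIP = "10.0.0.1"
	eth0IP       = "192.0.2.10"
)

// conn is a toy conntrack entry; srcNATDone plays the role of the
// IPS_SRC_NAT_DONE status named in the commit message.
type conn struct{ srcNATDone bool }

// packet is a toy skb; ct is nil while the packet is untracked.
type packet struct {
	src string
	ct  *conn
}

// natPostrouting models one traversal of the nat POSTROUTING chain.
// masqTo is the address a matching masquerade rule would choose, or ""
// when no rule matches. The done flag is set either way.
func natPostrouting(p *packet, masqTo string) {
	if p.ct == nil || p.ct.srcNATDone {
		return // untracked, or source NAT already decided for this connection
	}
	if masqTo != "" {
		p.src = masqTo
	}
	p.ct.srcNATDone = true
}

// traverse sends one to-world proxy packet through both hops and returns
// the source address it leaves the node with.
func traverse(notrack bool) string {
	p := &packet{src: ciliumHostIP}
	if !notrack {
		p.ct = &conn{} // tracked on the first hop
	}
	natPostrouting(p, "") // first hop: routed to cilium_host, nothing to masquerade
	if p.ct == nil {
		p.ct = &conn{} // assumption: tracked anew on the second hop
	}
	natPostrouting(p, eth0IP) // second hop: cilium_net to eth0
	return p.src
}

func main() {
	fmt.Println("tracked on first hop:", traverse(false))
	fmt.Println("notrack on first hop:", traverse(true))
}
```

In this model the packet keeps cilium_host's address when it is tracked on the first hop, and only the notrack variant reaches eth0 with eth0's address, which is the behaviour the patch aims for.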
Extend the conformance-ipsec-e2e GHA workflow to additionally check that we don't leak any unencrypted packets during the connectivity test. This aims to complement the validation already performed as part of the connectivity tests by the Cilium CLI. Specifically, we leverage bpftrace to analyze the packets forwarded by the bridge device (used by kind), and report those that are not encrypted. We flag packets with both the source and the destination belonging to the IPv4/6 PodCIDR, and we consider the inner headers if packets are encapsulated. In this case, we additionally skip packets originating or targeting CiliumInternalIP addresses (as these are used for node-to-pod traffic when running in tunnel mode, which is not encrypted by design). Extra checks are finally added to always include packets originating from the L7 and DNS proxies, as their source IP is not that of a pod. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
So that we can install the version we want. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
6ead26c to a0f49ac
All green! Let's review this PR. Once approved I'll drop the last 3 patches which are for temporary leak detection.
Thanks for the PR, neat as usual 🙂
A couple questions below before I can finish my review.
curl -L https://github.com/bpftrace/bpftrace/releases/download/v0.19.1/bpftrace -o bpftrace
install -m 755 bpftrace /usr/local/bin/bpftrace
# apt update && apt install -y bpftrace
Any reason to keep this line?
return (option.Config.EnableEnvoyConfig || option.Config.EnableIPSec) && !option.Config.TunnelingEnabled()
func requireFromProxyRoutes() (fromIngressProxy, fromEgressProxy bool) {
fromIngressProxy = (option.Config.EnableEnvoyConfig || option.Config.EnableIPSec) && !option.Config.TunnelingEnabled()
fromEgressProxy = option.Config.EnableIPSec && !option.Config.TunnelingEnabled()
What would be the downside of making all this not IPsec dependent? I.e., always redirect so we have two fewer special cases for IPsec.
I saw CI breakage with envoy=true + ipsec=false. To be specific, this PR installs a new routing rule, 0xb00 lookup 2005. If I don't limit the install condition to ipsec=true, ci-e2e breaks: #31955 (comment)
I'll add details in commit message.
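For readers following the thread, the condition under discussion can be exercised in isolation. The sketch below copies the boolean logic from the diff into a standalone function; the `config` struct is a hypothetical stand-in for `option.Config`, with `Tunneling` replacing the `TunnelingEnabled()` call.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the settings the diff reads from
// option.Config.
type config struct {
	EnableEnvoyConfig bool
	EnableIPSec       bool
	Tunneling         bool
}

// requireFromProxyRoutes mirrors the logic in the diff: both route sets
// need native routing, and the from-egress-proxy routes are additionally
// limited to IPsec.
func requireFromProxyRoutes(c config) (fromIngressProxy, fromEgressProxy bool) {
	nativeRouting := !c.Tunneling
	fromIngressProxy = (c.EnableEnvoyConfig || c.EnableIPSec) && nativeRouting
	fromEgressProxy = c.EnableIPSec && nativeRouting
	return fromIngressProxy, fromEgressProxy
}

func main() {
	// The combination reported as breaking CI when the egress rule is not
	// limited to IPsec: envoy=true, ipsec=false, native routing.
	in, eg := requireFromProxyRoutes(config{EnableEnvoyConfig: true})
	fmt.Printf("envoy only:  ingress=%v egress=%v\n", in, eg)

	in, eg = requireFromProxyRoutes(config{EnableIPSec: true})
	fmt.Printf("ipsec only:  ingress=%v egress=%v\n", in, eg)

	in, eg = requireFromProxyRoutes(config{EnableIPSec: true, Tunneling: true})
	fmt.Printf("ipsec+tunnel: ingress=%v egress=%v\n", in, eg)
}
```

With Envoy enabled and IPsec disabled, only the from-ingress-proxy routes are requested, so the new egress rule is not installed in the configuration that broke ci-e2e.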
I have to put this into draft because #32331 seems to have changed something about masquerading, so my iptables patch c6a38e9 no longer makes sense. In fact, it can't pass ci-ipsec-e2e: #32683. The initial observation is that this PR breaks client-egress-l7/pod-to-world, because the proxy's TCP SYN to 1.1.1.1 is dropped with SKB_DROP_REASON_TCP_INVALID_SYN. I'm not fully sure what's going on, but I'll be back.
This PR fixes unencrypted traffic among nodes when IPsec is used with L7 egress proxy.
Fixes: #31984