Stale service conntrack entries causing packet drop #32472
👋 interesting find!
Could we please discuss this in the context of a supported release? Ideally with a repro on latest.
(Line 1567 in f211c17)
Understood. I'm almost certain that https://sourcegraph.com/github.com/cilium/cilium/-/commit/c7d547f330cc497bad0d6a1d0f7f3785953d9a98?visible=1 will fix it, as this is where we are getting the drop. This is inferred from the missing LB4 slot lookup debug messages in the logs output above. I'll continue attempting to reproduce the issue and confirm the hypothesis above.
I managed to reproduce the issue on […]
Example CT entries -- we'll reuse […]
In another terminal, issue a curl using the source port observed for the client pod above (e.g. 60590). NOTE: you may need to wait for the port binding to be released before the port can be reused.
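For illustration, a minimal C sketch of that step: pinning the client's source port so the new connection reuses the prior tuple (this mirrors curl's `--local-port` option; the address `34.1.1.100:443` and port `60590` are taken from this issue, everything else is illustrative):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in src = { .sin_family = AF_INET,
				   .sin_port = htons(60590) }; /* reuse source port */
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(443) };
	int one = 1;

	inet_pton(AF_INET, "34.1.1.100", &dst.sin_addr);
	/* Allow rebinding the port while the previous socket winds down;
	 * as noted above, you may still need to wait for the binding to
	 * be fully released. */
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
		perror("bind");    /* port still held: wait and retry */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		perror("connect"); /* hangs/times out when the drop hits */
	close(fd);
	return 0;
}
```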
Debug output for that flow with CT collision:
This is not reproducible on main using my reproduction steps; toggling the interface state clears the conntrack entries. I'll have to cherry-pick the potential fix to 1.15 and see if I can reproduce.
Confirmed that the recent PR fixes the issue:

Can we consider backporting that change?
I remember cleaning up a few things beforehand to make that PR possible. Let me have a look; I wasn't expecting to fix an actual bug 😲. Do you have an idea what's going wrong exactly / why the patch is helping?
Because the external traffic policy is local, this part is returning null: […]

So because of that, we do not try to look for a new backend:
https://sourcegraph.com/github.com/cilium/cilium@v1.15.1/-/blob/bpf/lib/lb.h?L1626-1636

The new logic is much more straightforward:
https://sourcegraph.com/github.com/cilium/cilium@main/-/blob/bpf/lib/lb.h?L1567-1578
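To make the difference concrete, here is a small self-contained toy model of the two-scope lookup (all names and types are illustrative simplifications, not the actual bpf/lib/lb.h code): for an eTP=local service on a node with no local backends, the external-scope entry is empty, so the lookup returns NULL unless it is allowed to switch to the internal "all backends" scope.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum scope { SCOPE_EXT, SCOPE_INT }; /* stand-ins for LB_LOOKUP_SCOPE_* */

struct service_entry {
	enum scope scope;
	int backend_count; /* 0 == no backends reachable via this entry */
	bool two_scopes;   /* eTP=local: separate EXT/INT entries exist */
};

/* Stand-in for the BPF service-map lookup. */
static struct service_entry *map_lookup(struct service_entry *entries,
					size_t n, enum scope scope)
{
	for (size_t i = 0; i < n; i++)
		if (entries[i].scope == scope)
			return &entries[i];
	return NULL;
}

static struct service_entry *lookup_service(struct service_entry *entries,
					    size_t n, bool scope_switch)
{
	struct service_entry *svc = map_lookup(entries, n, SCOPE_EXT);

	/* Only fall through to the internal "all backends" entry when the
	 * caller asked for a scope switch. Hard-coding this off in the
	 * CT-collision fallback path is the bug discussed here. */
	if (svc && scope_switch && svc->two_scopes)
		svc = map_lookup(entries, n, SCOPE_INT);

	/* An entry without backends behaves as "no service" -- the NULL
	 * return described above. */
	if (svc && svc->backend_count == 0)
		return NULL;
	return svc;
}

int main(void)
{
	/* eTP=local service, seen from a node with no local backends. */
	struct service_entry entries[] = {
		{ SCOPE_EXT, 0, true }, /* local backends only: none here */
		{ SCOPE_INT, 3, true }, /* all backends, cluster-wide */
	};

	printf("scope_switch=false -> %s\n",
	       lookup_service(entries, 2, false) ? "ok" : "NULL (drop)");
	printf("scope_switch=true  -> %s\n",
	       lookup_service(entries, 2, true) ? "ok" : "NULL (drop)");
	return 0;
}
```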
Oh, ok - I think I see what you're hitting. Missed that the SVC access is coming from […]. So the initial SVC lookup uses […]
Changing to the below fixes the issue in my reproduction scenario:

I'm not sure of the full implications of this / how safe it is.
Sweet, thank you! Could you send a PR for v1.15 with that fix (adding a […])?
When looking up a loadbalancer service in lxc from-container, the scope_switch value for flows matching an existing conntrack entry may skip the correct return path. This can lead to packet drops due to a failed service lookup when a stale CT entry maps to a backend ID that is no longer present in the service map. Set the scope_switch value to be consistent with the earlier service lookup.

Note: this change only applies to v1.15 and below, as v1.16 has been refactored to prevent the lb4_lookup_service call.

Fixes: cilium#32472
Signed-off-by: Mark St John <markstjohn@google.com>
Thanks @julianwiedmann -- done. I don't see any BPF unit tests for this function, so let me know if you think it prudent to create a test for this change.
Good question! I think for your specific test we should be alright; the bugfix is obvious enough. But having more test coverage for […]. I'd have a bit more confidence in e2e tests for the different […]
For eTP=local services, we add two entries to the service map - one with only the local backends, and a second entry with all backends. In-cluster access is meant to use the wider-scope entry.

Due to the incorrect scope switch, in-cluster access to eTP=local services doesn't fall through to the "all backends" entry. As a result, a client on a node with no local backend would be unable to fall back in case its initial backend terminates.

Note: this change only applies to v1.15 and below, as v1.16 has been refactored to prevent the lb4_lookup_service call.

Fixes: #32472
Signed-off-by: Mark St John <markstjohn@google.com>
Closing via #32608.
Is there an existing issue for this?
What happened?
Intermittent occurrence of `Service backend not found` errors. However, there are backends for the service (and other connections to `34.1.1.100:443` from the same source pod work as expected). Turned on datapath debugging and discovered this is due to stale service conntrack entries:
Note that the connection from the client is a `SYN` segment, but the conntrack lookup sees it as an open connection. So I dumped the service conntrack entry and waited for another request with that connection tuple:
Before:
Observe flow for tuple:
After:
So it looks like even with a service backend miss, there is no feedback to the conntrack state to invalidate the entry. If a client is making frequent short-lived connections to the service, there is a high probability it will reuse a connection tuple as it cycles through the source ports; a rough calculation is sketched below.
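A back-of-the-envelope sketch of how quickly tuples repeat (the connection rates are assumptions; the port range is the Linux default):

```c
#include <stdio.h>

int main(void)
{
	/* Default Linux ephemeral range, net.ipv4.ip_local_port_range:
	 * 32768..60999, i.e. ~28k usable source ports per src/dst pair. */
	const double ports = 60999 - 32768 + 1;
	const double rates[] = { 10.0, 100.0, 1000.0 }; /* new conns/sec (assumed) */

	for (int i = 0; i < 3; i++)
		printf("%6.0f conn/s -> source ports wrap every %6.0f s\n",
		       rates[i], ports / rates[i]);
	return 0;
}
```

Once the wrap interval drops below the lifetime of a service CT entry, a new connection colliding with a stale entry is effectively guaranteed rather than merely probable.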
One observation is that a `SYN` segment seen for an open conntrack entry is OK, as long as the service lookup resolves to an existing backend index:
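Condensing the above into a self-contained model of the decision path (illustrative control flow and names, not the actual bpf/lib/lb.h logic):

```c
#include <stdbool.h>
#include <stdio.h>

enum verdict { FORWARD, DROP_NO_SERVICE };

struct ct_entry {
	bool exists;    /* tuple matched an open service CT entry */
	int backend_id; /* backend recorded when the entry was created */
};

/* Pretend backend map: ids 1..3 exist; anything else was removed. */
static bool backend_exists(int id) { return id >= 1 && id <= 3; }

static enum verdict handle_syn(struct ct_entry ct, bool scope_switch_ok)
{
	if (!ct.exists)
		return FORWARD; /* fresh tuple: normal backend selection */

	/* A SYN colliding with an open CT entry is fine as long as the
	 * recorded backend still resolves... */
	if (backend_exists(ct.backend_id))
		return FORWARD;

	/* ...otherwise the service must be re-resolved. With the scope
	 * switch bug, that re-lookup returns NULL on a node with no
	 * local backend and the packet is dropped; with the fix, it
	 * falls through to the "all backends" entry. */
	return scope_switch_ok ? FORWARD : DROP_NO_SERVICE;
}

int main(void)
{
	struct ct_entry stale = { .exists = true, .backend_id = 42 };

	printf("stale CT entry, buggy re-lookup: %s\n",
	       handle_syn(stale, false) == DROP_NO_SERVICE ? "drop" : "forward");
	printf("stale CT entry, fixed re-lookup: %s\n",
	       handle_syn(stale, true) == DROP_NO_SERVICE ? "drop" : "forward");
	return 0;
}
```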