Calico for Windows does not work with nginx load balancer or nginx-ingress #2236

Closed
SteveCurran opened this issue Mar 30, 2021 · 31 comments
Labels
action-required Needs Attention 👋 Issues needs attention/assignee/owner triage

Comments

@SteveCurran

What happened:
We are testing Calico network policy for Windows on Kubernetes 1.20.2. Our current configuration is an nginx load balancer directing traffic to nginx-ingress controllers in different AKS clusters. Unfortunately, because Calico for Windows requires and enables WinDSR, it appears to prevent traffic from flowing between nginx and nginx-ingress. Exposing the service with a public load balancer and making requests directly to the service's load balancer works.

What you expected to happen:
I would like confirmation that this is indeed the case, and to know what workarounds are available. Having to expose each deployment's endpoint with a public load balancer will be costly. Are there plans to implement Calico network policies for Windows without the use of WinDSR? Does anyone know whether WinDSR (DSR) will eventually support the use of ingress controllers?

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment: Mixed Linux/Windows node pools

  • Kubernetes version (use kubectl version): 1.20.2
  • Size of cluster (how many worker nodes are in the cluster?) 4
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
  • Others: nginx-ingress
@ghost ghost added the triage label Mar 30, 2021
@ghost

ghost commented Mar 30, 2021

Hi SteveCurran, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@ghost ghost added the action-required label Apr 1, 2021
@ghost

ghost commented Apr 1, 2021

Triage required from @Azure/aks-pm

@miwithro
Contributor

miwithro commented Apr 1, 2021

@keikhara

@ghost ghost added action-required and removed action-required labels Apr 1, 2021
@ghost

ghost commented Apr 9, 2021

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Apr 9, 2021
@keikhara
Contributor

keikhara commented Apr 9, 2021

@SteveCurran as I responded on Yammer, please open a support case for this so we can investigate.

@ghost ghost added action-required and removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Apr 9, 2021
@AbelHu
Member

AbelHu commented Apr 13, 2021

@SteveCurran Calico requires WinDSR, so AKS enables WinDSR by default when Calico is enabled on Windows nodes. For your request to implement Calico network policies for Windows without WinDSR, I think you can create an issue in https://github.com/projectcalico/calico

@ghost ghost removed the action-required label Apr 13, 2021
@AbelHu
Member

AbelHu commented Apr 14, 2021

@SteveCurran Do you have two or more AKS clusters, with the nginx load balancer in one of them? Could you share the topology of your environment with us so we can investigate why the nginx load balancer does not work with nginx-ingress controllers in different AKS clusters when WinDSR is enabled?

@SteveCurran
Author

@AbelHu Very simple. We have one nginx (1.19.8) outside of the cluster, which forwards to the service with an Azure-provisioned public IP. With DSR enabled we get a gateway timeout. No problems with clusters without DSR enabled.
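
To tell a 504 returned by the external nginx apart from a request that never gets an answer from the service's public IP, it can help to probe both endpoints directly. A minimal Python sketch, assuming nothing about the actual environment: the function name `probe` and the `opener` parameter are illustrative, and any URL passed in is a placeholder for your own endpoints.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return a short label describing how a GET request to `url` ended."""
    try:
        with opener(url, timeout=timeout) as resp:
            return f"http {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 504.
        return f"http {exc.code}"
    except (socket.timeout, TimeoutError):
        # No answer arrived within `timeout` seconds.
        return "timeout"
    except urllib.error.URLError as exc:
        # Connection refused, DNS failure, or a timeout wrapped by urllib.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable: {exc.reason}"
```

Calling `probe` once against the nginx front end and once against the service's public IP, on a cluster with DSR and on one without, shows at which hop the request stalls.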

@ghost ghost added the action-required label Apr 16, 2021
@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Apr 21, 2021
@SteveCurran
Author

@AbelHu Microsoft support is now telling us that DSR on AKS does not support the use of an in-cluster ingress controller such as nginx:

"DSR AKS does not seem to support using Load balancer service type due to current floating IP limitation.
When Floating IP / DSR is being implemented, the Front-end IP is configured within the VM, and not in the Load Balancer.
With the Floating IP rule, your application must use the primary IP configuration for outbound SNAT flows. If your application binds to the frontend IP address configured on the loopback interface in the guest OS, Azure's outbound SNAT is not available to rewrite the outbound flow and the flow fails. Our recommendation is to switch to an AGIC instead of the in-cluster Nginx ingress controller."

I am hoping there is a way to turn off DSR in 1.20+. Otherwise we will have to incur the cost of configuring and using an application gateway for every cluster. Our costs will increase significantly due to this limitation, and we will have to modify all of our pipelines to accommodate this change.

@AbelHu
Member

AbelHu commented May 6, 2021

@SteveCurran, we can disable WinDSR in your subscription if you need to turn off DSR in 1.20.x. This feature is included in AKS RP release v20210429 so we can disable WinDSR in your subscription after v20210429 is available.

@SteveCurran
Author

@AbelHu will disabling WinDSR also disable DSR for Linux nodes? DSR on Linux has the same issues.

@AbelHu
Member

AbelHu commented May 6, 2021

@SteveCurran No. WinDSR is a Windows feature. cc @xuto2 for DSR on Linux.

@ghost ghost added the action-required label May 9, 2021
@AbelHu
Member

AbelHu commented May 10, 2021

@SteveCurran if you have not upgraded to v1.20.x, please file a support ticket asking AKS PG to disable WinDSR in your subscriptions before upgrading.
If you have already upgraded to v1.20.x, you also need to file a support ticket. After AKS PG disables WinDSR in your subscriptions, you then need to upgrade your clusters or perform an update operation (for example, updating the Windows password) so that your Windows nodes are updated without WinDSR.

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label May 17, 2021
@ghost

ghost commented Jun 1, 2021

Issue needing attention of @Azure/aks-leads

@AbelHu
Member

AbelHu commented Jun 17, 2021

@SteveCurran Could you try setting --set controller.service.externalTrafficPolicy=Local on your ingress and test again?
NOTE: You can only enable WinDSR in your cluster by enabling Calico for Windows.

If you would like to enable client source IP preservation for requests to containers in your cluster, add --set controller.service.externalTrafficPolicy=Local to the Helm install command.

Reference: https://docs.microsoft.com/en-us/azure/aks/ingress-basic
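
For readers not installing through Helm: as far as I know, that chart value ends up as `spec.externalTrafficPolicy` on the controller's LoadBalancer Service. A minimal Python sketch of such a manifest follows. The function name `ingress_service`, the Service name, the selector label, and the port names are illustrative assumptions, not taken from this thread or from the chart.

```python
import json


def ingress_service(name, selector, policy="Local"):
    """Build a LoadBalancer Service manifest as a plain dict.

    `policy` becomes spec.externalTrafficPolicy; Kubernetes accepts
    "Local" (keep the client source IP) or "Cluster" (the default).
    """
    if policy not in ("Local", "Cluster"):
        raise ValueError(f"unsupported externalTrafficPolicy: {policy}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "externalTrafficPolicy": policy,
            "selector": selector,
            # Port names are hypothetical; match them to your controller pods.
            "ports": [
                {"name": "http", "port": 80, "targetPort": "http"},
                {"name": "https", "port": 443, "targetPort": "https"},
            ],
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl reads JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

Writing the output of `to_json(ingress_service("my-ingress", {"app": "my-ingress"}))` to a file gives something `kubectl apply -f` can read; both names in that call are placeholders.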

@ghost ghost added action-required and removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Jun 17, 2021
@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jun 24, 2021
@AbelHu
Member

AbelHu commented Jul 10, 2021

The fix in kube-proxy was made in the PRs below. I think the issue will be fixed for Calico on Windows once the new Kubernetes versions are supported in AKS.
1.22: kubernetes/kubernetes#103138
1.21: kubernetes/kubernetes#103140
1.20: kubernetes/kubernetes#103139

@ghost ghost added action-required and removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Jul 10, 2021
@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jul 17, 2021
@SteveCurran
Author

@AbelHu Thanks for fixing this. We can now use nginx-ingress in AKS 1.21.2 running Calico for Windows.

@Azure Azure locked as resolved and limited conversation to collaborators Sep 14, 2021