Implement Cilium #852

Open
2 of 7 tasks
NissesSenap opened this issue Nov 10, 2022 · 5 comments
Comments

@NissesSenap
Contributor

NissesSenap commented Nov 10, 2022

Implement Cilium in Azure and AWS

Tasks

  • Linkerd verification
  • Node-local DNS
  • Monitoring
  • Documentation
  • Disable kube-proxy
    • AWS
    • Azure

Work is ongoing in #798

@NissesSenap
Contributor Author

One of our major blockers right now is that we can't get node-local-dns to work unless Cilium runs with kubeProxyReplacement: strict.
That means we can't run Cilium together with kube-proxy.
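
For reference, a minimal sketch of the Cilium Helm values this implies. It is not our actual configuration: the API server host and port are placeholders, and enabling `localRedirectPolicy` here is an assumption based on what node-local-dns needs.

```yaml
# values.yaml for the Cilium Helm chart (illustrative sketch only)

# Cilium takes over service load-balancing, so kube-proxy must not run.
kubeProxyReplacement: strict

# Without kube-proxy the agent cannot rely on the in-cluster service IP
# to reach the API server, so the endpoint is given explicitly.
k8sServiceHost: API_SERVER_HOST   # placeholder, replace with the real endpoint
k8sServicePort: 443               # placeholder, depends on the cluster

# Assumed to be required for redirecting DNS traffic to node-local-dns.
localRedirectPolicy: true
```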

We have written a null_resource that deletes kube-proxy, but Azure is "kind enough" to install it for us again.

There is a feature currently in preview, https://learn.microsoft.com/en-us/azure/aks/configure-kube-proxy, that lets us disable kube-proxy altogether.
Hopefully this will become GA soon.

We are also waiting for the terraform provider to support configuring kube-proxy.
hashicorp/terraform-provider-azurerm#19567

On the other hand, we have verified that Linkerd works as intended on top of Cilium.

@NissesSenap
Contributor Author

If we would like to enable a preview feature we could probably do it by using: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/resource_provider_registration

This feature is very low risk since we only disable the usage of kube-proxy in our cluster.

@jimgus
Contributor

jimgus commented Dec 22, 2022

Moving to blocked due to: cilium/cilium#22838

@jimgus
Contributor

jimgus commented Dec 22, 2022

Information regarding node-local-dns:

Initially we had missed that, for node-local-dns to work, we need to set up a Local Redirect Policy so that Cilium can route DNS traffic to it. How to do this is described here: https://cloud.yandex.com/en/docs/managed-kubernetes/operations/cilium-node-local-dns

To enable local redirect in Cilium we have to run Cilium with kubeProxyReplacement=strict, which means running Cilium without kube-proxy.
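
As an illustration, a Local Redirect Policy for node-local-dns could look roughly like the sketch below. It follows the pattern in the Cilium documentation. The policy name and the `k8s-app: node-local-dns` label are assumptions that have to match the actual node-local-dns deployment, as do the port names.

```yaml
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: nodelocaldns            # hypothetical name
  namespace: kube-system
spec:
  # Traffic addressed to the kube-dns service ...
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  # ... is redirected to the node-local-dns pod on the same node.
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns   # assumed label on the node-local-dns pods
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
```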

@jimgus
Contributor

jimgus commented Dec 22, 2022

We found one problem in AWS related to running without kube-proxy. The ingress-nginx deployment uses hostNetwork: true and we have not been able to get that working. Details can be found here: cilium/cilium#22838

We have experimented with not using hostNetwork, but then the K8s API server cannot reach the webhooks. For example, we get errors like this:

error: ingresses.networking.k8s.io "podinfo" could not be patched: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-public-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": Address is not allowed

With hostNetwork: true the ingress-nginx pods get the IP address of the node, and without that it does not seem to work.

We have made some experiments without host network to check the behaviour:

  1. Change the nginx admission controller service to type NodePort and annotate it so that external-dns creates a DNS entry containing the IPs of all nodes. Then edit the ValidatingWebhookConfiguration to use a URL instead of a service (e.g. https://ingress-nginx-public-admission.dust.unbox.xenit.io:30317/networking/v1/ingresses)
  2. Use fake services like this: https://dev.azure.com/unboxops/lz-xks/_git/terraform-aws?version=GBcilium_test&path=/fake-svc.yml and reconfigure the webhooks to use those services

Both approaches made it possible for the API server to reach the webhook endpoints, but in both cases we ran into certificate problems due to the URL mismatch, as expected.
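
For reference, experiment 1 amounts to roughly the following webhook configuration. This is a sketch and not the manifest we applied: the metadata name, the rules and the caBundle value are assumptions, and only the webhook name and the URL come from the details above.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: ingress-nginx-public-admission   # hypothetical name
webhooks:
  - name: validate.nginx.ingress.kubernetes.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    rules:                                # assumed to match the chart defaults
      - apiGroups: ["networking.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
    clientConfig:
      # A url is used instead of the usual service reference
      # (clientConfig.service with namespace, name, path and port).
      url: https://ingress-nginx-public-admission.dust.unbox.xenit.io:30317/networking/v1/ingresses
      # The CA given here must validate a serving certificate that is
      # valid for the host name in url.
      caBundle: BASE64_ENCODED_CA         # placeholder
```

The API server verifies the serving certificate against the host name in `url`, so a certificate that was presumably issued for the in-cluster service name fails verification when the webhook is called this way.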

Possible ways forward:

  1. The hostNetwork problem is accepted as a bug and fixed and we can continue using hostNetwork
  2. We do not use node-local-dns in AWS and run Cilium together with kube-proxy
  3. We try to run Cilium with AWS IPAM

@jimgus jimgus removed their assignment Feb 13, 2023