Unable to install Maesh on AWS EKS v1.17 due to a CoreDNS issue #773

Closed
0rax opened this issue Oct 28, 2020 · 7 comments · Fixed by #774

Comments

Contributor

0rax commented Oct 28, 2020

Bug Report

What did you do?

Installed traefik-mesh with Helm on an AWS EKS v1.17 (eks.3) cluster with Calico networking, using:

helm repo add traefik-mesh https://helm.traefik.io/mesh
helm repo update
helm install traefik-mesh traefik-mesh/traefik-mesh

What did you expect to see?

I expected the controller to start and Maesh to work.

What did you see instead?

The traefik-mesh-controller pod went into CrashLoopBackOff because the traefik-mesh-prepare container keeps failing. The failure seems to come from the CoreDNS version being reported as incompatible with Maesh, although it should be supported (CoreDNS 1.3+).

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  13m                   default-scheduler  Successfully assigned default/traefik-mesh-controller-5f48ff8f69-vrbd9 to xxx.compute.internal
  Normal   Pulled     11m (x5 over 13m)     kubelet            Container image "traefik/mesh:v1.4.0" already present on machine
  Normal   Created    11m (x5 over 13m)     kubelet            Created container traefik-mesh-prepare
  Normal   Started    11m (x5 over 13m)     kubelet            Started container traefik-mesh-prepare
  Warning  BackOff    2m51s (x49 over 13m)  kubelet            Back-off restarting failed container

Output of prepare container log: (traefik/mesh:v1.4.0)

2020/10/28 19:16:35 command prepare error: unable to find suitable DNS provider: unsupported CoreDNS version "1.6.6-eksbuild.1"

What is your environment & configuration (arguments, provider, platform, ...)?

  • Kubernetes version: v1.17.9-eks-a84824
  • EKS version: v1.17-eks.3
  • Calico version: v3.16.4
  • Maesh version: v1.4.0

kevinpollet added the area/coredns and kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed) labels on Oct 29, 2020
kevinpollet modified the milestone: v1.4 on Oct 29, 2020

Contributor

jspdown commented Oct 29, 2020

@0rax Thanks for your interest in Traefik Mesh!

It appears that the issue comes from one of our dependencies: https://github.com/hashicorp/go-version.
Before patching the DNS configuration, we make sure the CoreDNS version is >= 1.3 and < 1.8. However, go-version constraints consider that a version with a pre-release never matches a constraint specified without a pre-release. Here, "1.6.6-eksbuild.1" carries the pre-release "eksbuild.1", so the check fails.

An issue is already open on their repository to understand why it behaves like this: hashicorp/go-version#59

Until this gets sorted out, we can replace the goversion.NewConstraint(">= 1.3, < 1.8") check with version.GreaterThanOrEqual and version.LessThan. With this type of comparison, pre-releases are handled correctly.
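
To make the difference concrete, here is a minimal standard-library sketch of the two checks. It is only a simplified model of the behavior described above, not go-version's code: the `version` type and the `parse`, `compareCore`, `strictMatch`, and `rangeMatch` helpers are hypothetical names introduced for illustration. It also assumes the suffix after the first "-" can be ignored when comparing against the bounds, which is not exactly how go-version orders pre-releases.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical, simplified stand-in for a parsed version:
// up to three numeric segments plus an optional pre-release suffix.
type version struct {
	segments   [3]int
	prerelease string
}

// parse splits "1.6.6-eksbuild.1" into segments {1, 6, 6} and the
// pre-release "eksbuild.1". Missing segments default to 0 ("1.3" is 1.3.0).
func parse(s string) (version, error) {
	var v version
	core := s
	if i := strings.Index(s, "-"); i >= 0 {
		core, v.prerelease = s[:i], s[i+1:]
	}
	parts := strings.Split(core, ".")
	if len(parts) > 3 {
		return v, fmt.Errorf("too many segments in %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid segment %q in %q: %w", p, s, err)
		}
		v.segments[i] = n
	}
	return v, nil
}

// mustParse panics on invalid input; convenient for fixed bounds.
func mustParse(s string) version {
	v, err := parse(s)
	if err != nil {
		panic(err)
	}
	return v
}

// compareCore compares the numeric segments only and returns -1, 0 or 1.
func compareCore(a, b version) int {
	for i := range a.segments {
		switch {
		case a.segments[i] < b.segments[i]:
			return -1
		case a.segments[i] > b.segments[i]:
			return 1
		}
	}
	return 0
}

// strictMatch models the constraint behavior described above: bounds written
// without a pre-release never match a version that carries one.
func strictMatch(v, lower, upper version) bool {
	if v.prerelease != "" {
		return false
	}
	return compareCore(v, lower) >= 0 && compareCore(v, upper) < 0
}

// rangeMatch compares against each bound explicitly, so a vendor suffix such
// as "-eksbuild.1" does not disqualify the version.
func rangeMatch(v, lower, upper version) bool {
	return compareCore(v, lower) >= 0 && compareCore(v, upper) < 0
}

func main() {
	lower, upper := mustParse("1.3"), mustParse("1.8")
	v := mustParse("1.6.6-eksbuild.1")

	fmt.Println("strict:", strictMatch(v, lower, upper)) // strict: false
	fmt.Println("range:", rangeMatch(v, lower, upper))   // range: true
}
```

With the EKS version string from the log, the strict check rejects the version while the explicit lower/upper comparison accepts it, which is the change in behavior the proposed workaround is after.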

jspdown added the kind/bug/confirmed (a confirmed bug, reproducible) label and removed the kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed) label on Oct 29, 2020
@kevinpollet kevinpollet added this to the v1.4 milestone Oct 29, 2020
Contributor Author

0rax commented Oct 29, 2020

Thank you for the quick answer. This seems like an issue that can be fixed easily.

I will try to build a custom version of the Docker image with this fix to properly check Maesh compatibility with my setup.

Contributor

jspdown commented Oct 30, 2020

@0rax Could you base your changes on v1.4? Since it's a bug fix it would be great to have it on this version.
Don't hesitate to ping me if you need help on this.

Contributor Author

0rax commented Oct 30, 2020

With this patch applied on top of refs/tags/v1.4.0, I was able to start traefik-mesh successfully.

diff --git a/pkg/dns/dns.go b/pkg/dns/dns.go
index c62d46d..0416b87 100644
--- a/pkg/dns/dns.go
+++ b/pkg/dns/dns.go
@@ -39,7 +39,11 @@ const (
        traefikMeshBlockTrailer = "#### End Traefik Mesh Block"
 )
 
-var versionCoreDNS17 = goversion.Must(goversion.NewVersion("1.7"))
+var (
+       versionCoreDNS17 = goversion.Must(goversion.NewVersion("1.7"))
+       versionCoreDNS13 = goversion.Must(goversion.NewVersion("1.3"))
+       versionCoreDNS18 = goversion.Must(goversion.NewVersion("1.8"))
+)
 
 // Client holds the client for interacting with the k8s DNS system.
 type Client struct {
@@ -103,7 +107,7 @@ func (c *Client) coreDNSMatch(ctx context.Context) (bool, error) {
                return false, err
        }
 
-       if !versionConstraint.Check(version) {
+       if !(version.GreaterThanOrEqual(versionCoreDNS13) && version.LessThan(versionCoreDNS18)) {
                c.logger.Debugf("CoreDNS version is not supported, must satisfy %q, got %q", versionConstraint, version)
 
                return false, fmt.Errorf("unsupported CoreDNS version %q", version)

Quick note: I had to create the namespace myself, as the current Helm chart seems to install into the default namespace by default. This seems inconsistent with the documentation at https://doc.traefik.io/traefik-mesh/install/#verify-your-installation, which says to check the installation in the traefik-mesh namespace.


For people interested in how I deployed it after patching the code, I ran the following commands:

make
docker tag traefik/mesh:latest XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh:v1.4.0-eks
docker push XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh:v1.4.0-eks
echo "---
apiVersion: v1
kind: Namespace
metadata:
    name: traefik-mesh" | kubectl apply -f -
helm install traefik-mesh traefik-mesh/traefik-mesh \
    --set controller.image.pullPolicy=IfNotPresent \
    --set controller.image.name=XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh \
    --set controller.image.tag=v1.4.0-eks \
    --namespace=traefik-mesh

Contributor

jspdown commented Oct 30, 2020

@0rax This patch sounds good 👍

Could you please open a Pull Request to contribute the changes upstream? We will make sure to release a patch version on the v1.4.

Thanks again for your time on this.

Contributor Author

0rax commented Oct 30, 2020

@jspdown Just pushed it. I took the liberty of renaming the global variables to something that better matches what they do rather than what they are, and added a test case reflecting this issue.

@traefiker
Contributor

Closed by #774.
