
Traefik doesn't reconnect to Jaeger when connection lost #6093

Closed
Pehesi97 opened this issue Dec 26, 2019 · 7 comments
Labels
area/middleware/tracing kind/bug/confirmed a confirmed bug (reproducible). priority/P2 need to be fixed in the future status/5-frozen-due-to-age

Comments

@Pehesi97

Do you want to request a feature or report a bug?

Report a bug.

What did you do?

Configured Traefik to output traces to a Jaeger instance running on the same Kubernetes cluster using the following arguments:

- --tracing
- --tracing.jaeger.samplingServerURL=jaeger-agent.tracing:5778/sampling
- --tracing.jaeger.localAgentHostPort=jaeger-agent.tracing:6831

It worked correctly until my Jaeger Agent pod restarted. Traefik never reconnected to the service, so my traces stopped being written to Jaeger.

What did you expect to see?

Traefik reconnecting to Jaeger Agent when it was available again.

What did you see instead?

Traefik didn't reconnect to Jaeger Agent when it was available again.

Output of traefik version: (What version of Traefik are you using?)

Version:      2.0.4
Codename:     montdor
Go version:   go1.13.3
Built:        2019-10-28T20:23:57Z
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

(screenshot of the Traefik arguments attached in the original issue)

@juliens juliens added area/middleware/tracing kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. and removed status/0-needs-triage labels Dec 26, 2019
@ddtmachado
Contributor

I can confirm the issue on Kubernetes, and probably on any other platform where a service name is used as the Jaeger host endpoint.

To me it looks like the issue is caused by Traefik not recreating the Jaeger client, because the transport is a UDP connection (no failure feedback). For example:

  • Traefik (actually the imported Jaeger client) resolves the IP address behind the service name during the tracing middleware setup
  • Kubernetes updates the internal DNS records for the service when a pod is restarted
  • Since it is a UDP connection, it doesn't matter that the other side is no longer listening: Traefik keeps sending to the old pod address and never recreates the Jaeger client

Not sure what we could do in this case though

@mpl mpl added priority/P2 need to be fixed in the future kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Jan 2, 2020
@ct27stf

ct27stf commented Mar 25, 2020

The samplingServerURL is an HTTP endpoint.
Also, the jaeger-agent exposes port 14271 (HTTP) with a healthcheck at / and metrics at /metrics.

@kevtainer
Contributor

I believe this is an issue with the jaeger-client-go library (jaegertracing/jaeger-client-go#403); a fix was proposed but rejected as creating too much overhead.

@dtomcej
Contributor

dtomcej commented May 12, 2020

@Pehesi97 in this case, isn't jaeger-agent.tracing a Service in the tracing namespace, similar to the one in the all-in-one template (https://github.com/jaegertracing/jaeger-kubernetes/blob/master/all-in-one/jaeger-all-in-one-template.yml#L108)?

If so, the Service IP should not have changed; restarting a pod makes no difference to the Service IP.

Traefik would never have connected directly to a pod IP, as pods in Kubernetes do not normally get a DNS record. There are exceptions, but this does not look like one of them.

Is it possible that your pod did not pass the readiness checks, and did not get re-added as a service endpoint?

Or is there another reason that your service IP would have changed?

@terev

terev commented Jun 27, 2020

@dtomcej this can happen because that template deploys a headless service for the agent. This means no proxying is done; the service merely provides a DNS convenience for round-robinning across the pods it selects. I've submitted a PR to the jaeger-client-go library (jaegertracing/jaeger-client-go#520) that should resolve this issue. Would anyone mind having a look and possibly giving it a 👍 for more exposure?
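For readers unfamiliar with the distinction: a headless Service is declared with `clusterIP: None`, so DNS for the service name returns the pod IPs directly instead of a stable virtual IP. A minimal sketch (hypothetical names; the agent's compact-thrift UDP port 6831 matches the issue's configuration) looks like:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-agent
  namespace: tracing
spec:
  clusterIP: None        # headless: DNS returns pod IPs, no virtual IP
  selector:
    app: jaeger
  ports:
    - name: agent-compact
      port: 6831
      protocol: UDP
```

With this shape, a client that resolves `jaeger-agent.tracing` once at startup pins a pod IP, and a pod restart leaves it sending to a dead address, which is exactly the failure mode reported above.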

@terev

terev commented Aug 17, 2020

FYI: if you upgrade your jaeger-client-go dependency now, you should see that this issue is resolved by the linked PR.

@ldez ldez added this to the 2.3 milestone Aug 20, 2020
@ldez
Member

ldez commented Aug 20, 2020

Closed by #7198

@ldez ldez closed this as completed Aug 20, 2020
v2 automation moved this from issues to Done Aug 20, 2020
@traefik traefik locked and limited conversation to collaborators Sep 20, 2020
10 participants