This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

When restarting jaeger agent in docker cpp libraries don't connect anymore #204

Closed
belfo opened this issue Feb 5, 2020 · 16 comments

Comments

@belfo

belfo commented Feb 5, 2020

Requirement - what kind of business use case are you trying to solve?

Being able to restart Jaeger when required.

Problem - what in Jaeger blocks you from solving the requirement?

When the Jaeger agent restarts (new version...), the connection between the cpp library (used inside envoy/nginx) and the agent doesn't work anymore, meaning we can no longer send traces to Jaeger.
This forces us to restart all nginx/envoy containers in the cluster.

Proposal - what do you suggest to solve the problem or improve the existing situation?

I guess the library resolves the address at start-up. This should be refreshed from time to time to ensure it's still valid, maybe at a configurable interval?

We configure our app (envoy for example) like this:
reporter:
  localAgentHostPort: jaeger-agent:6831

@mdouaihy
Contributor

mdouaihy commented Mar 2, 2020

Hi @belfo,

I am afraid I don't understand what's happening. When the agent is restarted, doesn't it keep the same IP/port?

@belfo
Author

belfo commented Mar 3, 2020

Hello @mdouaihy,

Not inside Docker: once the container restarts, the IP/hostname is different.

Regards

@mdouaihy
Contributor

mdouaihy commented Mar 3, 2020

ok I see.

I believe that there is a problem with the use case then.

According to Jaeger, the agent is supposed to be local, especially since the clients are sending UDP packets.

Maybe in your case, if you can't afford having a local agent, you could send directly to the collector.

@yurishkuro, do you have any insights on this?

@yurishkuro
Member

@mdouaihy correct. However, I think there was a somewhat similar issue raised in Java client where the domain name -> IP resolution in the UDPSender was done in the constructor, so when the agent restarts, its DNS name remains the same but the IP changes, and UDPSender is unable to report spans. Not sure if this is a similar issue here.
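
To make the "resolved once in the constructor" pattern concrete, here is a minimal sketch using POSIX `getaddrinfo`. `resolveIPv4` and `ResolveOnceSender` are hypothetical names for illustration and are not taken from any Jaeger client; the sketch is also limited to IPv4 for brevity.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/socket.h>

#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any Jaeger client): resolve host:port to
// the first IPv4 address returned by getaddrinfo, as a dotted-quad string.
std::string resolveIPv4(const std::string& host, const std::string& port)
{
    addrinfo hints{};
    hints.ai_family = AF_INET;       // IPv4 only, to keep the sketch short
    hints.ai_socktype = SOCK_DGRAM;  // UDP, as used between client and agent

    addrinfo* result = nullptr;
    const int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
    if (rc != 0) {
        throw std::runtime_error(gai_strerror(rc));
    }

    char buffer[INET_ADDRSTRLEN] = {};
    const auto* addr = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
    const char* text =
        inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));
    freeaddrinfo(result);
    if (text == nullptr) {
        throw std::runtime_error("inet_ntop failed");
    }
    return buffer;
}

// Resolve-once pattern: the address is fixed at construction time, so a
// later DNS change for the same host name is never observed.
struct ResolveOnceSender {
    ResolveOnceSender(const std::string& host, const std::string& port)
        : address(resolveIPv4(host, port))
    {
    }

    const std::string address;
};
```

With this shape, a sender built for `jaeger-agent:6831` keeps using whatever IP the name had at start-up, which matches the symptom described in this issue.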

@ecourreges-orange
Contributor

How would you go about re-resolving the DNS name in the case of UDP, which is not connected? You don't have a "disconnection" event to tell you when to resolve again.
You would have to resolve on every send (which, without a DNS cache, would probably amount to a denial of service, and could be bad for most normal usages),
or somehow get the DNS TTL to know how often to resolve,
or refresh at a fixed time interval.

@mdouaihy
Contributor

mdouaihy commented Mar 3, 2020

@yurishkuro, it's the same behavior.
I am not sure that this should be changed because

  • it's slow to resolve the IP at each send.
  • I believe that the agent should be launched locally or in a sidecar with a name/IP that won't change.

@yurishkuro
Member

@jpkrohling do you have an idea on the best practice?

@belfo
Author

belfo commented Mar 5, 2020

As long as it's a container (in swarm) you can't ensure the IP won't change.
You address a service, but if the service is restarted (for whatever reason) the IP of the service could be different.

If it's refreshed every X seconds, it would at least recover by itself (even with a refresh interval of 60s).
Or keep a TCP connection open just for the sake of refreshing the resolution?

@yurishkuro
Member

I think a refresh is a reasonable approach, with configurable interval.
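
A minimal sketch of what a refresh with a configurable interval could look like. `RefreshingAddress` is a hypothetical name, not existing client code. The resolver is injected as a callback and the current time is passed in explicitly, both assumptions made so the logic can be exercised without real DNS.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: caches a resolved address and re-resolves it once the
// configured interval has elapsed.
class RefreshingAddress {
  public:
    using Clock = std::chrono::steady_clock;
    using Resolver = std::function<std::string()>;

    RefreshingAddress(Resolver resolver,
                      Clock::duration interval,
                      Clock::time_point now)
        : _resolver(std::move(resolver))
        , _interval(interval)
        , _address(_resolver())  // initial resolution, as done at start-up
        , _lastResolved(now)
    {
    }

    // Re-resolves only if the interval has elapsed since the last resolution.
    // Returns true when the address changed, i.e. the caller should redial.
    bool refreshIfDue(Clock::time_point now)
    {
        if (now - _lastResolved < _interval) {
            return false;
        }
        _lastResolved = now;
        std::string fresh = _resolver();
        if (fresh == _address) {
            return false;
        }
        _address = std::move(fresh);
        return true;
    }

    const std::string& address() const { return _address; }

  private:
    Resolver _resolver;
    Clock::duration _interval;
    std::string _address;
    Clock::time_point _lastResolved;
};
```

A sender could call `refreshIfDue(Clock::now())` before emitting a batch and recreate its socket only when it returns true, so the cost of resolution is paid at most once per interval rather than on every send.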

@jpkrohling

jpkrohling commented Mar 6, 2020

What is typically done in Kubernetes is to either have the agent as a sidecar, so it's indeed localhost (same pod), or to have the agent at the node and use the hostIP, which also won't change.

For this reported case, a refresh indeed sounds like the only viable solution. But given that the client should be making other HTTP connections to the agent (like, to get sampling strategies), can't an IP change be detected there and serve as a hint to refresh the UDP parts?

@manuelnp

I have been reading through the code and it occurs to me that IPAddress could store the unresolved name and if UDPTransporter fails in emitBatch, it could trigger the resolveAddress again and reconnect the socket with the new ip (if available).

This way, you avoid the active polling for changes in host name/ip. I don't mind contributing with a PR if you think it is a good approach.

@manuelnp

manuelnp commented Jul 2, 2020

Forget my last comment: a UDP socket will not fail if there is nobody listening. I'll try the approach suggested by @jpkrohling, using the HTTP connection to detect changes and refresh the UDP parts.
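
For anyone who wants to check this, a small POSIX sketch: it sends one datagram over an unconnected UDP socket to a loopback port that has no listener and returns what `sendto` reports. The function name is hypothetical, and the "unused" port is obtained by binding a throwaway socket to port 0 and closing it, which assumes nothing else grabs that port in between.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstring>

// Sends one datagram to a loopback UDP port that nobody is listening on.
// Returns the result of sendto(): the number of bytes sent, or -1 on error.
ssize_t sendToUnusedLoopbackPort(const char* payload)
{
    sockaddr_in target{};
    target.sin_family = AF_INET;
    target.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    target.sin_port = 0;  // let the kernel pick a free port

    // Bind a throwaway socket to learn a free port, then close it so that
    // the port has no listener.
    const int probe = socket(AF_INET, SOCK_DGRAM, 0);
    if (probe < 0) {
        return -1;
    }
    socklen_t len = sizeof(target);
    if (bind(probe, reinterpret_cast<sockaddr*>(&target), sizeof(target)) != 0 ||
        getsockname(probe, reinterpret_cast<sockaddr*>(&target), &len) != 0) {
        close(probe);
        return -1;
    }
    close(probe);

    // Unconnected UDP socket: sendto() only hands the datagram to the kernel.
    const int sender = socket(AF_INET, SOCK_DGRAM, 0);
    if (sender < 0) {
        return -1;
    }
    const ssize_t sent =
        sendto(sender, payload, std::strlen(payload), 0,
               reinterpret_cast<const sockaddr*>(&target), sizeof(target));
    close(sender);
    return sent;
}
```

The call reports the full payload length even though the datagram is dropped, so a failed `emitBatch` cannot be relied on as the trigger for re-resolving.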

@yurishkuro
Member

We have an open PR in Go that adds a different UDP connection that tries to redial when the host name gets resolved to a different address.

@mdouaihy
Contributor

mdouaihy commented Jul 3, 2020

Hi @yurishkuro, is jaegertracing/jaeger-client-go#409 the PR in question in the Go client?

@yurishkuro
Member

yes

@yurishkuro
Member

sorry, no, this one: jaegertracing/jaeger-client-go#520


6 participants