When restarting the Jaeger agent in Docker, cpp libraries don't connect anymore #204

belfo · 2020-02-05T11:44:56Z

Requirement - what kind of business use case are you trying to solve?

Being able to restart Jaeger when required.

Problem - what in Jaeger blocks you from solving the requirement?

When the Jaeger agent restarts (for a new version, for example), the connection between the cpp library (used inside envoy/nginx) and the agent no longer works, so we can't send traces to Jaeger anymore.
This forces us to restart all nginx/envoy containers in the cluster.

Proposal - what do you suggest to solve the problem or improve the existing situation?

I guess the library resolves the address at start-up. It should be refreshed from time to time to ensure it's still valid, maybe at a configurable interval?

We configure our app (envoy, for example) with:
reporter:
  localAgentHostPort: jaeger-agent:6831

mdouaihy · 2020-03-02T22:48:13Z

Hi @belfo,

I am afraid I don't understand what's happening. When the agent is restarted, doesn't it keep the same IP/port?

belfo · 2020-03-03T09:59:45Z

Hello @mdouaihy ,

Not inside Docker: once the container restarts, the IP/hostname is different.

Regards

mdouaihy · 2020-03-03T11:12:18Z

ok I see.

I believe that there is a problem with the use case then.

According to Jaeger, the agent is supposed to be local, especially since the clients are sending UDP packets.

Maybe in your case, if you can't afford having a local agent, you could send directly to the collector.

@yurishkuro, do you have any insights on this?

yurishkuro · 2020-03-03T17:00:06Z

@mdouaihy correct. However, I think there was a somewhat similar issue raised in the Java client, where the domain name -> IP resolution in the UDPSender was done in the constructor, so when the agent restarts, its DNS name remains the same but the IP changes, and UDPSender is unable to report spans. Not sure if this is a similar issue here.

ecourreges-orange · 2020-03-03T17:41:24Z

How would you go about re-resolving the DNS name with UDP, which is connectionless? You don't have a "disconnection" event to tell you when to resolve again.
You would have to resolve on every send (which, without a DNS cache, would probably amount to a denial of service, and could be bad for most normal usages),
or somehow get the DNS TTL to know how often to resolve,
or refresh at a fixed time interval.

mdouaihy · 2020-03-03T17:53:46Z

@yurishkuro, it's the same behavior.
I am not sure that this should be changed, because:

it's slow to resolve the IP on each send.
I believe that the agent should be launched locally or in a sidecar with a name/IP that won't change.

yurishkuro · 2020-03-03T18:35:43Z

@jpkrohling do you have an idea on the best practice?

belfo · 2020-03-05T11:00:43Z

As long as it's a container (in swarm), you can't ensure the IP won't change.
You address a service, but if the service is restarted (for whatever reason), the IP of the service could be different.

If it's refreshed every X seconds, it would at least recover by itself (even with a refresh interval of 60s).
Or maybe keep a TCP connection just for the sake of refreshing the resolution?

yurishkuro · 2020-03-05T23:21:01Z

I think a refresh is a reasonable approach, with configurable interval.
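A minimal sketch of what such a refresh could look like, assuming POSIX `getaddrinfo`. `RefreshingResolver` and `resolveIPv4` are hypothetical names for illustration, not part of jaeger-client-cpp, and the interval is whatever the configuration would supply. The lookup function is injected so the timing logic can be exercised without real DNS.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper (not a jaeger-client-cpp API): caches a resolved
// address and re-resolves it once `interval` has elapsed.
class RefreshingResolver {
  public:
    using Clock = std::chrono::steady_clock;
    // Returns the numeric address for a host name, or "" on failure.
    using ResolveFn = std::function<std::string(const std::string&)>;

    RefreshingResolver(std::string host,
                       std::chrono::seconds interval,
                       ResolveFn resolve)
        : _host(std::move(host))
        , _interval(interval)
        , _resolve(std::move(resolve))
    {
    }

    // Returns true when the cached address changed, meaning the caller
    // should reconnect its UDP socket to address().
    bool refreshIfDue(Clock::time_point now = Clock::now())
    {
        if (_resolvedOnce && now - _lastResolved < _interval) {
            return false;  // Still fresh: skip the lookup.
        }
        _resolvedOnce = true;
        _lastResolved = now;
        const std::string resolved = _resolve(_host);
        if (resolved.empty() || resolved == _address) {
            return false;  // Lookup failed or unchanged: keep the old address.
        }
        _address = resolved;
        return true;
    }

    const std::string& address() const { return _address; }

  private:
    std::string _host;
    std::chrono::seconds _interval;
    ResolveFn _resolve;
    std::string _address;
    Clock::time_point _lastResolved{};
    bool _resolvedOnce = false;
};

// Resolves `host` to a numeric IPv4 string with getaddrinfo(3).
// Returns "" when the lookup fails.
std::string resolveIPv4(const std::string& host)
{
    addrinfo hints{};
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    addrinfo* result = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &result) != 0 ||
        result == nullptr) {
        return "";
    }
    char buffer[INET_ADDRSTRLEN] = {0};
    const auto* addr = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
    const char* text =
        inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));
    const std::string address = (text != nullptr) ? text : "";
    freeaddrinfo(result);
    return address;
}
```

A reporter could call `refreshIfDue()` before each flush and reconnect only when it returns true, so the cost of a DNS lookup is paid at most once per interval rather than on every send.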

jpkrohling · 2020-03-06T09:06:30Z

What is typically done in Kubernetes is to either have the agent as a sidecar, so it's indeed localhost (same pod), or to have the agent at the node and use the hostIP, which also won't change.

For this reported case, a refresh indeed sounds like the only viable solution. But given that the client should be making other HTTP connections to the agent (like, to get sampling strategies), can't an IP change be detected there and serve as a hint to refresh the UDP parts?

manuelnp · 2020-06-25T11:30:53Z

I have been reading through the code and it occurs to me that IPAddress could store the unresolved name, and if UDPTransporter fails in emitBatch, it could trigger resolveAddress again and reconnect the socket with the new IP (if available).

This way, you avoid the active polling for changes in host name/ip. I don't mind contributing with a PR if you think it is a good approach.

manuelnp · 2020-07-02T13:23:27Z

Forget my last comment: a UDP socket will not fail if nobody is listening. I'll try the approach suggested by @jpkrohling, using the HTTP connection to detect changes and refresh the UDP parts.
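To illustrate the point about UDP, here is a small sketch using plain POSIX sockets (`sendDatagramToLoopback` is a hypothetical helper, not library code). It sends one datagram from an unconnected socket to a loopback port; `sendto` reports the payload length whether or not anything is listening there, so the return value says nothing about the agent being reachable.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Hypothetical helper: sends one datagram to 127.0.0.1:`port` from an
// unconnected UDP socket and returns what sendto(2) reported, i.e. the
// number of bytes handed to the kernel, or -1 on a local error.
ssize_t sendDatagramToLoopback(std::uint16_t port, const char* payload)
{
    const int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);
    // Success here only means the datagram was queued locally; delivery
    // is never acknowledged.
    const ssize_t sent = sendto(fd,
                                payload,
                                std::strlen(payload),
                                0,
                                reinterpret_cast<const sockaddr*>(&dest),
                                sizeof(dest));
    close(fd);
    return sent;
}
```

A connected UDP socket can sometimes surface an ICMP "port unreachable" as an error on a later call, but that depends on the platform and on the ICMP message actually arriving, so it is not a dependable signal that the agent has moved.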

yurishkuro · 2020-07-02T15:57:26Z

We have an open PR in Go that adds a different UDP connection that tries to redial when the host name gets resolved to a different address.

mdouaihy · 2020-07-03T15:27:05Z

Hi @yurishkuro, is jaegertracing/jaeger-client-go#409 the PR in question in the Go client?

yurishkuro · 2020-07-03T17:54:18Z

yes

yurishkuro · 2020-07-03T17:55:04Z

sorry, no, this one: jaegertracing/jaeger-client-go#520

yurishkuro closed this as completed Jan 10, 2022
