When restarting the Jaeger agent in Docker, cpp libraries don't connect anymore #204
Comments
Hi @belfo, I am afraid I don't understand what's happening. When the agent is restarted, doesn't it keep the same IP/port? |
Hello @mdouaihy, not inside Docker: once the container restarts, the IP/hostname is different. Regards |
OK, I see. I believe there is a problem with the use case then. According to Jaeger, the agent is supposed to be local, especially since the clients send UDP packets. Maybe in your case, if you can't afford to have a local agent, you could send directly to the collector. @yurishkuro, do you have any insights on this? |
@mdouaihy correct. However, I think there was a somewhat similar issue raised in Java client where the domain name -> IP resolution in the UDPSender was done in the constructor, so when the agent restarts, its DNS name remains the same but the IP changes, and UDPSender is unable to report spans. Not sure if this is a similar issue here. |
How would you re-resolve the DNS name in the case of UDP, which is connectionless? You don't have a "disconnection" event to tell you when to resolve again. |
@yurishkuro, it's the same behavior.
|
@jpkrohling do you have an idea on the best practice? |
As long as it's a container (in Swarm), you can't ensure the IP won't change. If the address were refreshed every X seconds, the client would at least recover by itself (even with a refresh interval of 60s). |
I think a refresh is a reasonable approach, with configurable interval. |
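A minimal sketch of the refresh idea, assuming the client keeps the unresolved host name and re-resolves it once a configurable interval has elapsed. `RefreshingAddress` and its injected `Resolver` callback are hypothetical names for illustration, not part of the jaeger-client-cpp API; a real resolver would call something like `getaddrinfo`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper, not part of jaeger-client-cpp.
// Caches a resolved address and re-resolves it after `interval` has elapsed.
class RefreshingAddress {
  public:
    using Clock = std::chrono::steady_clock;
    // Maps a host name to a printable address (assumed to be supplied by the caller).
    using Resolver = std::function<std::string(const std::string&)>;

    RefreshingAddress(std::string host, Resolver resolve, Clock::duration interval)
        : _host(std::move(host))
        , _resolve(std::move(resolve))
        , _interval(interval)
    {
        assert(static_cast<bool>(_resolve));  // a callable resolver is required
    }

    // Returns the cached address, re-resolving when nothing is cached yet or
    // the cache is at least `interval` old. `changed` is set to true when the
    // newly resolved address differs from the previous one, so the caller
    // knows it has to reconnect its UDP socket.
    const std::string& get(Clock::time_point now, bool& changed)
    {
        changed = false;
        if (!_resolved || now - _lastResolved >= _interval) {
            std::string fresh = _resolve(_host);
            changed = (fresh != _address);
            _address = std::move(fresh);
            _lastResolved = now;
            _resolved = true;
        }
        return _address;
    }

  private:
    std::string _host;
    Resolver _resolve;
    Clock::duration _interval;
    std::string _address;
    Clock::time_point _lastResolved{};
    bool _resolved = false;
};
```

The reporter would call `get(Clock::now(), changed)` before each flush; with a 60s interval the cost is at most one lookup per minute, and the socket is only touched when `changed` is true.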
What is typically done in Kubernetes is to either have the agent as a sidecar, so it's indeed For this reported case, a refresh indeed sounds like the only viable solution. But given that the client should be making other HTTP connections to the agent (like, to get sampling strategies), can't an IP change be detected there and serve as a hint to refresh the UDP parts? |
I have been reading through the code, and it occurs to me that IPAddress could store the unresolved name and, if UDPTransporter fails in emitBatch, it could trigger resolveAddress again and reconnect the socket with the new IP (if available). This way, you avoid actively polling for changes in the host name/IP. I don't mind contributing a PR if you think it is a good approach. |
Forget my last comment: a UDP socket will not fail if nobody is listening. I'll try the approach suggested by @jpkrohling, using the HTTP connection to detect changes and refresh the UDP parts. |
We have an open PR in Go that adds a different UDP connection that tries to redial when the host name gets resolved to a different address. |
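The redial idea can be sketched in C++ with POSIX sockets. This is not a port of the Go PR and not the existing UDPTransporter: `RedialingUdpSocket` is a hypothetical class that re-resolves the host name on demand and reconnects only when the first resolved address differs from the one in use.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper, not part of jaeger-client-cpp.
class RedialingUdpSocket {
  public:
    RedialingUdpSocket(std::string host, std::string port)
        : _host(std::move(host)), _port(std::move(port))
    {
        assert(!_host.empty() && !_port.empty());
    }
    ~RedialingUdpSocket() { closeSocket(); }
    RedialingUdpSocket(const RedialingUdpSocket&) = delete;
    RedialingUdpSocket& operator=(const RedialingUdpSocket&) = delete;

    // Re-resolves the host name. Reconnects only if there is no socket yet or
    // the first result differs from the current address. Returns true if a
    // (re)connect happened; on any failure the old socket is kept.
    bool refresh()
    {
        addrinfo hints;
        std::memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_DGRAM;
        addrinfo* results = nullptr;
        if (::getaddrinfo(_host.c_str(), _port.c_str(), &hints, &results) != 0 ||
            results == nullptr) {
            return false;
        }
        char numeric[128] = {0};
        bool reconnected = false;
        if (::getnameinfo(results->ai_addr, results->ai_addrlen, numeric,
                          sizeof(numeric), nullptr, 0, NI_NUMERICHOST) == 0 &&
            (_fd < 0 || _current != numeric)) {
            const int fd = ::socket(results->ai_family, results->ai_socktype,
                                    results->ai_protocol);
            if (fd >= 0 &&
                ::connect(fd, results->ai_addr, results->ai_addrlen) == 0) {
                closeSocket();  // swap sockets only after the new one is ready
                _fd = fd;
                _current = numeric;
                reconnected = true;
            } else if (fd >= 0) {
                ::close(fd);
            }
        }
        ::freeaddrinfo(results);
        return reconnected;
    }

    bool connected() const { return _fd >= 0; }
    const std::string& address() const { return _current; }

    // Sends one datagram; returns -1 if refresh() has never succeeded.
    ssize_t send(const std::string& payload)
    {
        return _fd < 0 ? -1 : ::send(_fd, payload.data(), payload.size(), 0);
    }

  private:
    void closeSocket()
    {
        if (_fd >= 0) {
            ::close(_fd);
            _fd = -1;
        }
    }

    std::string _host;
    std::string _port;
    std::string _current;  // numeric form of the address in use
    int _fd = -1;
};
```

A caller would invoke `refresh()` on a timer or before each flush. One caveat of comparing only the first result: a name backed by round-robin DNS may trigger reconnects even though the agent has not moved.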
Hi @yurishkuro, is jaegertracing/jaeger-client-go#409 the concerned PR in the Go client? |
yes |
sorry, no, this one: jaegertracing/jaeger-client-go#520 |
Requirement - what kind of business use case are you trying to solve?
Being able to restart Jaeger when required.
Problem - what in Jaeger blocks you from solving the requirement?
When the Jaeger agent restarts (new version...), the connection between the cpp library (used inside envoy/nginx) and the agent doesn't work anymore, meaning we can no longer send traces to Jaeger.
This forces us to restart all nginx/envoy containers in the cluster.
Proposal - what do you suggest to solve the problem or improve the existing situation?
I guess the library resolves the address at start-up. This should be refreshed from time to time to ensure it's still valid, maybe at a configurable interval?
We configure our app (envoy, for example) as follows:
reporter:
  localAgentHostPort: jaeger-agent:6831