Skip to content
This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

Make UdpSender lazy to be able to recover from early DNS issues #726

Merged
merged 1 commit into from Jul 31, 2020
Merged

Make UdpSender lazy to be able to recover from early DNS issues #726

merged 1 commit into from Jul 31, 2020

Conversation

pschichtel
Copy link
Contributor

this should close #369

Which problem is this PR solving?

Short description of the changes

  • made the UdpSender lazily create the ThriftUdpTransport instance to prevent early DNS errors from causing initialization errors. This does require locking, which has been implemented using the double-checked locking pattern to reduce the lock overhead for (usually) the majority of send calls.

@pschichtel
Copy link
Contributor Author

Not entirely sure what would be the right action for that failing test. It seems to have relied on the UdpSender failing during initialization. If we want to keep it that way, we'd have to eagerly trigger the instantiation of the thrift tranport. I have various ways in my hand on how to do that, but I'd prefer not to unless necessary.

@pschichtel
Copy link
Contributor Author

@yurishkuro @pavolloffay it seems like the project is a little inactive right now, or is this simply the wrong place for such a change request?

@objectiser
Copy link
Contributor

@pschichtel Sorry for the delay in looking at this.

Not entirely sure what would be the right action for that failing test. It seems to have relied on the UdpSender failing during initialization. If we want to keep it that way, we'd have to eagerly trigger the instantiation of the thrift tranport. I have various ways in my hand on how to do that, but I'd prefer not to unless necessary.

I think it would be fine to change the test to check return value is UdpSender.

@yurishkuro
Copy link
Member

I wonder if this is going to fix the more fundamental issue where the agent address may simply change, e.g. due to agent restart.

There was a recent change in the Go client that implemented periodic reconnects: jaegertracing/jaeger-client-go#520

@codecov
Copy link

codecov bot commented Jul 28, 2020

Codecov Report

Merging #726 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #726   +/-   ##
=========================================
  Coverage     88.76%   88.76%           
  Complexity      596      596           
=========================================
  Files            73       73           
  Lines          2242     2242           
  Branches        289      289           
=========================================
  Hits           1990     1990           
  Misses          159      159           
  Partials         93       93           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b0ed16...d42e192. Read the comment docs.

@pschichtel
Copy link
Contributor Author

Updated the commit to fix the test.

@yurishkuro In its current state, this PR will not handle an address change. Once the client has successfully been created, that instance will stay the same. However adding on some kind of periodic rechecking and/or error handling in UdpSender#send should not be hard as a follow-up.

@yurishkuro
Copy link
Member

@jpkrohling do you still think we should introduce a configuration option?

@pschichtel
Copy link
Contributor Author

The change in that unit test makes me wonder... Would this change break some kind of fallback logic somewhere? It seems that was being tested there.

Copy link
Collaborator

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when the agent isn't available and the client keeps trying to send data? Will there be multiple stacktraces in the logs, one per batch? If so, then we definitely need to make this opt-in.

@pschichtel
Copy link
Contributor Author

pschichtel commented Jul 29, 2020

What happens when the agent isn't available and the client keeps trying to send data? Will there be multiple stacktraces in the logs, one per batch? If so, then we definitely need to make this opt-in.

What about only printing the first stacktrace or every n-th or once every n minutes ?

@jpkrohling
Copy link
Collaborator

First stacktrace + positive confirmation once it succeeds sounds good to me.

this should close #369

Signed-off-by: Phillip Schichtel <phillip@schich.tel>
@pschichtel
Copy link
Contributor Author

I incorporated the suggestions, however I'm not entirely clear on where to handle the logging. I don't think the UdpSender should just swallow the exceptions, I'd say this should be handled at a higher level. I looked at the HttpSender as well and both of them fail in the same way now with this PR: Exception on every call. So any solution should be at a common code path. Either way I'm not sure this is something that should be done in this PR.

@yurishkuro
Copy link
Member

I expect the Reporter to catch the exceptions, since it processes items off a queue from a background thread, so there's nowhere to re-throw them.

@jpkrohling
Copy link
Collaborator

I looked at the HttpSender as well and both of them fail in the same way now with this PR: Exception on every call. So any solution should be at a common code path. Either way I'm not sure this is something that should be done in this PR.

Agree that this shouldn't be done in this PR. Would you please open an issue with this idea?

@pschichtel
Copy link
Contributor Author

So I think we are done here, right?

@jpkrohling jpkrohling merged commit 27059eb into jaegertracing:master Jul 31, 2020
@pschichtel
Copy link
Contributor Author

Follow up about the logging can be found at #729

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

getTracer() fails when the UDP socket hostname isn't resolvable
4 participants