This repository has been archived by the owner on Jul 1, 2022. It is now read-only.
Large percentage of spans captured by jaeger_tracer_reporter_spans_total metric are resulting in error #821
Additionally, we tried certain tests on the instance to see if we could reproduce this "bad state" where we see persisting client reporting failures. We tried the following tests on live instances:
The first two tests resulted in 100% of spans reporting
Another thread about it on Jaeger Slack: https://cloud-native.slack.com/archives/CGG7NFUJ3/p1615574357080000
This is the error that we are seeing from the RemoteReporter logs:
Describe the bug
A percentage of our instances running with Jaeger Tracing enabled are reporting a high rate of the jaeger_tracer_reporter_spans_total{result="err"} metric. At times the error percentage exceeds 10% or even 30%, but it never reaches 100% (which is what we would expect from, say, an agent outage or networking failure). The percentage is fairly consistent, but we sometimes see it drop to ~0% after fully restarting our application. The image below shows that behavior in action:
Expected behavior
We expect to have a much lower percentage of spans failing to report on our instances, ideally near 0%.
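For reference, the failure percentage quoted above can be derived from two samples of the counter. This is only a minimal sketch, assuming the per-result counter values have already been scraped into dictionaries; the function name, the dictionary layout, and the "ok" label value are illustrative assumptions, and only result="err" comes from this report.

```python
def error_percentage(before, after):
    """Percentage of spans that failed to report between two counter samples.

    `before` and `after` map a `result` label value (e.g. "err") to the
    cumulative counter value at that moment. The layout is an assumption
    made for this sketch, not something the client library provides.
    """
    deltas = {}
    for result, current in after.items():
        previous = before.get(result, 0)
        # A counter that went backwards was reset (e.g. application restart),
        # so everything counted since the reset is the delta.
        deltas[result] = current - previous if current >= previous else current

    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no spans reported in the window
    return 100.0 * deltas.get("err", 0) / total


# Hypothetical samples taken a few minutes apart on one instance.
sample_1 = {"ok": 100, "err": 10}
sample_2 = {"ok": 190, "err": 20}
print(error_percentage(sample_1, sample_2))
```

Comparing deltas rather than raw totals matters here because the counters are cumulative: after a restart the totals start over, which would otherwise hide or exaggerate the drop to ~0% described above.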
Version (please complete the following information):
What troubleshooting steps did you try?
host.docker.internal host. Because of this, I don't believe there is any DNS resolution happening during client/agent communications.
Previous Gitter Inquiries
More context can be found regarding our environment and the tests that we tried in two small discussions we had on Gitter:
https://gitter.im/jaegertracing/Lobby?at=5eced9bc89941d051a28aa0d
https://gitter.im/jaegertracing/Lobby?at=603fe580d1aee44e2dc0ed11