Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling Trouble #800

Closed
kevincox opened this issue May 18, 2022 · 8 comments
Closed

Sampling Trouble #800

kevincox opened this issue May 18, 2022 · 8 comments
Assignees
Labels
A-trace Area: issues related to tracing documentation/examples Improvements or additions to documentation or examples

Comments

@kevincox
Copy link

I was trying to use the opentelemetry::sdk::trace::ShouldSample API and am having trouble. I figured that I would at least write it down here in case others have the same issue or this could be used to improve the library. This is likely just a documentation problem, but I really struggled.

My goal was to export traces to Honeycomb with sampling. I managed to do something simple like this:

#[derive(Debug)]
struct Sampler;

impl opentelemetry::sdk::trace::ShouldSample for Sampler {
	fn should_sample(&self,
		parent_context: Option<&opentelemetry::Context>,
		trace_id: opentelemetry::trace::TraceId,
		_name: &str,
		_span_kind: &opentelemetry::trace::SpanKind,
		attributes: &[opentelemetry::KeyValue],
		_links: &[opentelemetry::trace::Link],
		_instrumentation_library: &opentelemetry::sdk::InstrumentationLibrary,
	) -> opentelemetry::sdk::trace::SamplingResult {
		let sample_rate: u128 = attributes.iter()
			.find(|kv| kv.key == opentelemetry::Key::from_static_str("sampleRate"))
			.map(|kv| match kv.value {
				opentelemetry::Value::I64(rate) => rate.try_into().unwrap(),
				ref value => panic!("Unexpected value {:?} for sample rate", value),
			})
			.unwrap_or(1)
			.max(1);

		opentelemetry::sdk::trace::SamplingResult {
			decision: if u128::from_le_bytes(trace_id.to_bytes()) % sample_rate == 0 {
				opentelemetry::sdk::trace::SamplingDecision::RecordAndSample
			} else {
				opentelemetry::sdk::trace::SamplingDecision::Drop
			},
			attributes: Vec::new(),
			trace_state: match parent_context {
				Some(ctx) => opentelemetry::trace::TraceContextExt::span(ctx).span_context().trace_state().clone(),
				None => opentelemetry::trace::TraceState::default(),
			},
		}
	}
}

I'm sure the code could be slightly better, for example there is definitely some bias on the sampling depending on the sampleRate used. However this had a lot of problems:

  1. The docs are super unclear. I mostly copied opentelemetry::sdk::trace::Sampler but it isn't even clear to me what most of the code does.
  2. This only sets the sampleRate attribute on the root span. Child spans and events don't have the value set which means that they are weighted incorrectly in graphs.
  3. This only appears to run on the root span. This means that I can't do different sampling at different levels. (For example children that are sampled less frequently than the root or children that may be traced even if the root isn't)
    • This is quite weird because opentelemetry::sdk::trace::Sampler seems to think that it can base sampling decisions off of the parent span, but my Sampler is never called with anything but the root.

It would be nice if something like this was easy and documented.

@TommyCpp TommyCpp added documentation/examples Improvements or additions to documentation or examples A-trace Area: issues related to tracing bug Something isn't working and removed documentation/examples Improvements or additions to documentation or examples labels May 18, 2022
@TommyCpp
Copy link
Contributor

TommyCpp commented May 18, 2022

Thanks for the feedback. I think the sampling method in the start span only called for span without a parent. For span with a parent, it will respect the sampling result of the parent spans.

Alternatively, you can set the sampling result for individual spans using SpanBuilder::sampling_result.

I can see a benefit to changing the behavior to run sampler for every span but it comes with a performance overhead. Gonna need to investigate it a little further

@TommyCpp TommyCpp removed the bug Something isn't working label May 18, 2022
@hdost
Copy link
Contributor

hdost commented Jun 5, 2022

I believer this may be both a documentation issue AND a bug.
if you look at span creation in the Java SDK
There is only reliance on the sampling decision. Whereas in the Rust SDK we're doing what almost appears as ParentBasedSampling before sampling is even considered.

Additionally when looking at the Sampler We should probably actually change the ParentBased(Box<Sampler>) to ParentBased(Box<dyn ShouldSample>).

I realize that both of these will have performance implications. Before I got on and put the work I figure I would comment first.

hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jun 5, 2022
* Move Sampling header into `Sampler`
* Enrich classes to have links in tracing module

Relates open-telemetry#800
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jun 5, 2022
* Move Sampling header into `Sampler`
* Enrich classes to have links in tracing module

Relates open-telemetry#800

Signed-off-by: Harold Dost <github@hdost.com>
@TommyCpp
Copy link
Contributor

TommyCpp commented Jul 4, 2022

@hdost yeah I think we should sampling at every span creation. Do you want to take this issue as you are already working on it?

hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 7, 2022
* Move Sampling header into `Sampler`
* Enrich classes to have links in tracing module

Relates open-telemetry#800

Signed-off-by: Harold Dost <github@hdost.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 7, 2022
* Move Sampling header into `Sampler`
* Enrich classes to have links in tracing module

Relates open-telemetry#800

Signed-off-by: Harold Dost <github@hdost.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 7, 2022
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 8, 2022
Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 8, 2022
Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 8, 2022
Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
TommyCpp pushed a commit that referenced this issue Jul 9, 2022
* Move Sampling header into `Sampler`
* Enrich classes to have links in tracing module

Relates #800

Signed-off-by: Harold Dost <github@hdost.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 14, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.

Relates open-telemetry#800
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 14, 2022
Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Jul 14, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.

Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
@hdost
Copy link
Contributor

hdost commented Jul 17, 2022

Also currently working on this in #839 , I believe it should be patched but I'm looking into how best to handle the tracing-opentelemetry crate.

hdost added a commit to hdost/opentelemetry-rust that referenced this issue Aug 17, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.
* Maintaining pre-sample to allow for `tracing-opentelemetry` to
  continue working

Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Aug 17, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.
* Maintaining pre-sample to allow for `tracing-opentelemetry` to
  continue working

Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Sep 6, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.
* Maintaining pre-sample to allow for `tracing-opentelemetry` to
  continue working

Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
hdost added a commit to hdost/opentelemetry-rust that referenced this issue Sep 8, 2022
* Sampling should always defer to the existing sampler. The logic before
  this patch reflected something similar to what is described in the
  ParentBased sampler.
* Maintaining pre-sample to allow for `tracing-opentelemetry` to
  continue working

Relates open-telemetry#800

Signed-off-by: Harold Dost <h.dost@criteo.com>
@hdost
Copy link
Contributor

hdost commented Sep 22, 2022

@kevincox we've added some further docs and updated the implementation as of 0.18.0
Going to consider this closed. If there's something missing you can comment.

@kevincox
Copy link
Author

kevincox commented Sep 23, 2022

Do you have a link to these docs? I think I will also need to open another ticket for propagating sample rate information but I can do that in a more targeted ticket?

To summarize the status of the original points:

  1. I couldn't find more docs, but look forward to a link.
  2. Still an issue, don't know how to propagate down the tree. I will open a follow-up ticket/feature request.
  3. This seems to be fixed.

@hdost
Copy link
Contributor

hdost commented Sep 27, 2022

For 1. I added docs #807

For 2. #839 this should have handled it.

@kevincox
Copy link
Author

I don't think 2 is resolved. I have opened a new issue focused on it: #886

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trace Area: issues related to tracing documentation/examples Improvements or additions to documentation or examples
Projects
None yet
Development

No branches or pull requests

3 participants