Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reporting of tracing spans from PromQL engine #2707

Merged
merged 4 commits into from Aug 12, 2022

Conversation

pracucci
Copy link
Collaborator

What this PR does

In the PR #1695 we upgraded the vendored Prometheus, which included a change to the tracing, migrating from opentracing to opentelemetry. This change broke the reporting of tracing spans from PromQL engine, which are particularly useful when investigating slow queries.

I think the clean way to fix it would be migrating Mimir (and so weaveworks/common) to opentelemetry library, but it's risky change to ensure we don't break backward compatibility and I would recommend to do it but carefully.

As a quicker fix, I'm proposing to introduce a bridge to implement the opentelemetry tracer interface but using opentracing (and jaeger client specifically) under the hood. The opentelemetry library provides the bridge for the opposite direction only, so I had to manually write it.

I manually tested it and you can see a difference between before / after.

Before:
Screenshot 2022-08-12 at 10 53 54

After:
Screenshot 2022-08-12 at 10 51 41

Which issue(s) this PR fixes or relates to

Fixes #1954

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci marked this pull request as ready for review August 12, 2022 09:52
@github-actions

This comment has been minimized.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@github-actions

This comment has been minimized.

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about some features that weren't ported. I am not sure how well these bridges should translate between OTEL and OT.

pkg/mimir/tracing.go Outdated Show resolved Hide resolved
pkg/mimir/tracing.go Show resolved Hide resolved
pkg/mimir/tracing.go Outdated Show resolved Hide resolved
pkg/mimir/tracing.go Outdated Show resolved Hide resolved
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@github-actions

This comment has been minimized.

@pracucci
Copy link
Collaborator Author

A few questions about some features that weren't ported. I am not sure how well these bridges should translate between OTEL and OT.

Thanks for the valuable review! The bridge we need should be "good enough", given the limited real usage we're doing right now. Anyway, feedback was good!

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for addressing the feedback

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@github-actions
Copy link
Contributor

Helm <> Jsonnet Diff

⚠️ A difference was detected between the Helm chart and the Jsonnet library.

  1. Use make check-helm-jsonnet-diff to reproduce the output locally.
  2. This test is experimental while we gather feedback about its usefulness.
  3. It does not block your PR from being merged, but we would appreciate you trying to keep feature parity between the Helm chart and Jsonnet library if possible.

If you get stuck on this step and the Mimir maintainers aren't able to help, feel free to merge without making this step pass and tag @Logiraptor so the Mimir maintainers can gather feedback later.

Please see the contribution docs here for more info.

Expand to see the output

Output of https://github.com/grafana/mimir/actions/runs/2847132247

Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
diff -r -u -N scratch/./helm/07-config/ingester-MimirConfig.yml scratch/./jsonnet/08-config/ingester-MimirConfig.yml
--- scratch/./helm/07-config/ingester-MimirConfig.yml	2022-08-12 13:48:18.849649810 +0000
+++ scratch/./jsonnet/08-config/ingester-MimirConfig.yml	2022-08-12 13:48:25.953662070 +0000
@@ -553,7 +553,7 @@
     max_fetched_chunks_per_query: 2000000 (default)
     max_fetched_series_per_query: 0 (default)
     max_global_exemplars_per_user: 0 (default)
-    max_global_series_per_metric: 20000 (default)
+    max_global_series_per_metric: 0
     max_global_series_per_user: 150000 (default)
     max_label_name_length: 1024 (default)
     max_label_names_per_series: 30 (default)
diff -r -u -N scratch/./helm/07-config/overrides-exporter-MimirConfig.yml scratch/./jsonnet/08-config/overrides-exporter-MimirConfig.yml
--- scratch/./helm/07-config/overrides-exporter-MimirConfig.yml	2022-08-12 13:48:18.853649821 +0000
+++ scratch/./jsonnet/08-config/overrides-exporter-MimirConfig.yml	2022-08-12 13:48:25.961662089 +0000
@@ -498,7 +498,7 @@
     max_fetched_chunks_per_query: 2000000 (default)
     max_fetched_series_per_query: 0 (default)
     max_global_exemplars_per_user: 0 (default)
-    max_global_series_per_metric: 20000 (default)
+    max_global_series_per_metric: 0
     max_global_series_per_user: 150000 (default)
     max_label_name_length: 1024 (default)
     max_label_names_per_series: 30 (default)

@pracucci pracucci merged commit 5bc7aab into main Aug 12, 2022
@pracucci pracucci deleted the fix-promql-engine-tracing branch August 12, 2022 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inner evaluation span missing from query traces
2 participants