As a Zeebe user, I want to see Open Telemetry support for Zeebe #9742
Comments
Hey @aivinog1! I think the client side is mostly done, since we can just leverage the existing gRPC interceptors for it. Let me know if you think these are insufficient. For now we could focus on the gateway side. We do plan to switch the internal cluster communication from the home-grown, Netty-based transport to gRPC. I can't really say when, unfortunately, but it would then greatly simplify OpenTelemetry integration. At any rate, what is your plan here? We currently output Prometheus metrics, which is admittedly vendor specific. We did think about switching generally to Micrometer, and I think this might be a more worthwhile endeavor long term. It means increased integration capabilities with various metrics backends, and it's still compatible with OpenTelemetry (you need to enable the OTLP registry and you're good to go). Regarding tracing, this is a big missing piece, but again it shouldn't be too hard if we just stick to the gRPC defaults for now. Then it's about defining how fine-grained we would like the traces to be. But I would propose just sticking to the gateway/client communication for now, as it allows us to focus on the new feature and not worry too much about having to implement, say, tracing support in our custom transport. Hope that makes sense |
Hey @npepinpe! |
Feel free to investigate it, but it's definitely the more complex portion. In particular, how do you deal with aggregated traces/spans? We batch multiple log stream entries (i.e. commands/events) together in a single Raft entry. So if we want to trace the Raft part, an entry may contain multiple commands which would be part of possibly different traces (note: this isn't true right now, they're always implicitly part of the same process instance for example, but I wouldn't rely on this as it's very much an implementation detail and not by design). One option is to push down the trace ID/span ID/context to the record level in the record metadata. That's probably OK-ish for the IDs, but once users start putting in context it might become quite heavy. We could alternatively only serialize it when we know the trace will be sampled (this used to be part of OpenTracing, hopefully it was kept in OpenTelemetry). Anyway, don't hesitate to investigate, I look forward to what you find. Just having an idea of what exactly we want to trace (i.e. when to start/close a trace, when to start/close a span) would be a big step. |
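To make the record-level option a bit more concrete, here is a minimal sketch of carrying only the trace ID, span ID, and sampled flag, and skipping serialization when the trace is not sampled. It uses the W3C Trace Context `traceparent` layout (`version-traceid-parentid-flags`, where bit `0x01` of the flags means "sampled"). The class and method names (`RecordTraceContextSketch`, `TraceContext`, `metadataValueFor`) are hypothetical and are not part of Zeebe or the OpenTelemetry API; how the value would actually be stored in the record metadata is left open.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these types exist in Zeebe or OpenTelemetry. */
final class RecordTraceContextSketch {

  /** Trace ID (32 hex chars), span ID (16 hex chars), and the sampled flag. */
  static final class TraceContext {
    final String traceId;
    final String spanId;
    final boolean sampled;

    TraceContext(String traceId, String spanId, boolean sampled) {
      // Basic shape check only; the W3C spec additionally forbids all-zero IDs.
      if (!traceId.matches("[0-9a-f]{32}") || !spanId.matches("[0-9a-f]{16}")) {
        throw new IllegalArgumentException("malformed trace or span ID");
      }
      this.traceId = traceId;
      this.spanId = spanId;
      this.sampled = sampled;
    }

    /** Encodes the context in W3C traceparent form: version-traceid-parentid-flags. */
    String toTraceparent() {
      return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    /** Parses a version-00 traceparent value back into a context. */
    static TraceContext fromTraceparent(String value) {
      String[] parts = value.split("-");
      if (parts.length != 4 || !"00".equals(parts[0]) || !parts[3].matches("[0-9a-f]{2}")) {
        throw new IllegalArgumentException("unsupported traceparent: " + value);
      }
      boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0;
      return new TraceContext(parts[1], parts[2], sampled);
    }
  }

  /** Returns the value to store with the record, or empty when the trace is not sampled. */
  static Optional<String> metadataValueFor(TraceContext context) {
    return context.sampled ? Optional.of(context.toTraceparent()) : Optional.empty();
  }

  public static void main(String[] args) {
    String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
    String spanId = "00f067aa0ba902b7";
    // prints Optional[00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01]
    System.out.println(metadataValueFor(new TraceContext(traceId, spanId, true)));
    // prints Optional.empty
    System.out.println(metadataValueFor(new TraceContext(traceId, spanId, false)));
  }
}
```

With this shape, a sampled record costs a fixed 55 characters, and an unsampled record costs nothing. User-supplied context (baggage) is deliberately left out, since that is the part that could become heavy.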
Hey @npepinpe. |
Hey @npepinpe!
mvn clean install -DskipTests -DskipChecks
docker build --no-cache --load --build-arg DISTBALL=dist/target/camunda-zeebe-*.tar.gz --build-arg APP_ENV=dev -t camunda/zeebe:current-test .
1.3.2. Start the environment:
2022/08/30 20:32:10 Activated job 2251799813685421 with variables {}
2022/08/30 20:32:10 Handler completed job 2251799813685421 with variables {}
1.3.7. Open the Jaeger: http://localhost:16686/
So, it would be cool to see what you think about it :) |
Hi @felix-mueller and @aivinog1. Any news on this feature? We would love to be able to send traces from our Brokers and Gateways to our Grafana Tempo instances. Is this something that we could expect to see available in the near future? (BTW, I took over this task from @darox who was giving some information here: #10241 (comment)) |
Thanks for your feedback @aivinog1. In this case we will wait until OTEL is officially added to the product. |
Is your feature request related to a problem? Please describe.
This issue originates from my comment. The main idea is to add OpenTelemetry support to Zeebe in order to figure out where time is spent. This task is about investigating and providing an MVP, not a comprehensive solution.
Describe the solution you'd like
I think the best approach is to stick with the OpenTelemetry SDK Autoconfigure module and disable it by default via environment variables or properties.
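As a rough illustration of "disabled by default", the following stdlib-only sketch resolves an on/off switch from a system property or an environment variable. The key `otel.sdk.disabled` (environment form `OTEL_SDK_DISABLED`) is, to my knowledge, the switch the autoconfigure module documents at the time of writing. The class `TelemetryToggleSketch`, the rule that a system property wins over an environment variable, and the choice to treat "nothing set" as disabled are assumptions of this sketch, not the module's actual code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch; not the OpenTelemetry autoconfigure implementation. */
final class TelemetryToggleSketch {

  // Assumed key, matching the documented otel.sdk.disabled / OTEL_SDK_DISABLED switch.
  static final String PROPERTY_KEY = "otel.sdk.disabled";

  /** Maps a property key to its environment variable form, e.g. otel.sdk.disabled -> OTEL_SDK_DISABLED. */
  static String toEnvName(String propertyKey) {
    return propertyKey.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
  }

  /** Returns true only when telemetry has been explicitly switched on. */
  static boolean isTelemetryEnabled(Properties systemProperties, Map<String, String> env) {
    // Assumption: a system property takes priority over the environment variable.
    String value = systemProperties.getProperty(PROPERTY_KEY);
    if (value == null) {
      value = env.get(toEnvName(PROPERTY_KEY));
    }
    // Assumption for this sketch: with nothing configured, telemetry stays off.
    if (value == null) {
      return false;
    }
    // "false" (case-insensitive) enables telemetry; any other value keeps it disabled.
    return !Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    // The result depends on the JVM flags and environment this is run with.
    System.out.println(isTelemetryEnabled(System.getProperties(), System.getenv()));
  }
}
```

Running the JVM with `-Dotel.sdk.disabled=false`, or exporting `OTEL_SDK_DISABLED=false`, would switch telemetry on in this sketch; leaving both unset keeps it off.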
Tasks:
Describe alternatives you've considered
We could stick to manual configuration, but keep in mind that this would itself require some sort of autoconfiguration.
Additional context
Also, I think it is best to stick with vendor-agnostic implementations and see what the OpenTelemetry standard itself provides.