Implement some means to figure out where Zeebe's time is spent #9282
Comments
During an investigation I once added some metrics for job rate and execution time; for an example, see #8551 (comment). Would this help you? I have a branch that still contains the code: https://github.com/camunda/zeebe/commits/zell-execution-metrics Regarding execution time, I think flamegraphs would be useful to you. There is a short description here: https://github.com/camunda/zeebe/blob/main/benchmarks/docs/debug/README.md#profiling It might be a bit outdated (not sure), but the idea is to use async-profiler to profile the Broker. You can then see nicely where CPU time is spent, and it is possible to split this up by thread. Is this something you would like to look at?
The metrics not so much. I am not looking for details on jobs; I am looking for details on tasks submitted to the actor scheduler. Yes, the flamegraphs would be helpful. I wish we had something more interactive, though. I worked with JProfiler in the past, which also had its flaws but was a little more user-friendly.
Maybe the IntelliJ profiling tools would be a good option for you? It's at least a little bit more interactive than running the async profiler manually.
I was asked to elaborate a little bit more on the solution I would like. This is also to be seen as input for the requirements of the actor scheduler of the future (#9142). I don't want to make it too prescriptive, but mainly want to flesh out my ideas:
This way we would see which actors and tasks consume how much time.
9294: Add actor metrics r=Zelldon a=Zelldon

## Description

As discussed here https://camunda.slack.com/archives/C037RS2JHB8/p1651668160788749 add new actor metrics but no new panels for now.

Details:
- Add counter for actorTask execution
- Add histogram to observe actorTask execution

Currently starting a benchmark to verify whether metrics are exported as expected. I will create a separate PR for the atomix executors.

`@npepinpe` I'm not sure whether it fulfills all requirements for #9282 I will remove my assignment then.

## Related issues

related #9282

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
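To illustrate the two metrics listed in the PR description (an execution counter and an execution-time histogram per actor), here is a minimal sketch that uses only the JDK. It is not the code from #9294: the class `ActorTaskMetrics`, its methods, and the bucket bounds are all hypothetical, and unlike a Prometheus histogram the buckets here are not cumulative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: counts and times task executions per actor name. */
public final class ActorTaskMetrics {
  // Assumed upper bucket bounds in nanoseconds: 0.1 ms, 1 ms, 10 ms, 100 ms.
  // One extra bucket at the end collects everything slower than the last bound.
  private static final long[] BUCKET_BOUNDS_NANOS = {
    100_000L, 1_000_000L, 10_000_000L, 100_000_000L
  };

  private final Map<String, LongAdder> executionCount = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
  private final Map<String, AtomicLongArray> histogram = new ConcurrentHashMap<>();

  /** Wraps a task so that every run is counted and timed under the given actor name. */
  public Runnable instrument(final String actorName, final Runnable task) {
    return () -> {
      final long start = System.nanoTime();
      try {
        task.run();
      } finally {
        // Record the duration even if the task throws.
        record(actorName, System.nanoTime() - start);
      }
    };
  }

  /** Records a single execution with the given duration. */
  public void record(final String actorName, final long durationNanos) {
    executionCount.computeIfAbsent(actorName, k -> new LongAdder()).increment();
    totalNanos.computeIfAbsent(actorName, k -> new LongAdder()).add(durationNanos);
    histogram
        .computeIfAbsent(actorName, k -> new AtomicLongArray(BUCKET_BOUNDS_NANOS.length + 1))
        .incrementAndGet(bucketIndex(durationNanos));
  }

  /** Returns the index of the first bucket whose bound is >= the duration. */
  public static int bucketIndex(final long durationNanos) {
    for (int i = 0; i < BUCKET_BOUNDS_NANOS.length; i++) {
      if (durationNanos <= BUCKET_BOUNDS_NANOS[i]) {
        return i;
      }
    }
    return BUCKET_BOUNDS_NANOS.length; // overflow bucket
  }

  /** Number of recorded executions for the actor (0 if unknown). */
  public long count(final String actorName) {
    final LongAdder adder = executionCount.get(actorName);
    return adder == null ? 0L : adder.sum();
  }

  /** Sum of all recorded durations for the actor, in nanoseconds (0 if unknown). */
  public long totalNanos(final String actorName) {
    final LongAdder adder = totalNanos.get(actorName);
    return adder == null ? 0L : adder.sum();
  }

  /** Number of executions that fell into the given bucket (0 if unknown). */
  public long bucketCount(final String actorName, final int bucket) {
    final AtomicLongArray buckets = histogram.get(actorName);
    return buckets == null ? 0L : buckets.get(bucket);
  }
}
```

A scheduler could pass each submitted task through `instrument(actorName, task)` before running it. Comparing `totalNanos` across actors would then give a rough view of which actor consumes how much time, which is the kind of overview asked for in this issue.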
Actor metrics have been added via #9294, as discussed here: https://camunda.slack.com/archives/C037RS2JHB8/p1651668608010019?thread_ts=1651668160.788749&cid=C037RS2JHB8 Whoever wants to use them can add dashboard panels according to their needs. I will remove my assignment for now.
We could invest a bit in open tracing and hide it behind a feature flag. We could then also enable it in our benchmarks to learn a bit more about the system. Related Slack thread: https://camunda.slack.com/archives/C032560A9GE/p1653025715815249 Jon mentioned that there is some good support from GKE, which might be worth checking: https://cloud.google.com/trace/docs/setup/java-ot
Hey @Zelldon! We would be thrilled to see OpenTelemetry support (since OpenTracing is deprecated), because our business processes depend strongly on the latency of Zeebe. However, since we are running in a bare-metal environment, we would like the OpenTelemetry support to be cloud- and environment-agnostic. If this is okay with you, I can start experimenting with this :) But I think it is worth creating a separate issue for it.
Hey @Zelldon! This is a kind reminder about the previous message ⬆️ :) Thanks :)
Hey @aivinog1, sorry, but I was not sure whether this was a question 😅 Sure, go ahead and experiment 🤷 I know that we also want to experiment with it. I also started a bit with Google Cloud and OpenTelemetry, but it was not that fruitful.
@rodrigo-lourenco-lopes The Actor metrics dashboard panel added in #12548 doesn't seem to show any data for our own cluster (Zeebe Team Engineering Automation). Is there anything we still need to do to have this available for SaaS? |
Is your feature request related to a problem? Please describe.
While investigating #8991 I wanted to figure out where Zeebe spends its time. I didn't really find an efficient way to do it. I added some system outs to get at least some insights. From those I could see that Zeebe was busy for some time, then not processing anything for about 20 seconds. I didn't figure out what it did.
I tried using VisualVM and its sampler. But the information I got from it was not helpful:
Describe the solution you'd like