Add metrics on how long waits took to the RunEngine #1682
Comments
The information resulting from my observability investigations is available at https://confluence.diamond.ac.uk/x/aYGJDg, with a section specific to Python apps. It was produced as a result of instrumenting BlueAPI, which is obviously somewhat different to the RunEngine in that it is a FastAPI app, but the notes I made should still be applicable to some degree. OpenTelemetry is, as you say, the preferred implementation, as it is vendor independent but supported by all the Tracing/Metrics/Logging backends Diamond is using or is likely to use, based on discussions with Garry and the Cloud team. It's also worth noting that just adding tracing will give you the durations for particular functions/calls as they happen, whereas Metrics lets you track these values over time to see if they are consistent. So you can get quite a long way by just adding a TracerProvider, declaring a Tracer and then instrumenting the code you're interested in. You can get a bit of info out by just decorating the appropriate functions with @TRACER.start_as_current_span. There are also async-specific approaches, which I haven't looked into, but generally the OpenTelemetry site is pretty comprehensive in covering the various different approaches.
I suspect we want both a log of how long a given Status object took to complete and how long we were actually blocked by it.
Reviewers: @DominicOram @keithralphs
Relates to bluesky/ophyd-async#195
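To make the distinction concrete, here is a rough standard-library sketch of the two measurements. `TimedStatus` is a hypothetical wrapper invented for this example, not the bluesky or ophyd Status API: it records the time from creation to completion, and separately the time a caller actually spent blocked in `wait()`.

```python
import asyncio
import time


class TimedStatus:
    """Hypothetical stand-in for a Status object (not the bluesky API)."""

    def __init__(self, awaitable):
        self._started = time.monotonic()
        self.completion_time = None  # seconds from creation to completion
        self.blocked_time = 0.0      # seconds callers spent inside wait()
        self._task = asyncio.ensure_future(awaitable)
        self._task.add_done_callback(self._on_done)

    def _on_done(self, _task):
        self.completion_time = time.monotonic() - self._started

    async def wait(self):
        t0 = time.monotonic()
        try:
            await self._task
        finally:
            self.blocked_time += time.monotonic() - t0


async def demo():
    status = TimedStatus(asyncio.sleep(0.2))  # pretend hardware move
    await asyncio.sleep(0.15)                 # plan does other work meanwhile
    await status.wait()                       # blocked only for the remainder
    return status


status = asyncio.run(demo())
print(f"completed in {status.completion_time:.2f} s, "
      f"blocked for {status.blocked_time:.2f} s")
```

In this sketch the operation takes about 0.2 s to complete, but the plan is blocked only for the part that remains once its other work has finished, which is why the two numbers are worth recording separately.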
- Adds opentelemetry-api as dependency
- Adds traces on RunEngine methods which take time: wait(), set(), complete(), ...
In general we would like to be able to get tracing and metric information out of the RunEngine.
The specific MX DLS use case is that we would like to have good data on where the plans are waiting for hardware to finish doing things.
Expected Behavior
As a plan runs, information about how long wait messages took should be easily available externally. This should be invisible to the plan and turned on via some flag before running the plan, e.g. in the RunEngine.
Possible Solution
From some discussion it seems that https://opentelemetry.io/docs/languages/python/ is the agreed module to use for this. Further discussion may be needed on the specifics.