Add metrics on how long waits took to the RunEngine #1682

Open
DominicOram opened this issue Apr 5, 2024 · 4 comments · May be fixed by #1724
Labels
hackathon Good issues for the upcoming bluesky hackathon

Comments

@DominicOram
Contributor

In general we would like to be able to get tracing and metric information out of the RunEngine.

The specific MX DLS use case is that we would like to have good data on where the plans are waiting for hardware to finish doing things.

Expected Behavior

As a plan runs, information about how long each wait message took should be easily available externally. This should be invisible to the plan and turned on via some flag before running the plan, e.g. in the RunEngine.
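A minimal sketch of the expected behaviour, written against a hypothetical toy engine rather than bluesky's real RunEngine. It mimics bluesky's style of plans as generators that yield messages, but the tuple messages, `ToyRunEngine` and its `record_wait_durations` flag are all illustrative assumptions. The plan contains no timing code; only the engine's flag decides whether wait durations are recorded.

```python
import asyncio
import time


def example_plan():
    """A plan only yields messages; it contains no timing code at all."""
    yield ("set", "motor", 0.02)  # hypothetical (command, device, duration) message
    yield ("wait",)
    yield ("set", "detector", 0.01)
    yield ("wait",)


class ToyRunEngine:
    """Hypothetical engine where wait timing is switched on by a constructor flag."""

    def __init__(self, record_wait_durations=False):
        self.record_wait_durations = record_wait_durations
        self.wait_durations = []  # seconds, one entry per "wait" message

    async def _run(self, plan):
        pending = []
        for msg in plan:
            if msg[0] == "set":
                _, _device, duration = msg
                # Stand-in for starting a hardware move that finishes later.
                pending.append(asyncio.create_task(asyncio.sleep(duration)))
            elif msg[0] == "wait":
                start = time.perf_counter()
                await asyncio.gather(*pending)
                pending.clear()
                # Timing is recorded only when the engine-level flag is set.
                if self.record_wait_durations:
                    self.wait_durations.append(time.perf_counter() - start)

    def __call__(self, plan):
        asyncio.run(self._run(plan))


RE = ToyRunEngine(record_wait_durations=True)
RE(example_plan())
print(RE.wait_durations)  # one duration per wait message
```

Keeping the switch on the engine means the same plan can be run with or without instrumentation.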

Possible Solution

From some discussion, it seems that https://opentelemetry.io/docs/languages/python/ is the agreed library to use for this. Further discussion may be needed on the specifics.

@DominicOram DominicOram added the hackathon Good issues for the upcoming bluesky hackathon label Apr 5, 2024
@keithralphs

The information resulting from my observability investigations is available at https://confluence.diamond.ac.uk/x/aYGJDg, with a section specific to Python apps. This was produced as a result of instrumenting BlueAPI, which is obviously somewhat different to the RunEngine in that it is a FastAPI app, but the notes I made should still be applicable to some degree. OpenTelemetry is, as you say, the preferred implementation: it is vendor independent but supported by all the tracing/metrics/logging backends Diamond is using or is likely to use, based on discussions with Garry and the Cloud team.

It's also worth noting that tracing alone gives you the durations of particular functions/calls as they happen, whereas metrics let you track these values over time to see whether they are consistent. So you can get quite a long way by just adding a TracerProvider, declaring a Tracer and then instrumenting the code you're interested in. You can get some information out just by decorating the appropriate functions with @TRACER.start_as_current_span. There are also async-specific approaches, which I haven't looked into, but generally the OpenTelemetry site is pretty comprehensive in covering the various approaches.

@tacaswell
Contributor

I suspect we want both a log of how long a given Status object took to complete and how long we were actually blocked by it.
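A small sketch of why the two numbers differ, using plain `asyncio` with a task as a hypothetical stand-in for a Status object (no bluesky or ophyd code is involved). The move is started, the plan does other work in the meantime, and only then waits, so the time actually blocked is much shorter than the time the status took to complete.

```python
import asyncio
import time


async def measure_status_vs_blocking(move_time, other_work_time):
    """Return how long a (toy) status took to finish vs how long we blocked on it."""
    t0 = time.perf_counter()
    timings = {}

    async def hardware_move():
        # Hypothetical stand-in for a device move whose Status completes later.
        await asyncio.sleep(move_time)
        timings["status_completed_after"] = time.perf_counter() - t0

    status = asyncio.create_task(hardware_move())

    # Meanwhile the plan does other work, so part of the move overlaps with it.
    await asyncio.sleep(other_work_time)

    wait_start = time.perf_counter()
    await status
    timings["blocked_for"] = time.perf_counter() - wait_start
    return timings


timings = asyncio.run(measure_status_vs_blocking(move_time=0.2, other_work_time=0.15))
print(timings)  # blocked_for should be well under status_completed_after
```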

@callumforrester
Contributor

Reviewers: @DominicOram @keithralphs

@callumforrester
Contributor

Relates to bluesky/ophyd-async#195

@dperl-dls dperl-dls self-assigned this Apr 15, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 15, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 17, 2024
  - Adds opentelemetry-api as dependency
  - Adds traces on RunEngine methods which take time:
    wait(), set(), complete(), ...
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 17, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 17, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 18, 2024
dperl-dls added a commit to dperl-dls/bluesky that referenced this issue Apr 19, 2024
Projects
Status: In Review

5 participants