
feat(robot-server, api, performance-metrics): get performance metrics to work on robot #15008

Closed
wants to merge 23 commits into edge

Conversation

@DerekMaggio (Member) commented Apr 25, 2024

Overview

Will close https://opentrons.atlassian.net/browse/EXEC-413

Various fixes to get performance metrics working on the robot

Test Plan

Test Instructions

  1. Factory reset your robot with only the "runs and protocols" setting. (This clears any saved protocol analysis results.)
  2. Pull this branch.
  3. Go to the performance-metrics dir.
  4. Set an env variable for your host (it makes the following steps easier): `export host=<your-robot-ip>`
  5. Run the following to enable the performance-metrics feature flag: `make add-performance-metrics-ff`
  6. Push the new code to your robot: `make push-no-restart-ot3 && (cd ../api && make push-no-restart-ot3) && (cd ../robot_server && make push-ot3)`
  7. Once the robot finishes restarting, ssh into the robot with `(cd ../api && make term)`
  8. Go to the /data directory: `cd /data`
  9. Verify the performance_metrics_data directory exists: `ls -l`
  10. Go to the directory: `cd performance_metrics_data`
  11. Check that the directory is empty.
  12. From the app, load a protocol and go to start setup.
  13. Wait for the robot to finish analyzing the protocol.
  14. In performance_metrics_data there should be 2 files:
     1. robot_context_data
     2. robot_context_data_headers
  15. Check the content of the files: `cat robot_context_data` and `cat robot_context_data_headers`
  16. robot_context_data should contain a single line, something like `2, <a big number>, <another big number>`.
  17. robot_context_data_headers should contain a single line: `state_id,function_start_time,duration` (see the illustrative check after this list).
  18. Cancel the protocol.
  19. Create another run for the same protocol and wait for it to finish analyzing.
  20. Check the 2 files again. They should NOT have changed. (This is because we are loading the cached analysis rather than rerunning it.)
  21. Cancel the protocol.
  22. Create another run with a different protocol and wait for it to finish analyzing.
  23. Recheck the 2 files. robot_context_data_headers should be unchanged; robot_context_data should have another line, something like `2, <a big number>, <another big number>`.
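
For convenience, here is a small, illustrative check of the file shapes described in steps 16–17. It is not part of this PR; the directory and file names come from the test plan above, and the numeric values are simply whatever the tracker recorded.

```python
# Illustrative only (not part of this PR): sanity-check the two files the
# tracker writes, per the test plan above. Run on the robot with python3.
from pathlib import Path

DATA_DIR = Path("/data/performance_metrics_data")

headers = (DATA_DIR / "robot_context_data_headers").read_text().strip()
assert headers == "state_id,function_start_time,duration", f"unexpected headers: {headers}"

for line in (DATA_DIR / "robot_context_data").read_text().splitlines():
    # Each row should be three comma-separated integers; per the test plan,
    # the first column (state_id) is 2 for analysis rows and the other two
    # are "big numbers" (a start timestamp and a duration).
    state_id, start_time, duration = (int(field) for field in line.split(","))
    print(state_id, start_time, duration)
```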

TESTS

  • test on the OT-2
  • test on the Flex

Changelog

Fixes

These are a number of small issues I found while trying to get this working:

  • api/src/opentrons/util/performance_helpers.py
    • Calling tracker.store() inside track_analysis cannot work, because the underlying decorated function has not finished running when store() is called. Storage has to be deferred to RobotContextTracker; that is why the conditional store was removed and the STORE_EACH var is now passed to RobotContextTracker instead.
  • performance-metrics/src/performance_metrics/robot_context_tracker.py
    • There are a couple of things here:
    • At some point I removed asynchronous function support. No tests failed because the calls were still being timed, but the coroutine was never awaited, which made this painful to find. Support for async functions has been added back.
    • Added the aforementioned deferred storage. (The sketch after this list illustrates both the async fix and the deferred store.)
  • I also found that I was wrapping the incorrect call to analysis. This is why robot_server/protocols/protocol_analyzer is now wrapped.
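
To make the two robot_context_tracker fixes above concrete, here is a minimal sketch, assuming a simplified tracker class. It is not the actual RobotContextTracker code; the names (SketchTracker, state_id, _store_each) are illustrative. It shows (a) why the tracked coroutine must actually be awaited inside the wrapper, and (b) how storage is deferred to the tracker, either after every tracked call (store_each) or only when store() is called explicitly.

```python
# Illustrative sketch only; simplified from the ideas in
# performance-metrics/src/performance_metrics/robot_context_tracker.py.
import functools
import inspect
from time import perf_counter_ns


class SketchTracker:
    def __init__(self, should_track: bool, store_each: bool = False) -> None:
        self._should_track = should_track
        self._store_each = store_each
        self._rows: list[tuple[int, int, int]] = []

    def track(self, state_id: int):
        """Decorate a sync or async function; record (state_id, start, duration)."""

        def decorator(func):
            if not self._should_track:
                return func  # tracking disabled: return the function untouched

            if inspect.iscoroutinefunction(func):

                @functools.wraps(func)
                async def async_wrapper(*args, **kwargs):
                    start = perf_counter_ns()
                    try:
                        # Awaiting here is the fix: without it, only coroutine
                        # creation gets timed, not the actual work.
                        return await func(*args, **kwargs)
                    finally:
                        self._record(state_id, start)

                return async_wrapper

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                start = perf_counter_ns()
                try:
                    return func(*args, **kwargs)
                finally:
                    self._record(state_id, start)

            return sync_wrapper

        return decorator

    def _record(self, state_id: int, start: int) -> None:
        # Runs only after the decorated function has finished, which is why
        # storage has to live here rather than in track_analysis itself.
        self._rows.append((state_id, start, perf_counter_ns() - start))
        if self._store_each:
            self.store()

    def store(self) -> None:
        # The real tracker writes rows under /data/performance_metrics_data;
        # this sketch just drops the in-memory buffer.
        self._rows.clear()
```

As a usage sketch, the analysis entry point would be decorated with something like `tracker.track(state_id=2)`; again, this only illustrates the pattern, not the real decorator name.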

Testing Changes

  • test_robot_context_tracker.py
    • Add testing for the store_each functionality
  • test_cli.py
    • Remove the context_tracker.store() case. This is outside the scope of the cli module.
  • test_protocol_analyzer.py
    • Add validation that analysis results are stored after each call to analyze.

Utility Changes

  • Pulled boilerplate Makefile content from other projects to use for building and pushing the package
  • Added a few utility targets to the Makefile to assist with debugging/testing

Review requests

  • Can someone verify this for me on the Flex?
  • Any other ideas for testing that performance-metrics doesn't affect other code when it is supposed to be disabled?
  • Would like at least a few reviews, specifically looking for edge cases where this could fail and halt robot execution.

Risk assessment

Medium to High: This is the first time performance-metrics is going to actually be able to interact with a robot and its file system.

@DerekMaggio DerekMaggio changed the base branch from edge to EXEC-408-add-storing-analysis-context-on-track April 25, 2024 20:29
@DerekMaggio DerekMaggio marked this pull request as ready for review April 26, 2024 15:43
@DerekMaggio DerekMaggio requested review from a team as code owners April 26, 2024 15:43
@DerekMaggio DerekMaggio requested a review from mjhuff April 26, 2024 15:46
@DerekMaggio DerekMaggio changed the title Getting performance metrics to work on robot feat(robot-server, api, performance-metrics): get performance metrics to work on robot Apr 26, 2024
@y3rsh (Collaborator) commented Apr 26, 2024

As this is high risk, I would like it to not merge to edge until 5/2, after the code freeze for 7.3.0.
If there is a counterargument that this will in some way enhance the 7.3.0 release, please let me know.

@DerekMaggio (Member Author) commented:

Nope, will mark do not merge.

@DerekMaggio DerekMaggio marked this pull request as draft April 26, 2024 15:53
@DerekMaggio DerekMaggio self-assigned this Apr 26, 2024
@DerekMaggio DerekMaggio added the DO NOT MERGE Indicates a PR should not be merged, even if there's a shiny green merge button available label Apr 26, 2024
@DerekMaggio DerekMaggio changed the base branch from EXEC-408-add-storing-analysis-context-on-track to edge April 26, 2024 16:01
@sfoster1 (Member) left a comment:

Okay, I see what we're going for here and accept that you want to change it soon, but I really think we should start as we mean to go on: just don't have store_each, and have the analysis tracker explicitly store once in a finally after the analysis ends. store_each would mean a disk hit after every tracked function call and totally kill perf, and I think it's equally clear to just have it at the end of the analyze call.
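
For reference, here is a minimal sketch of the alternative suggested above: store exactly once, in a finally, when the analysis ends. The function and parameter names are illustrative, not the actual protocol_analyzer API.

```python
# Illustrative only: one disk write per analysis instead of one per tracked call.
async def analyze_and_store(analyzer, protocol_resource, run_id, tracker) -> None:
    try:
        await analyzer.analyze(protocol_resource, run_id)
    finally:
        # Runs whether analysis succeeds or raises, so metrics are never lost
        # and there is no per-call disk hit.
        tracker.store()
```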

An excerpt of performance-metrics/Makefile (the target discussed in the comment below):

    http://$(host):31950/settings/log_level/local | jq

    .PHONY: add-performance-metrics-ff
    add-performance-metrics-ff:
A contributor commented:

Nitpick, feel free to ignore: I might call this set-performance-metrics-ff or enable-performance-metrics-ff.

Comment on lines +36 to +41
# Enable tracking for the RobotContextTracker
# This must come before the import of the analyze CLI
context_tracker = _get_robot_context_tracker()

# Ignore the type error for the next line, as we're setting a private attribute for testing purposes
context_tracker._should_track = True # type: ignore[attr-defined]
A contributor commented:

Okay, so as far as I can tell:

  • Our performance tracking code is global, for the same reason that logging is global: ease of access.
  • Our performance tracking code has import-time logic, for performance reasons. (To avoid having to check if running_on_robot: ... every time it's called.)
  • As we're beginning to see here, both of the above are going to make it very annoying and icky to test that the code under measurement calls the performance tracking code correctly.

To "fix" this, I would support just not having unit tests like this. We just won't check that the code under measurement emits metrics. By analogy, we're also okay with not checking that the code under test emits logs. It's not worthwhile to test because it's noncritical and also not error-prone. And we want it to be cheap and easy to move measurements around as we hone in on good targets.

@y3rsh y3rsh removed the DO NOT MERGE Indicates a PR should not be merged, even if there's a shiny green merge button available label May 31, 2024
@DerekMaggio (Member Author) commented:

Closing in favor of #15289.
