refactor(protocol-engine): implement pause command #8161
Codecov Report
@@ Coverage Diff @@
## edge #8161 +/- ##
==========================================
+ Coverage 87.31% 88.19% +0.87%
==========================================
Files 429 430 +1
Lines 22572 24633 +2061
==========================================
+ Hits 19708 21724 +2016
- Misses 2864 2909 +45
Code looks good to me! I'll ✅ after I smoke-test on a robot.
With this PR, a protocol-engine protocol may pause itself and be resumed by a POST .../actions { "actionType": "resume" } HTTP request.
Edit: Moved this comment onto #8041.
I don't have a strong use case for this yet, but I've been thinking we should have three separate actions:
1. A "pause from the outside" action. Issued by HTTP clients, and by pressing the OT-2's front button.
2. A resume action that undoes (1).
3. A resume action that undoes pauses that are built into the protocol.
Because I think an HTTP client should be able to naively undo its own pause and be confident that the robot will be left exactly how it was before, without knowing anything deep about the protocol.
For example, say you're writing a client to issue HTTP commands to the OT-2 to coordinate it with other equipment. You want to implement a "stop everything" button. You also want to implement a button that resumes from the "stop everything."
If the OT-2 was in a protocol-issued pause state when the user clicked "stop everything," you probably wouldn't want the OT-2 to start moving on its own when they click resume. You'd expect the OT-2 to go back to doing exactly what it was before—and what it was doing was waiting.
Currently, to work around this, you'd have to know that the robot is currently on a protocol-issued pause, and, as a special case, avoid resuming it.
Edit: And I haven't thought about how this interacts with the door opening and closing.
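A minimal sketch of the three-action idea above, assuming the engine tracks the two pause sources independently. The class and method names here are hypothetical and do not come from the Protocol Engine code; the point is only that undoing the outside pause leaves a protocol-issued pause in place.

```python
from dataclasses import dataclass


@dataclass
class PauseState:
    """Hypothetical model: two independent reasons a run may be waiting."""

    paused_by_client: bool = False  # set by HTTP clients or the front button
    paused_by_protocol: bool = False  # set by a pause built into the protocol

    @property
    def is_waiting(self) -> bool:
        # The robot stays still while either pause is active.
        return self.paused_by_client or self.paused_by_protocol

    def client_pause(self) -> None:
        # Action 1: "pause from the outside".
        self.paused_by_client = True

    def client_resume(self) -> None:
        # Action 2: undoes only the outside pause.
        self.paused_by_client = False

    def protocol_pause(self) -> None:
        # A pause issued by the protocol itself.
        self.paused_by_protocol = True

    def protocol_resume(self) -> None:
        # Action 3: undoes only the protocol-issued pause.
        self.paused_by_protocol = False


state = PauseState()
state.protocol_pause()  # the protocol is waiting on its own pause
state.client_pause()  # user clicks "stop everything"
state.client_resume()  # user clicks the matching resume button
still_waiting = state.is_waiting  # the protocol-issued pause is untouched
```

With this shape, a client never needs to know whether the protocol was already paused before it issued its own pause.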
def __init__(self) -> None:
    """Construct a command translator"""
    pass
Nice catch.
Not necessarily for this PR, but would decoy still work if we made all these methods @staticmethod, too?
Yup, Decoy doesn't do a whole lot with the spec object other than try to figure out if a given spy should be sync or async. So in practice @staticmethod doesn't really make a difference. I think SessionView in robot-server is all static, and we mock it out in the router tests.
Thinking about it, I don't have a test case in Decoy covering @staticmethod and inspect.signature support, so there's a chance that's not quite right, but inspect.signature support is only really needed in specific cases, like not blowing up FastAPI's DI system.
Edit: Yeah, so inspect.signature support for @staticmethod was definitely broken: mcous/decoy#51
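For context, here is a small standard-library sketch of what a spec-based mocking tool has to cope with when a spec class mixes static and async methods. It is not Decoy's code, and `SessionViewLike` is a hypothetical stand-in; it only shows what `inspect` reports for each kind of method.

```python
import inspect


class SessionViewLike:
    """Hypothetical stand-in for an all-static class used as a mock spec."""

    @staticmethod
    def as_response(session_id: str, verbose: bool = False) -> dict:
        return {"id": session_id, "verbose": verbose}

    async def fetch(self, session_id: str) -> dict:
        return {"id": session_id}


# A staticmethod looked up on the class is a plain function with no `self`.
static_params = list(inspect.signature(SessionViewLike.as_response).parameters)

# A regular method looked up on the class still lists `self`.
method_params = list(inspect.signature(SessionViewLike.fetch).parameters)

# getattr_static returns the raw descriptor, so the wrapper can be detected.
raw_attribute = inspect.getattr_static(SessionViewLike, "as_response")
is_static = isinstance(raw_attribute, staticmethod)

# Sync versus async detection works on the function object itself.
fetch_is_async = inspect.iscoroutinefunction(SessionViewLike.fetch)
static_is_async = inspect.iscoroutinefunction(SessionViewLike.as_response)
```

The difference between `static_params` and `method_params` is the kind of detail a spy has to get right if it wants to present a signature that frameworks relying on `inspect.signature` can consume.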
Co-authored-by: Max Marrone <max@opentrons.com>
LGTM!
Tested on a robot with a Python protocol and it works as expected! The only non-intuitive thing was that both a start and a resume action resume a paused protocol. But that's a UX question and easy to change later, so I'm good with this for now.
)

data = PauseData(message="hello world")
I get confused whether/when to decoy.verify that a method wasn't called before our test call (before subject.execute in this case). Is it that in this case we know all dependencies are decoys so we are sure that a pause would not have been called before, and so we don't need to verify that it was not called before?
I think in this case, the key is that we haven't interacted with our test subject yet. Usually, if I'm putting a times=0 check (or assertNotCalled or whatever), I'm doing it because something in the test has interacted with the subject, so it's conceivable that something could go wrong.
Maybe, for example, you need to call a setup method of the subject before you call the specific method you're testing. But that also might be a sign that the interaction is too complicated
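To illustrate when a "not called yet" check earns its place, here is a sketch using the standard library's `unittest.mock` rather than Decoy (with Decoy, the equivalent would be the `times=0` verification mentioned above). `PauseRunner` and its `setup` method are hypothetical, invented only to give the subject an interaction before the call under test.

```python
from unittest.mock import Mock


class PauseRunner:
    """Hypothetical subject that needs setup() before execute()."""

    def __init__(self, run_control) -> None:
        self._run_control = run_control
        self._ready = False

    def setup(self) -> None:
        # Setup must not trigger a pause by itself.
        self._ready = True

    def execute(self, message: str) -> None:
        if self._ready:
            self._run_control.pause(message)


run_control = Mock()
subject = PauseRunner(run_control)

# The subject has now been touched, so a negative check is meaningful here.
subject.setup()
run_control.pause.assert_not_called()

# The call under test.
subject.execute("hello world")
run_control.pause.assert_called_once_with("hello world")
```

If the test never calls anything on the subject before `execute`, the negative check adds nothing, because a fresh mock cannot have been called.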
Cool. Makes sense.
> But that also might be a sign that the interaction is too complicated
Good point
- class Params4(BaseModel):
-     wait: Union[float, Literal[True]] = Field(
+ class DelayCommandParams(BaseModel):
+     wait: Union[Literal[True], float] = Field(
Would this try to cast a wait value to the first type in the Union? What will happen if we specify a delay of 1 second, or actually, any non-zero value?
I know it doesn't apply to this PR, but it's something to keep an eye out for when we implement delay.
Oh good call. I changed this because { "wait": true } in JSON was getting cast to wait=1 in Python, but I didn't think to check that it worked the other way once I reordered it.
If it breaks in the way you theorize (makes a lot of sense to me that it would break this way), I think maybe if we switch the order back and go with a StrictFloat instead we might get the behavior we need.
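The ordering concern can be explored without pydantic at all. The sketch below is a plain-Python imitation of "try each union member in order, first success wins"; every helper in it is hypothetical, and real pydantic behavior depends on the installed version, so it is worth confirming against the actual model before relying on it.

```python
from typing import Any, Callable, Sequence


def literal_true(value: Any) -> bool:
    # Equality-based match: in Python, 1 == True and 1.0 == True.
    if value == True:  # noqa: E712 - equality is the point of this sketch
        return True
    raise ValueError("not the literal True")


def lax_float(value: Any) -> float:
    # Lenient coercion: float(True) is 1.0.
    return float(value)


def strict_float(value: Any) -> float:
    # Assumed strict rule for this sketch: accept float instances only.
    if not isinstance(value, float):
        raise ValueError("not a float instance")
    return value


def coerce(value: Any, members: Sequence[Callable[[Any], Any]]) -> Any:
    # Try each union member left to right and keep the first success.
    for validate in members:
        try:
            return validate(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"no union member accepted {value!r}")


# Original order (float first): JSON true collapses into a number.
true_with_float_first = coerce(True, [lax_float, literal_true])

# Reordered (Literal[True] first): a 1 second delay collapses into True.
one_with_literal_first = coerce(1, [literal_true, lax_float])

# Other delays do not equal True, so they fall through to float.
other_with_literal_first = coerce(2.5, [literal_true, lax_float])

# Strict float first: True is rejected as a float and kept as True.
true_with_strict_first = coerce(True, [strict_float, literal_true])
```

In this imitation only a delay equal to 1 is affected by the reordering, rather than every non-zero value, and the strict-float-first ordering keeps `true` and float delays distinct.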
Oops, good callout. That wasn't intentional on my part, but makes sense why it's happening. That behavior feels like continuing pressure to change the runner / engine relationship, to me.
If it's cool with y'all, would like to punt that HTTP layer stuff to another ticket. Ideally, I would like:
Ya, we haven't really had the design discussion about the endpoints and action rules. Definitely worth a separate ticket.
Tested a modified
And both
Yep, broadly makes sense to me.
Yep. Well, I think Protocol Engine should be the thing determining if an action is valid. Since Protocol Engine is the single source of truth of protocol state, it should also be the single source of truth for protocol state transitions. But maybe that's already what you have in mind.
Yep.
I find myself wondering if we need separate Like, maybe creating a protocol session automatically sets it up paused at step 0. And the "Start run" button just unpauses it from that state. Instead of clients doing this:

    if protocol.paused:
        show_resume_button()
    elif protocol.loaded:
        show_start_button()
    else:
        show_pause_button()

They'd do this:

    if protocol.paused:
        if protocol.next_step == protocol.steps[0]:
            show_start_button()  # Never before started.
        else:
            show_resume_button()  # Started and then paused mid-run.
    else:
        show_pause_button()

And the start button and resume button would be implemented through the same underlying
Overview
This PR builds on the play/pause logic of #8152 by adding a Pause command to the ProtocolEngine, the engine's SyncClient, and the JSON CommandTranslator.

With this PR, a protocol-engine protocol may pause itself and be resumed by a POST .../actions { "actionType": "resume" } HTTP request. Closes #7918.

Changelog
I tried to make a commit at each iteration of my TDD loop, so if you're curious about that, I recommend checking out the diff one commit at a time. Roughly, this flow was:
- opentrons.protocol_api_experimental.ProtocolContext.pause and tests, shaking out changes to SyncClient.pause
- SyncClient.pause, shaking out Pause, PauseData, and PauseResult command value objects
- PauseImplementation, shaking out opentrons.protocol_engine.execution.RunControlHandler
- CommandTranslator logic for translating Pause commands
- CommandTranslator into opentrons.file_runner
- RunControlHandler.pause, shaking out protocol_engine.state.CommandView.get_is_running
- CommandView.get_is_running, completing the feature

Review requests
As usual, the enableProtocolEngine feature flag has to be on for this one. I've updated the Postman collection with a new JSON protocol: Simple Test Protocol With Pause.json.

Smoke test plan
This is the test procedure I ran on my robot.
While #8151 is outstanding, it remains easier to test this with JSON protocols than Python protocols, but the behavior should be exactly the same.
- POST /protocols with files: Simple Test Protocol With Pause.json
- testosaur_v3.py to have a ctx.pause() in there
- POST /sessions
- POST /sessions/:id/actions { actionType: "start" }
- pause action should have a status of running while the robot is waiting
- POST /sessions/:id/actions { actionType: "resume" }
- { actionType: "pause" } and { actionType: "resume" } requests if you want

Risk assessment
Low, but keep in mind this is foundational work for important future functionality.