Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(protocol-engine): implement pause command #8161

Merged
merged 7 commits into from Aug 2, 2021
Merged

Conversation

mcous
Copy link
Contributor

@mcous mcous commented Jul 26, 2021

Overview

This PR builds on the play/pause logic of #8152 by adding a Pause command to the ProtocolEngine, the engine's SyncClient, and the JSON CommandTranslator.

With this PR, a protocol-engine protocol may pause itself and be resumed by an POST .../actions { "actionType": "resume" } HTTP request. Closes #7918.

Changelog

I tried make a commit at each iteration of my TDD loop, so if you're curious about that, I recommend checking out the diff one commit at a time. Roughly, this flow was:

  1. Add opentrons.protocol_api_experimental.ProtocolContext.pause and tests, shaking out changes to SyncClient.pause
  2. Add SyncClient.pause, shaking out Pause, PauseData, and PauseResult command value objects
  3. Implement PauseImplementation, shaking out opentrons.protocol_engine.execution.RunControlHandler
  4. Implement CommandTranslator logic for translating Pause commands
    • I should've done this first or second, because this is an entry point of the feature, but I forgot!
    • It ended up being fine, though, because I still hadn't gotten to any actual pause logic
    • I also took this opportunity to move the CommandTranslator into opentrons.file_runner
  5. Implement RunControlHandler.pause, shaking out protocol_engine.state.CommandView.get_is_running
  6. Implement CommandView.get_is_running, completing the feature

Review requests

As usual, the enableProtocolEngine feature flag has to be on for this one. I've updated the Postman collection with a new JSON protocol: Simple Test Protocol With Pause.json.

Smoke test plan

This is the test procedure I ran on my robot. While #8151 is outstanding, it remains easier to test this with JSON protocols than Python protocols, but the behavior should be exactly the same.

  1. POST /protocols with files: Simple Test Protocol With Pause.json
    • Or: modify testosaur_v3.py to have a ctx.pause() in there
  2. POST /sessions
  3. POST /sessions/:id/actions { actionType: "start" }
  4. Protocol will pause after one transfer
    • Take this opportunity to inspect state
    • The pause action should have a status of running while the robot is waiting
  5. POST /sessions/:id/actions { actionType: "resume" }
    • Robot should continue moving
  6. Continue issuing { actionType: "pause" } and { actionType: "resume" } requests if you want

Risk assessment

Low, but keep in mind this is foundational work for important future functionality.

@mcous mcous added robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience). protocol-engine Ticket related to the Protocol Engine project and associated HTTP APIs labels Jul 26, 2021
@codecov
Copy link

codecov bot commented Jul 26, 2021

Codecov Report

Merging #8161 (9c16a35) into edge (881a67a) will increase coverage by 0.87%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             edge    #8161      +/-   ##
==========================================
+ Coverage   87.31%   88.19%   +0.87%     
==========================================
  Files         429      430       +1     
  Lines       22572    24633    +2061     
==========================================
+ Hits        19708    21724    +2016     
- Misses       2864     2909      +45     
Impacted Files Coverage Δ
...service/session/command_execution/base_executor.py 66.66% <0.00%> (-13.34%) ⬇️
robot-server/robot_server/system/time_utils.py 79.36% <0.00%> (-4.36%) ⬇️
...bot_server/robot/calibration/deck/state_machine.py 90.47% <0.00%> (-1.84%) ⬇️
...rver/robot_server/service/pipette_offset/router.py 91.83% <0.00%> (-1.50%) ⬇️
...t-server/robot_server/service/tip_length/router.py 95.65% <0.00%> (-1.13%) ⬇️
...rver/service/session/models/command_definitions.py 96.07% <0.00%> (-1.11%) ⬇️
...ot-server/robot_server/service/protocol/analyze.py 95.23% <0.00%> (-0.54%) ⬇️
..._server/service/notifications/handle_subscriber.py 92.10% <0.00%> (-0.49%) ⬇️
.../robot_server/service/legacy/routers/networking.py 98.14% <0.00%> (-0.39%) ⬇️
robot-server/robot_server/settings.py 100.00% <0.00%> (ø)
... and 93 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 881a67a...9c16a35. Read the comment docs.

Base automatically changed from engine_action-reaction to edge July 28, 2021 22:28
@mcous mcous marked this pull request as ready for review July 28, 2021 22:38
@mcous mcous requested a review from a team as a code owner July 28, 2021 22:38
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me! I'll ✅ after I smoke-test on a robot.

With this PR, a protocol-engine protocol may pause itself and be resumed by an POST .../actions { "actionType": "resume" } HTTP request.


Edit: Moved this comment onto #8041.

Not for this PR:

I don't have a strong use case for this yet, but I've been thinking we should have three separate actions:

  1. A "pause from the outside" action. Issued by HTTP clients, and by pressing the OT-2's front button.
  2. A resume action that undoes (1).
  3. A resume action that undoes pauses that are built into the protocol.

Because I think an HTTP client should be able to naively undo its own pause and be confident that the robot will be left exactly how it was before, without knowing anything deep about the protocol.

For example, say you're writing a client to issue HTTP commands to the OT-2 to coordinate it with other equipment. You want to implement a "stop everything" button. You also want to implement a button that resumes from the "stop everything."

If the OT-2 was in a protocol-issued pause state when the user clicked "stop everything," you probably wouldn't want the OT-2 to start moving on its own when they click resume. You'd expect the OT-2 to go back to doing exactly what it was before—and what it was doing was waiting.

Currently, to work around this, you'd have to know that the robot is currently on a protocol-issued pause, and, as a special case, avoid resuming it.

Edit: And I haven't thought about how this interacts with the door opening and closing.

api/src/opentrons/file_runner/json_command_translator.py Outdated Show resolved Hide resolved
Comment on lines -21 to -24
def __init__(self) -> None:
"""Construct a command translator"""
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

Not necessarily for this PR, but would decoy still work if we made all these methods @staticmethod, too?

Copy link
Contributor Author

@mcous mcous Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, Decoy doesn't do a whole lot with the spec object other than try to figure out if a given spy should be sync or async. So in practice @staticmethod doesn't really make a difference. I think SessionView in robot-server is all static, and we mock it out in the router tests.

Thinking about it, I don't have a test case in Decoy covering @staticmethod and inspect.signature support, so there's a chance that's not quite right, but inspect.signature support is only really needed in specific cases, like not blowing up FastAPI's DI system


Edit: Yeah so inspect.signature support for @staticmethod was definitely broken: mcous/decoy#51

Co-authored-by: Max Marrone <max@opentrons.com>
Copy link
Member

@sanni-t sanni-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Tested on a robot with a python protocol and it works as expected! The only non-intuitive thing was that both a start and a resume action resumes a paused protocol. But that's a UX question and easy to change later so I'm good with this for now.
▶️

)

data = PauseData(message="hello world")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get confused whether/when to decoy.verify that a method wasn't called before our test call (before subject.execute in this case). Is it that in this case we know all dependencies are decoys so we are sure that a pause would not have been called before, and so we don't need to verify that it was not called before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case, the key is that we haven't interacted with our test subject yet. Usually, if I'm putting a times=0 check (or assertNotCalled or whatever), I'm doing it because something in the test has interacted with the subject, so it's conceivable that something could go wrong.

Maybe, for example, you need to call a setup method of the subject before you call the specific method you're testing. But that also might be a sign that the interaction is too complicated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Makes sense.

But that also might be a sign that the interaction is too complicated

Good point

class Params4(BaseModel):
wait: Union[float, Literal[True]] = Field(
class DelayCommandParams(BaseModel):
wait: Union[Literal[True], float] = Field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this try to cast a wait value to the first type in the Union. What will happen if we specify a delay of 1 second.. or actually, any non-zero value?
I know it doesn't apply to this PR but something to keep an eye out for when we implement delay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good call. I changed this because { "wait": true } in JSON was getting cast to wait=1 in Python, but I didn't think to check that it worked the other way once I reordered it.

If it breaks in the way you theorize (makes a lot of sense to me that it would break this way), I think maybe if we switch the order back and go with a StrictFloat instead we might get the behavior we need

@mcous
Copy link
Contributor Author

mcous commented Aug 2, 2021

The only non-intuitive thing was that both a start and a resume action resumes a paused protocol

Oops, good callout. That wasn't intentional on my part, but makes sense why it's happening. That behavior feels like continuing pressure to change the runner / engine relationship, to me.

  • POST { "actionType": "start" } calls runner.run, which also calls engine.play
  • POST { "actionType": "resume" } calls engine.play
  • Neither HTTP request is validated as something that is allowed to happen at that time

If it's cool with y'all, would like to punt that HTTP layer stuff to another ticket. Ideally, I would like:

  • Validation of actions at the router level
  • More alignment between HTTP action names and ProtocolEngine method names
  • Potentially, a single HTTP action to both start and pause

@sanni-t
Copy link
Member

sanni-t commented Aug 2, 2021

If it's cool with y'all, would like to punt that HTTP layer stuff to another ticket. Ideally, I would like:

  • Validation of actions at the router level
  • More alignment between HTTP action names and ProtocolEngine method names
  • Potentially, a single HTTP action to both start and pause

Ya, we haven't really had the design discussion about the endpoints and action rules. Definitely worth a separate ticket.

@SyntaxColoring
Copy link
Contributor

SyntaxColoring commented Aug 2, 2021

Tested a modified testosaur_v3.py and it works as expected, with the caveat that @sanni-t pointed out.

The only non-intuitive thing was that both a start and a resume action resumes a paused protocol

And both start and resume actions can start loaded protocols.

Oops, good callout. That wasn't intentional on my part, but makes sense why it's happening.
[...]
If it's cool with y'all, would like to punt that HTTP layer stuff to another ticket....

Yep, broadly makes sense to me.

Ideally, I would like:

  • Validation of actions at the router level

Yep.

Well, I think Protocol Engine should be the thing determining if an action is valid. Since Protocol Engine is the single source of truth of protocol state, it should also be the single source of truth for protocol state transitions. But maybe that's already what you have in mind.

  • More alignment between HTTP action names and ProtocolEngine method names

Yep.

  • Potentially, a single HTTP action to both start and pause

I find myself wondering if we need separate start and resume actions, and separate "ready" and "paused" states.

Like, maybe creating a protocol session automatically sets it up paused at step 0. And the "Start run" button just unpauses it from that state.

Instead of clients doing this:

if protocol.paused:
    show_resume_button()
elif protocol.loaded:
    show_start_button()
else:
    show_pause_button()

They'd do this:

if protocol.paused:
    if protocol.next_step == protocol.steps[0]:
        show_start_button()  # Never before started.
    else:
        show_resume_button()  # Started and then paused mid-run.
else:
    show_pause_button()

And the start button and resume button would be implemented through the same underlying resume HTTP action.

@SyntaxColoring SyntaxColoring self-requested a review August 2, 2021 16:30
@mcous mcous merged commit 65d64ae into edge Aug 2, 2021
@mcous mcous deleted the engine_pause-command branch August 2, 2021 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
protocol-engine Ticket related to the Protocol Engine project and associated HTTP APIs robot-svcs Falls under the purview of the Robot Services squad (formerly CPX, Core Platform Experience).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Protocol Engine: Allow a run to pause itself and resume upon user interaction
3 participants