feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

mjhuff · 2024-05-13T12:54:11Z

Overview

Adds the ability to retry the command that caused the run to enter Error Recovery, resuming the run after the issued command completes. Implementing the retry logic requires laying out (hopefully) most of the remaining app-side foundational work, therefore that is included here as well.

There is a good bit of noise in this PR. A few specifics are mentioned via comments. Below are the major highlights and discussion points:

Chaining Fixit Commands

In order to chain (or even create) commands with a fixit intent, we utilize useRecoveryCommands. I currently make the assumption that we do not want to handle fixit commands that fail during error recovery, so we always continuePastCommandFailure. Whether we actually want this behavior or not (or make this dependent on the exact fixit command) is an involved conversation and must also ensure error recovery can't possibly cause an infinite loop (such as a faulty sensor always detecting no tip attached after the user dispatches additional fixit pick up tip commands).

Identifying the Failed Run Command

Designs utilize splash pages that require knowledge of the failed command (example). While identifying and retaining the failed command is simple on the ODD, the complexity increases when we add desktop apps into the mix. A desktop user may be anywhere in the app and then may navigate to Error Recovery while a user engages (or has engaged) with Error Recovery on the ODD. In order to determine which command is the failed command, we must navigate backwards through the active run's list of commands, finding the most recent failed command with a protocol intent. And that's the issue; because fixit commands are added to the run command list, the following scenario may occur:

User A on the ODD creates a significantly large number of fixit commands. These commands are appended to the protocol run commands list.
User B on the desktop app navigates to the run, and the app attempts to find the last failed protocol command. No failed command is found when querying the server, which occurs due to the failed command being farther back in history than the pageLength sent to GET /runs/:runId/commands returned. See this discussion for some background as to why this is the case.

@sanni-t's recent PR #15031 solves the serialization issue for completed runs. We could consider extending this functionality for commands during an on-going run (and maybe add a rate limiter?!?). @sfoster1 's thoughts are highlighted in the slack convo linked above after discussion in-person.

Robot Motion State

If you look at designs, there are a lot of different "robot in motion" screens. Sometimes we only want the motion screen when the robot does a command, and sometimes we want to persist it longer or indefinitely. I think the best way to handle motion is to explicitly control it by setting the motion screen before we do a command and then control it afterwards. This is a departure from the implicit RobotInMotion component takeover that all the other wizard flows use, but I don't think that's going to work for us, and even if it is possible, the control flow would be very hard to follow.

Test Plan

Turn on the BE/FE FFs for ER.
Run any protocol with a Pick up tip command and don't pick up the tip. Proceed through the ER flows. Verify the initial Z-home occurs when launching ER, and clicking "resume" (see Review requests - designs are quite different than current flows and that's ok) causes a pick up tip command.
NOTE: The retry failed command pick up tip will actually fail until fix(api): only add tip if pickup successful #15143 merges.
Verify the run resumes. There may be some oddness with the command text on the desktop app/odd, but that's ok for now.

Changelog

Added "retry failed command" option in Error Recovery.

Review requests

Designs have changed quite a bit in the past couple of weeks. This PR is just focused on functionality and structure!

Risk assessment

low -- behind a ff

…covery Adds the ability to retry command that caused the run to enter error recovery. Resumes the run after completing the failed command.

mjhuff

Some explanatory comments below.

mjhuff · 2024-05-13T12:56:00Z

app/src/organisms/ErrorRecoveryFlows/ErrorRecoveryWizard.tsx

Diff this against the old index.tsx to see changes.

The differences are the new route for the "retrying command" view, the useInitialPipetteHome, and the new name, ErrorRecoveryWizard.

app/src/organisms/ErrorRecoveryFlows/utils.ts

mjhuff · 2024-05-13T13:02:55Z

app/src/organisms/ErrorRecoveryFlows/index.tsx

+  runStatus: RunStatus | null
+): UseErrorRecoveryResult {
+  const [isEREnabled, setIsEREnabled] = React.useState(false)
+  const failedCommand = useCurrentlyFailedRunCommand(runId, runStatus)


Not directly exporting useCurrentlyFailedRunCommand may be something we have to refactor out later, as this implies that we only would ever need a failedCommand for reasons related to Error Recovery.

DerekMaggio · 2024-05-13T19:57:41Z

app/src/organisms/ErrorRecoveryFlows/ErrorRecoveryWizard.tsx

+   * Recovery Step: Analogous to a "step" in other wizard flows.
+   */
+  const [recoveryMap, setRecoveryMap] = React.useState<IRecoveryMap>({
+    route: RECOVERY_MAP.ROBOT_IN_MOTION.ROUTE, // Initialize the route to "in motion", since wizard always start with a home command.


Do we expect the robot to already be in motion when we start this wizard?
Or does the robot not start homing until the useInitialPipetteHome call like 15 lines down?

If it is already in motion, then this makes sense, if it is not, should useInitialPipetteHome be the thing that adds the ROBOT_IN_MOTION step to the route?

Good Q - doing it this way prevents us initially rendering some other screen for one render cycle before the in-motion screen. This problem is really only noticeable if you test it on the physical ODD, and it looks bad enough IMO that I'd rather eagerly set the robot to 'in motion' immediately and not have the screen immediately changing as soon as we launch ER.

@DerekMaggio After further investigation, I found a better React way of doing this while initializing the route to something more sensible, so thank you for calling this out!

sfoster1 · 2024-05-13T20:32:10Z

app/src/organisms/ErrorRecoveryFlows/ErrorRecoveryWizard.tsx

+  React.useEffect(() => {
+    void setRobotInMotion(true)
+      .then(() => homePipetteZAxes())
+      .then(() => setRobotInMotion(false))


probably wants to be .finally()

sfoster1 · 2024-05-13T20:33:24Z

app/src/organisms/ErrorRecoveryFlows/ErrorRecoveryWizard.tsx

+      <ErrorRecoveryHeader errorKind={props.errorKind} />
+      <ErrorRecoveryContent {...props} />
+    </Flex>,
+    getTopPortalEl()


probably wants to be getModalPortalEl if this is on the desktop

Ah yep, great point. Given hi-fi designs, it looks like we may have some modal-within-modals as well, so we'd want the lower zIndex for ODD as well.

sfoster1 · 2024-05-13T20:36:50Z

app/src/organisms/ErrorRecoveryFlows/index.tsx

-    getTopPortalEl()
-  )
+interface UseErrorRecoveryResult {
+  isEREnabled: boolean


kind of a nit, but the use of "enabled" here makes you think this is a wrapper for the feature flag at first glance. can we call it like isERActive and so on?

sfoster1 · 2024-05-13T20:55:25Z

app/src/organisms/ErrorRecoveryFlows/utils.ts

+// Otherwise, returns null.
+const ALL_COMMANDS_POLL_MS = 5000
+
+export function useCurrentlyFailedRunCommand(


credit to @SyntaxColoring for this one - what if we added a cursor to GET /runs/:runid/commands that was "last failed command" so we didn't have to parse the command list at all?

Are you talking about a server side change? If app side, how would this work with a desktop app that navigates to the run after 100+ fixit commands? Wouldn't it need knowledge of which command failed to know which cursor to use? CC @SyntaxColoring

Yeah, a server side change.

EXEC-458

I'm working on the server part of this now.

sfoster1

Looks good to me!

feat(app, react-api-client): add retry failed command during error re…

0ff7b51

…covery Adds the ability to retry command that caused the run to enter error recovery. Resumes the run after completing the failed command.

mjhuff requested a review from a team May 13, 2024 12:54

mjhuff requested a review from a team as a code owner May 13, 2024 12:54

mjhuff requested review from smb2268 and removed request for a team and smb2268 May 13, 2024 12:54

mjhuff commented May 13, 2024

View reviewed changes

Merge branch 'edge' into retry-failed-command

cd46e87

DerekMaggio reviewed May 13, 2024

View reviewed changes

sfoster1 reviewed May 13, 2024

View reviewed changes

mjhuff added 4 commits May 20, 2024 10:04

Merge branch 'edge' into retry-failed-command

0878f4e

fix in motion routing

7f119a1

feedback

a4f5c28

lint

79bd847

mjhuff requested review from sfoster1 and a team May 20, 2024 20:36

lower z-index for both desktop & odd

c7a7fb0

sfoster1 approved these changes May 21, 2024

View reviewed changes

mjhuff merged commit dac2589 into edge May 21, 2024
21 checks passed

mjhuff deleted the retry-failed-command branch May 21, 2024 13:57

SyntaxColoring mentioned this pull request May 22, 2024

feat(app,robot-server): Subscribe to currentlyRecoveringFrom notifications #15242

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

mjhuff commented May 13, 2024 •

edited

mjhuff left a comment

mjhuff May 13, 2024

mjhuff May 13, 2024

DerekMaggio May 13, 2024

mjhuff May 13, 2024

mjhuff May 20, 2024

sfoster1 May 13, 2024

sfoster1 May 13, 2024

mjhuff May 21, 2024

sfoster1 May 13, 2024

sfoster1 May 13, 2024

mjhuff May 13, 2024 •

edited

sfoster1 May 14, 2024

SyntaxColoring May 16, 2024 •

edited

sfoster1 left a comment

feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

feat(app, react-api-client): add "retry failed command" during Error Recovery #15170

Conversation

mjhuff commented May 13, 2024 • edited

Overview

Chaining Fixit Commands

Identifying the Failed Run Command

Robot Motion State

Test Plan

Changelog

Review requests

Risk assessment

mjhuff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjhuff May 13, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SyntaxColoring May 16, 2024 • edited

Choose a reason for hiding this comment

sfoster1 left a comment

Choose a reason for hiding this comment

mjhuff commented May 13, 2024 •

edited

mjhuff May 13, 2024 •

edited

SyntaxColoring May 16, 2024 •

edited