feat(async-flow): asyncFlow #9097

erights · 2024-03-18T09:25:07Z

closes: #9302
refs: #9125, #9126 #9153 #9154, #9280

Description

Upgrade while suspended at await points! Uses membrane to log and replay everything that happened before each upgrade.

In the first incarnation, somewhere, using a closed async function argument

const wrapperFunc = asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);

then elsewhere, as often as you'd like

const outcomeVow = wrapperFunc(...);

For all these asyncFlow calls that happened in the first incarnation, in the first crank of all later incarnations

asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);

with async functions that reproduce the original's logged behavior. In these later incarnations, you only need to capture the returned wrapperFunc if you want to create new activations. Regardless, the old activations continue.

Future Growth

I designed this PR so it could grow in the following ways:

TODO: The membrane should use the HandledPromise API to make proper remote presences and handled promises, so that the guest function can use E on objects or promises it receives from the host as expected. I commented out the additional ops needed for these: doSend and checkSend.
TODO: Currently, I assume that the host side only presents vows, not promises. However, imported remote promises can be stored durably, and be durably watched, so the membrane should be extended to handle those.
TODO: We currently impose the restriction that the guest cannot export to the host guest-created remotables or promises. (It can pass back to the host remotables or promises it received from the host.) I commented out the additional ops needed for these: doCall, checkReturn and checkThrow. I wrote the bijection and equate subsystems so that old durable host wrappers can be hooked back up on replay to the corresponding new guest remotables and promises.

Security Considerations

Nothing enforces that the argument async function is closed, i.e., does not capture (lexically "close over") any direct access to mutable state or ability to cause effects. If it does, it still cannot harm anything but itself. But it -- or its replayings -- may be more confusable, and so more vulnerable to confusion attacks.

Since this entire framework itself is written as a helper (well, a huge helper) with no special privilege, it cannot be used to do anything that could not have otherwise been done. It is not a source of new authority.

See caveats in Upgrade Considerations about isolation of effects following a replay failure.

Scaling Considerations

We assume that the total number of async functions, i.e., calls to asyncFlow, are low cardinality. This is essential to the design, in exactly the same sense as our assumption that exoClasses are low cardinality. The RAM cost is proportional to the number of these.

The current implementation further assumes that the total number of activations of these replayable async functions are low cardinality. This will likely be a scaling problem at some point, but is unlikely to be painful for the initial expected use cases.

The current implementation imposes two simplifying restrictions that allow us to relieve some of this memory pressure: Currently, the async function argument cannot make and export new remotables or promises to the host. Thus, once its outcomeVow is settled, its job is done. There is very little more it can do that is observable. Thus, once this outcome is settled, the activation is shut down and most of the memory it held onto is dropped.

Of the activations not shut down, they must replay from the beginning in each incarnation. If a given activation has a long history of past activity, this can become expensive.

How do we verify in CI that when an asyncFlow is in use & when it has completed, resource usage in RAM & on disk meet our expectations?

The PR assumes low cardinality of asyncFlows. what scale is low cardinality - Is 10^3, 10&5? What is the risk if cardinality is too high?

Documentation Considerations

For the normal developer documentation, asyncFlow should make things simpler and more like "just JavaScript". The membrane translates between host vows and guest promises, so the async function can simply await on promises without needing the when from @agoric/vow.

Testing Considerations

This PR is structured as a tower of building blocks, where I unit tested each as I went, in bottom up order, in order to build with confidence. Currently, each of these building blocks is also very paranoid about internal consistency checking, so I'd get early indications if I made a mistake. Probably some of this internal consistency checking can be reduced over time, as we gain more static confidence.

This PR is currently using the fake upgrade testing framework from the @agoric/zone package. This caused bug #9126. Instead, we need to redo all these tests with a real upgrade testing framework, like the one in bootstrapTests. See https://github.com/Agoric/agoric-sdk/blob/master/packages/boot/test/bootstrapTests/test-zcf-upgrade.ts

Upgrade Considerations

The point.

In an reviving incarnation, if the async function argument of

asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);

fails to recapitulate the logs of each of its activations, those activations will not have done any damage. They will simply be stuck, with a diagnostic available via

adminAsyncFlow.getFailures(),

To unstick these, so those stuck activations can continue to make progress, upgrade again using an async function argument that does reproduce the logged behavior of each of its activations.

Caveat: Imperfect isolation of effects following a replay failure

Once a replay failure is detected, we attempt to isolate the replaying activation from its outside world, and to also shut down its further execution as soon as possible. But it may be in the midst of activity with a non-empty stack. Following a replay failure, we have no way to preemptively terminate it without any further execution of the activation. This further execution may therefore be arbitrarily confused. We simply try to isolate it as much as possible, immediately revoking all access it had through the membrane to all authority to cause observable effects. However,

We do not consider console logging activity to be an observable effect. These might be caused by diagnostics emitted by this framework in response to its "isolated" confused behavior.
Because we do not consider console logging to be an observable effect, we also allow this as an exception to our closed function rule. Messages it sends directly to the console are not logged, and can differ without causing replay failure. During its post-replay-failure confused execution, it can still directly log to the console.
It is not resource limited, so its post-replay confused execution can accidentally engage in resource exhaustion attacks, including infinite loops. However, the vat as a whole is resource limited. An infinite loop will eventually crash the vat, which can then be recovered with yet another upgrade.
- Because of metering, an activation that executed successfully in a previous incarnation might not replay correctly, even if it doesn't cause any explicit side-effects. That's because metering is a hidden side-effect of any execution.

erights · 2024-03-22T05:55:37Z

OMG it's ready for review!

mhofman

The biggest issue I have is with how host vow fulfillment is handled. I think there is a big problem with the current approach, but maybe I missed something. Let's chat.

packages/async-flow/README.md

packages/async-flow/docs/async-flow-states.md

packages/async-flow/src/replay-membrane.js

packages/async-flow/src/weak-bijection.js

packages/async-flow/src/async-flow.js

packages/async-flow/src/replay-membrane.js

Staged on #9097 closes: #XXXX refs: #9231 #9321 #9329 #9097 ## Description Now that `watchPromise` is abstracted over zones and vows make use of that, we can simplify the test cases a lot. Should be a pure refactor with no externally observable effect. ### Security Considerations none ### Scaling Considerations none ### Documentation Considerations none ### Testing Considerations tests simpler. Otherwise, none ### Upgrade Considerations none

erights

Github won't let me request changes of myself, but me, please fix this.

packages/async-flow/src/bijection.js

mhofman

Audited the membrane for stopped state check. I think we could add an extra check in the guest call handler to avoid relying on the bijection being reset.

packages/async-flow/src/replay-membrane.js

mhofman

In the process of reviewing test, I realized the wakeAll behavior was not quite as expected it was. Given the change to the data model, I think this needs to be addressed before first release, either now or in a closely following PR.

Furthermore in the upgrading considerations, and maybe somewhere in the documentation, we should mention that because of metering, an activation that executed successfully in a previous incarnation might not replay correctly, even if it doesn't cause any explicit side-effects. That's because metering is a hidden side-effect of any execution.

packages/async-flow/src/async-flow.js

Staged on #9097 closes: #XXXX refs: #9231 #9321 #9329 #9097 ## Description Now that `watchPromise` is abstracted over zones and vows make use of that, we can simplify the test cases a lot. Should be a pure refactor with no externally observable effect. ### Security Considerations none ### Scaling Considerations none ### Documentation Considerations none ### Testing Considerations tests simpler. Otherwise, none ### Upgrade Considerations none

packages/async-flow/src/bijection.js

packages/async-flow/src/convert.js

packages/async-flow/src/equate.js

mhofman

The tests look mostly good. I'm a little concerned that the waking behavior on new incarnation is not quite tested right, in particular if a flow is not failed and not eagerly waking, whether the durable vow watcher kicks in.

Conditional approval on tweaking this if easily doable (I think it should be now that we fixed auto waking behavior)

packages/async-flow/test/async-flow.test.js

packages/async-flow/test/convert.test.js

packages/async-flow/test/equate.test.js

packages/async-flow/test/log-store.test.js

packages/async-flow/test/replay-membrane-settlement.test.js

erights self-assigned this Mar 18, 2024

erights force-pushed the markm-async-flow branch 15 times, most recently from 00ae6ae to 95efeaa Compare March 21, 2024 02:10

This was referenced Mar 21, 2024

test(zone,vow): is this a watch revival bug? #9125

Draft

old vow watch not woken up by fulfillment in new incarnation. #9126

Open

erights requested review from mhofman and michaelfig March 21, 2024 03:53

erights force-pushed the markm-async-flow branch from 9774ce3 to 0ccd48f Compare March 22, 2024 04:36

erights changed the title ~~DRAFT feat(zone): asyncFlow~~ feat(zone): asyncFlow Mar 22, 2024

erights marked this pull request as ready for review March 22, 2024 04:38

erights requested a review from dtribble March 22, 2024 04:39

erights force-pushed the markm-async-flow branch from 0ccd48f to fb9b798 Compare March 22, 2024 05:02

erights force-pushed the markm-async-flow branch from c10ed0e to 0164eaf Compare March 23, 2024 22:51

This was referenced Mar 24, 2024

Obsolete membrane spike. Pushed just for reference endojs/endo#2158

Closed

feat(marshal): dot-membrane logs for replay endojs/endo#2130

Closed

feat(internal): durable dot-membrane #9053

Closed

erights force-pushed the markm-patch-exo-track-2247 branch from cba0176 to fa9766a Compare May 6, 2024 22:36

erights force-pushed the markm-async-flow branch from 7e3857c to 814768a Compare May 6, 2024 22:38

erights force-pushed the markm-patch-exo-track-2247 branch from fa9766a to a326ce8 Compare May 7, 2024 20:02

erights force-pushed the markm-async-flow branch from 814768a to a71b3a0 Compare May 7, 2024 20:03

erights force-pushed the markm-patch-exo-track-2247 branch from a326ce8 to 0e527f5 Compare May 8, 2024 01:17

erights force-pushed the markm-async-flow branch from 2cab795 to 9af46e4 Compare May 8, 2024 23:12

erights changed the base branch from markm-patch-exo-track-2247 to master May 8, 2024 23:18

erights force-pushed the markm-async-flow branch from 596304e to 4cb7ed6 Compare May 8, 2024 23:24

mhofman requested changes May 10, 2024

View reviewed changes

erights commented May 10, 2024

View reviewed changes

packages/async-flow/src/replay-membrane.js Outdated Show resolved Hide resolved

erights mentioned this pull request May 10, 2024

refactor(async-flow): simplify with zonified vow #9336

Merged

mhofman mentioned this pull request May 14, 2024

memory friendly bijection for async-flow membrane #9365

Open

erights force-pushed the markm-async-flow branch from 7303b1a to 2ddbd78 Compare May 14, 2024 20:04

erights commented May 15, 2024

View reviewed changes

packages/async-flow/src/bijection.js Outdated Show resolved Hide resolved

mhofman reviewed May 16, 2024

View reviewed changes

packages/async-flow/src/replay-membrane.js Outdated Show resolved Hide resolved

packages/async-flow/src/replay-membrane.js Outdated Show resolved Hide resolved

packages/async-flow/src/replay-membrane.js Outdated Show resolved Hide resolved

mhofman reviewed May 16, 2024

View reviewed changes

erights force-pushed the markm-async-flow branch from 2ebefd4 to 15afe03 Compare May 16, 2024 19:21

erights commented May 16, 2024

View reviewed changes

packages/async-flow/src/bijection.js Outdated Show resolved Hide resolved

This was referenced May 16, 2024

async-flow: consider refactoring between per-flowTools operations and per-flow operations #9375

Open

async-flow: postpone wakeAll after first crank rather than first turn. #9377

Open

erights commented May 17, 2024

View reviewed changes

packages/async-flow/src/convert.js Show resolved Hide resolved

packages/async-flow/src/convert.js Show resolved Hide resolved

packages/async-flow/src/equate.js Show resolved Hide resolved

mhofman approved these changes May 18, 2024

View reviewed changes

erights mentioned this pull request May 19, 2024

async-flow: deeper wake testing #9383

Open

feat(async-flow): asyncFlow

bbb328f

erights force-pushed the markm-async-flow branch from 377cdd9 to bbb328f Compare May 19, 2024 21:35

erights added the automerge:squash Automatically squash merge label May 19, 2024

mergify bot merged commit 16095c5 into master May 19, 2024
71 checks passed

mergify bot deleted the markm-async-flow branch May 19, 2024 22:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(async-flow): asyncFlow #9097

feat(async-flow): asyncFlow #9097

erights commented Mar 18, 2024 •

edited

erights commented Mar 22, 2024

mhofman left a comment

erights left a comment

mhofman left a comment

mhofman left a comment

mhofman left a comment

feat(async-flow): asyncFlow #9097

feat(async-flow): asyncFlow #9097

Conversation

erights commented Mar 18, 2024 • edited

Description

Future Growth

Security Considerations

Scaling Considerations

Documentation Considerations

Testing Considerations

Upgrade Considerations

Caveat: Imperfect isolation of effects following a replay failure

erights commented Mar 22, 2024

mhofman left a comment

Choose a reason for hiding this comment

erights left a comment

Choose a reason for hiding this comment

mhofman left a comment

Choose a reason for hiding this comment

mhofman left a comment

Choose a reason for hiding this comment

mhofman left a comment

Choose a reason for hiding this comment

erights commented Mar 18, 2024 •

edited