feat(swingset): allow slow termination of vats #9227

warner · 2024-04-12T05:50:55Z

This introduces new runPolicy() controls which enable "slow
termination" of vats. When configured, terminated vats are immediately
dead (all promises are rejected, all new messages go splat, they never
run again), but the vat's state is deleted slowly, one piece at a
time. This makes it safe to terminate large vats (those with a long
history, lots of c-list imports/exports, or large vatstore tables)
without fear of causing an overload (e.g. by dropping 100k references
in a single crank).

See docs/run-policy.md for details and configuration instructions.

Also changes swing-store to enable budget-limited deletion of vat
transcripts and snapshots.

refs #8928

cloudflare-pages · 2024-04-12T20:18:13Z

Deploying agoric-sdk with Cloudflare Pages

Latest commit:	`a31549a`
Status:	✅ Deploy successful!
Preview URL:	https://334422f0.agoric-sdk.pages.dev
Branch Preview URL:	https://warner-8928-terminate-vats-s.agoric-sdk.pages.dev

warner · 2024-04-15T15:51:30Z

Note: in addition to having the kernel spread c-list deletion processing over time (to spread out the GC consequences in other vats), I had to change the swing-store to let the kernel spread transcript/snapshot deletion over time (to limit the size of the DB txn). The swingstore work is in the first commit of this PR, the kernel side is in the second.

The swingstore needs to maintain the invariant that exports and imports still work. I arranged it so that transcript spans are deleted starting at the highest startPos (ORDER BY startPos DESC), so the isCurrent=1 record is the very first one deleted. And then I changed the export code to ignore any vat which is missing an isCurrent=1 record. The result is that we'll omit partially-deleted transcripts from any exports, so the import code won't ever observe a partial transcript, so its assertComplete() checks will not fail. Without that, any exports created after the first span deletion but before the final span deletion would be unimportable in mode='replay' or mode='archival'.

The snapshots are still deleted oldest-first (ORDER BY snapPos ASC), since the snapStore's assertComplete does not care about old snapshots.
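
The ordering and export rule described above can be sketched with a small in-memory model. This is only an illustration: `makeSpans`, `deleteSomeSpans`, and `exportableVats` are hypothetical stand-ins for the `transcriptSpans` table and its queries, not the swing-store's actual SQL or API.

```javascript
// Hypothetical in-memory stand-in for the transcriptSpans table.
function makeSpans(vatID, count, spanSize = 200) {
  const spans = [];
  for (let i = 0; i < count; i += 1) {
    spans.push({
      vatID,
      startPos: i * spanSize,
      isCurrent: i === count - 1 ? 1 : null, // only the newest span is current
    });
  }
  return spans;
}

// Delete up to `budget` spans for one vat, highest startPos first
// (the analogue of ORDER BY startPos DESC LIMIT budget).
function deleteSomeSpans(spans, vatID, budget) {
  const victims = spans
    .filter(s => s.vatID === vatID)
    .sort((a, b) => b.startPos - a.startPos)
    .slice(0, budget);
  const remaining = spans.filter(s => !victims.includes(s));
  return { remaining, deleted: victims.length };
}

// Export only includes transcripts of vats that still have an isCurrent=1 span.
function exportableVats(spans) {
  return [...new Set(spans.filter(s => s.isCurrent === 1).map(s => s.vatID))];
}
```

In this model, the very first budgeted deletion removes the `isCurrent=1` span, so the vat drops out of `exportableVats()` immediately even though older spans are still present.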

The resulting data-deletion and export-size profiles, starting from the block where the vat is terminated, will look like:

| when | SQL contents | export contents |
| --- | --- | --- |
| vat terminated | `(vats.terminated).push(vatID)` | everything |
| draining kvStore | kvStore shrinks | kvStore shrinks |
| drained kvStore | kvStore empty | no kvStore for that vatID |
| draining heap snapshots | old/unpopulated snapstore rows deleted | old IAVL records removed but inUse=1 artifact remains |
| last heap snapshot | the only populated snapstore row deleted | the only snapshot artifact is removed |
| drained snapstore | no snapshot data | no snapshot artifacts or IAVL records |
| draining transcripts | | |
| latest span deleted | up-to-200 items deleted, one IAVL deletion | immediately stops including the whole transcript |
| | | export is now minimum size |
| N-1 span deleted | 200 items deleted, one IAVL deletion | no change |
| .. earliest span deleted | 200 items deleted, final IAVL deletion | no change |
| deletion complete | `(vats.terminated).remove(vatID)` | IAVL shadow of `vats.terminated` changed |

Both `snapStore.deleteVatSnapshots()` and `transcriptStore.deleteVatTranscripts()` now take a numeric `budget=` argument, which will limit the number of snapshots or transcript spans deleted in each call. Both return a `{ done, cleanups }` record so the caller knows when to stop calling. This enables the slow deletion of large vats (lots of transcript spans or snapshots), a small number of items at a time.

Recommended budget is 5, which (given SwingSet's `snapInterval=200` default) will cause the deletion of 1000 rows from the `transcriptItems` table each call, which shouldn't take more than 100ms.

Without this, the kernel's attempt to slowly delete a terminated vat would succeed in slowly draining the kvStore, but would trigger a gigantic SQL transaction at the end, as it deleted every transcript item in the vat's history. The worst-case example I found would be the mainnet chain's v43-walletFactory, which (as of apr-2024) has 8.2M transcript items in 40k spans. A fast machine takes two seconds just to count all the items, and deletion took 22 *minutes*, with a `swingstore.wal` file that peaked at 27 GiB. This would cause an enormous chain stall at some surprising point in time weeks or months after the vat was first terminated.

In addition, both the transcript spans and the snapshot records are shadowed into IAVL (via `export-data`) for integrity, and deleting 40k+40k=80k IAVL records in a single block might cause some significant churn too.

refs #8928
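
As a rough sketch of how a caller might drive this interface, the following uses an in-memory stand-in. `makeFakeTranscriptStore` and `countCallsToDrain` are hypothetical names; only the `budget` argument and the `{ done, cleanups }` return shape are taken from the description above, and `budget` is assumed to be a positive integer.

```javascript
// Hypothetical stand-in for a transcript store holding `spanCount` spans
// for a single vat. Not the real swing-store implementation.
function makeFakeTranscriptStore(spanCount) {
  let remaining = spanCount;
  return {
    // Delete at most `budget` spans and report progress to the caller.
    deleteVatTranscripts(_vatID, budget) {
      const cleanups = Math.min(budget, remaining);
      remaining -= cleanups;
      return { done: remaining === 0, cleanups };
    },
  };
}

// Make one call per "block" until the store reports that it is done.
function countCallsToDrain(store, vatID, budget) {
  let calls = 0;
  let total = 0;
  for (;;) {
    const { done, cleanups } = store.deleteVatTranscripts(vatID, budget);
    calls += 1;
    total += cleanups;
    if (done) return { calls, total };
  }
}
```

With the figures quoted above (about 40k spans and a budget of 5), this model needs 8000 calls, each one standing for roughly 5 × 200 = 1000 transcript items.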

mhofman

Preliminary review of the first commit introducing budgeted deletion.

I think we should have a first commit changing the semantics of termination to set inUse/isCurrent to null for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).

It would also avoid unnecessarily exporting snapshot/transcript span artifacts while their slow deletion is in progress (since the kv entries are processed first).

mhofman · 2024-04-26T00:19:53Z

packages/swing-store/src/snapStore.js

+    // Unlike transcripts, here we delete the oldest snapshots first,
+    // to simplify the logic: we delete the only inUse=1 snapshot
+    // last, and then immediately delete the .current record, at which
+    // point we're done. This has a side-effect of keeping the unused
+    // snapshot in the export artifacts longer, but it doesn't seem
+    // worth fixing.


I find it a little weird to do things in a different order. It also causes the metadata entries for transcripts and snapshots to be inconsistent with each other.

mhofman · 2024-04-26T00:23:21Z

packages/swing-store/src/snapStore.js

   *
   * @param {string} vatID
+   * @param {number} budget
+   * @returns {{ done: boolean, cleanups: number }}


This kind of interface really feels like a generator.

warner · 2024-05-10

It does, but I didn't find a way to take advantage of that fact.

An actual function* generator wouldn't work, of course, because the process can be killed and a new process started while the deletion is going on, and a real generator would lose state when the application is rebooted.

And, I think changing the function signature to match that of a normal function* generator is only an improvement if the caller gets to use for..of syntax, but as long as the snapStore function is doing internal iteration (deleting more than one thing per call), the vatKeeper.js deleteSnapshotsAndTranscripts() caller is only going to call it once per block (per terminated vat), so there's no good place for a for..of loop. (the real loop is higher up, with one iteration per block).

To get one, we'd need to change snapStore's deleteSomeVatSnapshots into maybeDeleteOneVatSnapshot, to delete at most one per call, and then have vatKeeper's deleteSnapshotsAndTranscripts() use a for..of loop. We'd still need to return whether a cleanup was done or not, and have the caller accumulate them, so deleteSnapshotsAndTranscripts knows when to switch from snapshots to transcripts. maybeDeleteOneVatSnapshot would always make one DB query (with a LIMIT 1) to get which snapshot to delete, if any. Then it either returns, or does a second DB query to delete the one row, and a third to noteExport the deletion, making the cost 3 small DB queries until all the snapshots are gone, then 1 small DB query each block until all the transcripts are gone (since we always check for remaining snapshots on the way to checking for transcript spans).

That's compared to the current cost (with a budget of 5) of one moderate-sized query every time (using LIMIT 5, returning anywhere from 0 to 5 rows), followed by 0 to 5 noteExports, maybe followed by a single DELETE query removing 1 to 5 rows at once.

And we'd need snapStore to expose maybeDeleteOneVatSnapshot separately from deleteVatSnapshots (unlimited), so the latter could do queries without LIMIT constraints, and delete everything in one shot.

In general, it de-amortizes the DB queries, because to make use of the iterator, we have to move responsibility for doing more than one deletion (per block) up into vatKeeper, which then can't give a hint to swingstore about how many deletions are coming up, so it could query them in a batch.
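
The cost comparison above can be written out as simple arithmetic. This is a sketch of the counting argument only: it treats each SELECT, DELETE, and noteExport as one operation, and the function names are hypothetical.

```javascript
// One-at-a-time: every call starts with a SELECT ... LIMIT 1. If a row is
// found, a DELETE and a noteExport follow (3 operations per deletion). If the
// rows run out before the budget is used up, one final SELECT finds nothing.
function opsOneAtATime(rowsAvailable, budget) {
  const deletions = Math.min(rowsAvailable, budget);
  return deletions * 3 + (deletions < budget ? 1 : 0);
}

// Batched: one SELECT ... LIMIT budget, one noteExport per row found, and a
// single DELETE covering all of those rows (skipped when nothing was found).
function opsBatched(rowsAvailable, budget) {
  const deletions = Math.min(rowsAvailable, budget);
  return 1 + deletions + (deletions > 0 ? 1 : 0);
}
```

Under these assumptions, a block that deletes 5 snapshots costs 15 operations one at a time versus 7 batched, and both approaches cost a single query once the snapshots are gone.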

mhofman · 2024-04-26T00:26:16Z

packages/swing-store/src/snapStore.js

+      // if you didn't set a budget, you won't be counting deletions
+      return { done: true, cleanups: 0 };


we could fairly easily return deletions.length from deleteAllVatSnapshots to be consistent
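
A minimal sketch of that suggestion, assuming a plain array stands in for the vat's snapshot rows. `deleteVatSnapshotsModel` is a hypothetical name, not the snapStore function itself.

```javascript
// Hypothetical model: `rows` stands in for one vat's snapshot rows, oldest first.
function deleteVatSnapshotsModel(rows, budget = undefined) {
  if (budget === undefined) {
    // Unbudgeted path: delete everything, but still report how many rows went,
    // so both paths return a meaningful `cleanups` count.
    const cleanups = rows.length;
    rows.length = 0;
    return { done: true, cleanups };
  }
  // Budgeted path: remove at most `budget` rows from the front (oldest first).
  const deleted = rows.splice(0, budget);
  return { done: rows.length === 0, cleanups: deleted.length };
}
```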

mhofman · 2024-04-26T00:33:53Z

packages/swing-store/src/snapStore.js

+    // if we reach here, the last sqlDeleteOneVatSnapshot() in that
+    // loop had deleted the inUse=1 snapshot and the corresponding
+    // snapshotMetadataKey, so now it is time to delete the .current
+    // record and inform the kernel that we're done
    noteExport(currentSnapshotMetadataKey({ vatID }), undefined);


So I'm wondering whether:

- vat termination should set the inUse snapshot to null and remove the .current snapshot marker
- we should assert, when we call deleteAllVatSnapshots, that there are no inUse snapshots for the vat

mhofman · 2024-04-26T00:38:32Z

packages/swing-store/src/transcriptStore.js

+    // isCurrent=1 span first, which causes export to ignore the
+    // entire vat (good, since it's deleted)


Can you clarify? That does not sound good. An export / import in the middle of a slow prune must reconstitute the partially deleted swing-store so that the slow deletion can continue in consensus on that restored node.

I think we need to be careful differentiating items and spans metadata here.

My understanding is that the isCurrent only impacts the artifacts yielded during export, and the completeness checks of items during import. Yielding no artifacts and skipping checks is indeed consistent and the right behavior, and since the metadata is always restored, the pruning behavior will be the same on restore.

That said, I am uneasy about relying on the deletion operation to influence the completeness checks. Imagine we switched things around and started deleting from the oldest span: the operational check would fail. I believe that vat termination should explicitly "close" the span (set isCurrent = null), and only allow deletion of transcripts for which there is no current span. In the future this could be modified to slowly delete transcripts of old incarnations by just adding a constraint on incarnation number to the queries.
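
The "close first, then delete" idea could look roughly like the following in-memory sketch. `closeCurrentSpan` and `deleteSomeClosedSpans` are hypothetical names, and the array of span records is an assumed stand-in for the `transcriptSpans` table.

```javascript
// Termination step: explicitly close the vat's current span.
function closeCurrentSpan(spans, vatID) {
  for (const span of spans) {
    if (span.vatID === vatID && span.isCurrent === 1) {
      span.isCurrent = null;
    }
  }
}

// Budgeted deletion that refuses to run while a current span still exists.
function deleteSomeClosedSpans(spans, vatID, budget) {
  const mine = spans.filter(s => s.vatID === vatID);
  if (mine.some(s => s.isCurrent === 1)) {
    throw new Error(`vat ${vatID} still has a current span`);
  }
  // With no current span, deletion order no longer affects export checks;
  // oldest-first is used here only as an example.
  const victims = mine.sort((a, b) => a.startPos - b.startPos).slice(0, budget);
  for (const victim of victims) {
    spans.splice(spans.indexOf(victim), 1);
  }
  const left = spans.filter(s => s.vatID === vatID).length;
  return { done: left === 0, cleanups: victims.length };
}
```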

mhofman · 2024-04-26T01:15:05Z

packages/swing-store/src/transcriptStore.js

+      // no budget? no accounting.
+      return { done: true, cleanups: 0 };


Same here, we could return deletions.length from deleteAllVatTranscripts

warner · 2024-05-10T02:03:25Z

> I think we should have a first commit changing the semantics of termination to set inUse/isCurrent to null for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).

Hm, in general I like it, now I'm trying to walk through how that would work.

Say we terminate a vat in block 1, then start deleting parts of it in block 2, and continue on through block 100. We delete the kvStore entries first (say blocks 2-40), then the snapshots (say blocks 41-50), then the transcript spans/items (say blocks 51-100).

I think you're aiming to have swingstore exports immediately stop including artifacts for the terminated vat as of block 1. No transcript span artifacts, no snapshot artifacts. The exports at that point continue to have export-data for everything. We start losing export-data for kvStore entries during 2-40, but an export at block 40 still has all the snapshot export-data (hashes), plus transcript span records. Then in 41-50 we start seeing fewer and fewer snapshot export-data records, and in 51-100 we start losing transcript span records, until by block 100 we see no export-data records for anything related to the now-fully-deleted vat.

We can't afford to delete all the export-data records during block 1, since they're all shadowed into IAVL, which we're protecting/rate-limiting just as much as SQLite. But we want getArtifactNames() to not include names of artifacts that could be produced (the rows are still present), but which are being suppressed because the vat was terminated.

transcriptStore.js getArtifactNames() does that already, with the initial sqlGetCurrentSpanMetadata query (which filters on isCurrent=1), in all modes except debug. snapStore.js behaves the same way (filtering on inUse=1).

Ok, so clearing the inUse/isCurrent flag, or deleting that one row, will suffice to prune the artifacts from exports immediately. But we need to make sure the importer won't think this is a broken import (missing artifact names that the export-data says should be present).

For the snapStore, assertComplete uses sqlListPrunedCurrentSnapshots, which only pays attention to inUse=1, so it won't complain. For transcriptStore, it uses sqlGetCurrentSpanMetadata, which likewise only looks at isCurrent=1.

So.. I think it would just work? We delete the inUse/isCurrent record when the vat is terminated, and we immediately stop observing that vat's heap/transcript-span artifact names or artifacts in the export. The importer would import them if they were present, but it won't complain if they are not.

Then, slowly, we delete the actual DB items, budget-limited, until they're all gone, at which point vatKeeper learns that there was nothing left to delete, and it deletes the record that says the vat was still being deleted. We still delete a transcriptSpans row and its matching group of 200-ish transcriptItems rows as an atomic unit, so the DB remains consistent, and we remove one export-data row for each span, so we eventually clear out the IAVL data.
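
The block-by-block walk-through above can be modelled as a tiny state machine. This is a simplified sketch: `cleanupOneBlock` and `runCleanup` are hypothetical names, each block works on a single phase only, and `budget` is assumed to be a positive integer.

```javascript
// Hypothetical per-block cleanup for one terminated vat. `state` counts what
// is left in each store; `budget` caps how many items one block may delete.
function cleanupOneBlock(state, budget) {
  // Phases run in order: kvStore entries, then snapshots, then transcript spans.
  for (const phase of ['kvEntries', 'snapshots', 'spans']) {
    if (state[phase] > 0) {
      const cleanups = Math.min(budget, state[phase]);
      state[phase] -= cleanups;
      return { phase, cleanups, done: false };
    }
  }
  // Nothing left anywhere: the vat can leave the "terminated" list.
  return { phase: 'none', cleanups: 0, done: true };
}

// Run blocks until cleanup reports done, recording the phase used in each.
function runCleanup(state, budget) {
  const phases = [];
  for (;;) {
    const result = cleanupOneBlock(state, budget);
    phases.push(result.phase);
    if (result.done) return phases;
  }
}
```

The final call, which finds nothing to delete, corresponds to the point where the record saying the vat is still being deleted gets removed.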

warner · 2024-05-10T02:05:58Z

Now let's see, should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest? I think it simplifies the deleteSome code if there are no special cases, so I'm inclined to delete the current entry at the time of vat termination, so the only thing left for the rate-limited API is to delete the old non-inUse/isCurrent rows.

I bet it would work to clear the flag too, but the IAVL .current export-data row would need to be deleted specially.

mhofman · 2024-05-10T05:51:43Z

> So.. I think it would just work?

That was my conclusion as well.

> should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest?

I was thinking of setting it to NULL.

> I bet it would work to clear the flag too, but the IAVL .current export-data row would need to be deleted specially.

Yes, I think that's the only change needed (and on import making sure we don't choke if there is no .current)

warner added SwingSet package: SwingSet swing-store labels Apr 12, 2024

warner force-pushed the warner/8928-terminate-vats-slowly branch from 7841696 to 55452ec Compare April 12, 2024 14:02

warner force-pushed the warner/8928-terminate-vats-slowly branch from 5d23745 to b3beede Compare April 12, 2024 20:49

warner marked this pull request as ready for review April 13, 2024 04:09

warner requested a review from mhofman April 13, 2024 04:09

warner force-pushed the warner/8928-terminate-vats-slowly branch from b3beede to 701a1a2 Compare April 13, 2024 04:10

warner force-pushed the warner/8980-boyd-scheduler branch from 62aa511 to 0fe9f39 Compare April 13, 2024 04:10

warner mentioned this pull request Apr 13, 2024

terminate vats slowly #8928

Open

warner force-pushed the warner/8928-terminate-vats-slowly branch from 701a1a2 to c3299e5 Compare April 15, 2024 15:36

warner force-pushed the warner/8980-boyd-scheduler branch from 0fe9f39 to 967e458 Compare April 15, 2024 15:36

warner assigned warner and mhofman and unassigned warner Apr 16, 2024

warner added 2 commits April 23, 2024 13:33

warner force-pushed the warner/8928-terminate-vats-slowly branch from c3299e5 to a31549a Compare April 23, 2024 19:03

warner force-pushed the warner/8980-boyd-scheduler branch from 967e458 to 402811a Compare April 23, 2024 19:03

mhofman reviewed Apr 26, 2024

aj-agoric assigned warner and unassigned mhofman May 14, 2024
