feat(swingset): allow slow termination of vats #9227

Open: wants to merge 2 commits into base branch warner/8980-boyd-scheduler
@warner commented Apr 12, 2024

This introduces new runPolicy() controls which enable "slow
termination" of vats. When configured, terminated vats are immediately
dead (all promises are rejected, all new messages go splat, they never
run again); however, the vat's state is deleted slowly, one piece at a
time. This makes it safe to terminate large vats, with a long history,
lots of c-list imports/exports, or large vatstore tables, without fear
of causing an overload (by e.g. dropping 100k references all in a
single crank).

See docs/run-policy.md for details and configuration instructions.

Also changes swing-store to enable budget-limited deletion of vat
transcripts and snapshots.

refs #8928

@warner added the SwingSet (package: SwingSet) and swing-store labels Apr 12, 2024
@warner force-pushed the warner/8928-terminate-vats-slowly branch from 7841696 to 55452ec April 12, 2024 14:02

cloudflare-pages bot commented Apr 12, 2024

Deploying agoric-sdk with Cloudflare Pages

Latest commit: a31549a
Status: ✅  Deploy successful!
Preview URL: https://334422f0.agoric-sdk.pages.dev
Branch Preview URL: https://warner-8928-terminate-vats-s.agoric-sdk.pages.dev

@warner force-pushed the warner/8928-terminate-vats-slowly branch from 5d23745 to b3beede April 12, 2024 20:49
@warner marked this pull request as ready for review April 13, 2024 04:09
@warner requested a review from mhofman April 13, 2024 04:09
@warner force-pushed the warner/8928-terminate-vats-slowly branch from b3beede to 701a1a2 April 13, 2024 04:10
@warner force-pushed the warner/8980-boyd-scheduler branch from 62aa511 to 0fe9f39 April 13, 2024 04:10
@warner mentioned this pull request Apr 13, 2024
@warner force-pushed the warner/8928-terminate-vats-slowly branch from 701a1a2 to c3299e5 April 15, 2024 15:36
@warner force-pushed the warner/8980-boyd-scheduler branch from 0fe9f39 to 967e458 April 15, 2024 15:36
@warner (author) commented Apr 15, 2024

Note: in addition to having the kernel spread c-list deletion processing over time (to spread out the GC consequences in other vats), I had to change the swing-store to let the kernel spread transcript/snapshot deletion over time (to limit the size of the DB txn). The swingstore work is in the first commit of this PR, the kernel side is in the second.

The swingstore needs to maintain the invariant that exports and imports still work. I arranged it so that transcript spans are deleted starting at the highest startPos (ORDER BY startPos DESC), so the isCurrent=1 record is the very first one deleted. And then I changed the export code to ignore any vat which is missing an isCurrent=1 record. The result is that we'll omit partially-deleted transcripts from any exports, so the import code won't ever observe a partial transcript, so its assertComplete() checks will not fail. Without that, any exports created after the first span deletion but before the final span deletion would be unimportable in mode='replay' or mode='archival'.

The snapshots are still deleted oldest-first (ORDER BY snapPos ASC), since the snapStore's assertComplete does not care about old snapshots.
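A minimal sketch of the transcript ordering described above, assuming an in-memory array in place of the SQLite transcriptSpans table (the row shape and the names deleteSomeSpans and vatsWithExportedTranscripts are hypothetical). It shows that deleting the highest startPos first removes the isCurrent=1 span on the very first call, after which the export stops considering that vat's transcript at all:

```javascript
// Hypothetical in-memory stand-in for the transcriptSpans table; the real
// swing-store keeps these rows in SQLite, with more columns.
const spans = [
  { vatID: 'v43', startPos: 0, endPos: 200, isCurrent: null },
  { vatID: 'v43', startPos: 200, endPos: 400, isCurrent: null },
  { vatID: 'v43', startPos: 400, endPos: 470, isCurrent: 1 },
  { vatID: 'v7', startPos: 0, endPos: 50, isCurrent: 1 },
];

// Delete up to `budget` spans of one vat, highest startPos first (the
// ORDER BY startPos DESC strategy), so the isCurrent=1 span goes first.
const deleteSomeSpans = (vatID, budget) => {
  const victims = spans
    .filter(s => s.vatID === vatID)
    .sort((a, b) => b.startPos - a.startPos)
    .slice(0, budget);
  for (const v of victims) spans.splice(spans.indexOf(v), 1);
  return victims.map(v => v.startPos);
};

// Export only considers vats that still have an isCurrent=1 span, so a
// partially deleted transcript is never offered to an importer.
const vatsWithExportedTranscripts = () =>
  [...new Set(spans.filter(s => s.isCurrent === 1).map(s => s.vatID))].sort();

console.log(vatsWithExportedTranscripts()); // [ 'v43', 'v7' ]
console.log(deleteSomeSpans('v43', 1)); // [ 400 ]
console.log(vatsWithExportedTranscripts()); // [ 'v7' ]
```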

The resulting data-deletion and export-size profiles, starting from the block where the vat is terminated, will look like:

| when | SQL contents | export contents |
| --- | --- | --- |
| vat terminated | (vats.terminated).push(vatID) | everything |
| draining kvStore | kvStore shrinks | kvStore shrinks |
| drained kvStore | kvStore empty | no kvStore for that vatID |
| draining heap snapshots | old/unpopulated snapstore rows deleted | old IAVL records removed, but inUse=1 artifact remains |
| last heap snapshot | the only populated snapstore row deleted | the only snapshot artifact is removed |
| drained snapstore | no snapshot data | no snapshot artifacts or IAVL records |
| draining transcripts | | |
| latest span deleted | up to 200 items deleted, one IAVL deletion | immediately stops including the whole transcript; export is now minimum size |
| N-1 span deleted | 200 items deleted, one IAVL deletion | no change |
| ... earliest span deleted | 200 items deleted, final IAVL deletion | no change |
| deletion complete | (vats.terminated).remove(vatID) | IAVL shadow of vats.terminated changed |

@warner assigned warner and mhofman and unassigned warner Apr 16, 2024
Both `snapStore.deleteVatSnapshots()` and
`transcriptStore.deleteVatTranscripts()` now take a numeric `budget=`
argument, which will limit the number of snapshots or transcript spans
deleted in each call. Both return a `{ done, cleanups }` record so the
caller knows when to stop calling.

This enables the slow deletion of large vats (lots of transcript spans
or snapshots), a small number of items at a time. Recommended budget
is 5, which (given SwingSet's `snapInterval=200` default) will cause
the deletion of 1000 rows from the `transcriptItems` table each call,
which shouldn't take more than 100ms.
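The budgeted calling convention can be sketched as follows. This is not the swing-store implementation: makeFakeSnapStore is a hypothetical helper, the table is an in-memory array rather than SQLite, and `done` is computed directly from the rows that remain. The unbudgeted path returns `cleanups: 0`, mirroring the PR's "no budget, no accounting" behavior:

```javascript
// Hypothetical in-memory stand-in for the snapStore's snapshots table.
const makeFakeSnapStore = rows => {
  const table = rows.map(r => ({ ...r }));

  /**
   * Delete up to `budget` snapshots of `vatID`, oldest (lowest snapPos) first.
   *
   * @param {string} vatID
   * @param {number} [budget] omit to delete everything in one call
   * @returns {{ done: boolean, cleanups: number }}
   */
  const deleteVatSnapshots = (vatID, budget = undefined) => {
    const mine = table
      .filter(r => r.vatID === vatID)
      .sort((a, b) => a.snapPos - b.snapPos);
    const victims = budget === undefined ? mine : mine.slice(0, budget);
    for (const v of victims) table.splice(table.indexOf(v), 1);
    if (budget === undefined) {
      // like the PR's unbudgeted path: no accounting
      return { done: true, cleanups: 0 };
    }
    return { done: victims.length === mine.length, cleanups: victims.length };
  };

  return { deleteVatSnapshots, size: () => table.length };
};

// Twelve snapshots for one vat, deleted with the recommended budget of 5.
const store = makeFakeSnapStore(
  Array.from({ length: 12 }, (_, i) => ({ vatID: 'v9', snapPos: i * 200 })),
);
const perCall = [];
let result;
do {
  result = store.deleteVatSnapshots('v9', 5); // e.g. one call per block
  perCall.push(result.cleanups);
} while (!result.done);
console.log(perCall); // [ 5, 5, 2 ]
```

The caller only needs `done` to know when to stop; `cleanups` lets it charge the work against whatever per-block allowance it is using.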

Without budget-limited deletion, the kernel's attempt to slowly delete a terminated vat
would succeed in slowly draining the kvStore, but would trigger a
gigantic SQL transaction at the end, as it deleted every transcript
item in the vat's history. The worst-case example I found is the
mainnet chain's v43-walletFactory, which (as of April 2024) has 8.2M
transcript items in 40k spans. A fast machine takes two seconds just
to count all the items, and deletion took 22 *minutes*, with a
`swingstore.wal` file that peaked at 27 GiB. This would cause an
enormous chain stall at some surprising point in time weeks or months
after the vat was first terminated. In addition, both the transcript
spans and the snapshot records are shadowed into IAVL (via
`export-data`) for integrity, and deleting 40k+40k=80k IAVL records in
a single block might cause some significant churn too.
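The figures above can be checked with a little arithmetic. This sketch assumes one budgeted call per terminated vat per block and takes the snapshot-record count from the "40k+40k" IAVL estimate; the variable names are illustrative only:

```javascript
// Back-of-the-envelope sizing using the numbers quoted in this commit message.
const budget = 5; // spans (or snapshots) deleted per call: the recommended value
const snapInterval = 200; // SwingSet default deliveries per transcript span
const itemsPerCall = budget * snapInterval; // 1000 transcriptItems rows

// mainnet v43-walletFactory, as of April 2024
const items = 8.2e6;
const spanCount = 40000;
const snapshotCount = 40000; // from the "40k+40k=80k IAVL records" estimate
const avgItemsPerSpan = items / spanCount; // 205

// Assuming one budgeted call per block for this vat:
const blocksForSnapshots = Math.ceil(snapshotCount / budget); // 8000
const blocksForTranscripts = Math.ceil(spanCount / budget); // 8000
const totalBlocks = blocksForSnapshots + blocksForTranscripts; // 16000
```

So instead of one 22-minute transaction, the same work becomes roughly 16,000 small per-block steps, each touching about a thousand transcript rows and a handful of IAVL records.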

refs #8928
@warner force-pushed the warner/8928-terminate-vats-slowly branch from c3299e5 to a31549a April 23, 2024 19:03
@warner force-pushed the warner/8980-boyd-scheduler branch from 967e458 to 402811a April 23, 2024 19:03
@mhofman left a comment

Preliminary review of the first commit introducing budgeted deletion.

I think we should have a first commit changing the semantics of termination to set inUse/isCurrent to null for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).

It would also avoid unnecessarily exporting snapshot/transcript span artifacts while their slow deletion is in progress (since the kv entries are processed first).

Comment on lines +392 to +397

```js
// Unlike transcripts, here we delete the oldest snapshots first,
// to simplify the logic: we delete the only inUse=1 snapshot
// last, and then immediately delete the .current record, at which
// point we're done. This has a side-effect of keeping the unused
// snapshot in the export artifacts longer, but it doesn't seem
// worth fixing.
```

@mhofman:

I find it a little weird to do things in a different order. It also makes the transcript and snapshot metadata entries inconsistent with each other.

```js
 *
 * @param {string} vatID
 * @param {number} budget
 * @returns {{ done: boolean, cleanups: number }}
```

@mhofman:

This kind of interface really feels like a generator.

@warner (author):

It does, but I didn't find a way to take advantage of that fact.

An actual function* generator wouldn't work, of course, because the process can be killed and a new process started while the deletion is going on, and a real generator would lose state when the application is rebooted.

And I think changing the function signature to match that of a normal function* generator is only an improvement if the caller gets to use for..of syntax. But as long as the snapStore function does internal iteration (deleting more than one thing per call), the vatKeeper.js deleteSnapshotsAndTranscripts() caller will only call it once per block (per terminated vat), so there's no good place for a for..of loop. (The real loop is higher up, with one iteration per block.)

To get one, we'd need to change snapStore's deleteSomeVatSnapshots into maybeDeleteOneVatSnapshot, to delete at most one per call, and then have vatKeeper's deleteSnapshotsAndTranscripts() use a for..of loop. We'd still need to return whether a cleanup was done or not, and have the caller accumulate them, so deleteSnapshotsAndTranscripts knows when to switch from snapshots to transcripts. maybeDeleteOneVatSnapshot would always make one DB query (with a LIMIT 1) to get which snapshot to delete, if any. Then it either returns, or does a second DB query to delete the one row, and a third to noteExport the deletion, making the cost 3 small DB queries until all the snapshots are gone, then 1 small DB query each block until all the transcripts are gone (since we always check for remaining snapshots on the way to checking for transcript spans).

That's compared to the current cost (with a budget of 5) of one moderate-sized query every time (using LIMIT 5, returning anywhere from 0 to 5 rows), followed by 0 to 5 noteExports, maybe followed by a single DELETE query removing 1 to 5 rows at once.

And we'd need snapStore to expose maybeDeleteOneVatSnapshot separately from deleteVatSnapshots (unlimited), so the latter could do queries without LIMIT constraints and delete everything in one shot.

In general, it de-amortizes the DB queries: to make use of the iterator, we have to move responsibility for doing more than one deletion (per block) up into vatKeeper, which then can't give swingstore a hint about how many deletions are coming up, so swingstore can't query them in a batch.
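For comparison, the one-at-a-time design described above might look roughly like this. Everything here is hypothetical: the `db` object stands in for SQLite, and maybeDeleteOneVatSnapshot / maybeDeleteOneTranscriptSpan are the names floated in the discussion, not existing functions. It only illustrates why a generator's state can be thrown away between blocks while the durable state lives in the database:

```javascript
// Hypothetical sketch of the one-at-a-time design. `db` stands in for
// SQLite: it is the only state that would survive a process restart.
const db = { snapshots: [0, 200, 400], spans: [0, 200] };

// Each helper deletes at most one row (think LIMIT 1) and reports whether it did.
const maybeDeleteOneVatSnapshot = () => db.snapshots.shift() !== undefined;
const maybeDeleteOneTranscriptSpan = () => db.spans.shift() !== undefined;

// Yields once per deletion. The generator holds no durable state, so a
// fresh one created after a reboot simply resumes from `db`.
function* cleanups() {
  while (maybeDeleteOneVatSnapshot()) yield 'snapshot';
  while (maybeDeleteOneTranscriptSpan()) yield 'span';
}

// The caller (vatKeeper, in the discussion) spends a per-block budget with
// for..of, then abandons the generator until the next block.
const deleteSomeForOneBlock = budget => {
  const done = [];
  if (budget < 1) return done;
  for (const kind of cleanups()) {
    done.push(kind);
    if (done.length >= budget) break;
  }
  return done;
};

const perBlock = [];
for (let block = 0; block < 4; block += 1) {
  perBlock.push(deleteSomeForOneBlock(2).join(','));
}
console.log(perBlock); // [ 'snapshot,snapshot', 'snapshot,span', 'span', '' ]
```

Every block re-checks for snapshots before moving on to spans, which is the extra small query per block in the cost comparison; the budgeted `{ done, cleanups }` API avoids that by letting swingstore fetch up to `budget` rows in one query.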

Comment on lines +452 to +453

```js
// if you didn't set a budget, you won't be counting deletions
return { done: true, cleanups: 0 };
```

@mhofman:

We could fairly easily return deletions.length from deleteAllVatSnapshots, to be consistent.
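A sketch of that suggestion, assuming an in-memory array in place of the SQL table; it is not the actual swing-store function, and only the return value differs from the unbudgeted path quoted above:

```javascript
// Hypothetical in-memory table; in swing-store this would be a SQL DELETE.
const snapshotRows = [
  { vatID: 'v9', snapPos: 0 },
  { vatID: 'v9', snapPos: 200 },
  { vatID: 'v1', snapPos: 0 },
];

// Unbudgeted path: delete every row for the vat in one shot, but report the
// real count instead of a hard-coded 0, so callers see consistent accounting.
const deleteAllVatSnapshots = vatID => {
  const deletions = snapshotRows.filter(r => r.vatID === vatID);
  for (const d of deletions) snapshotRows.splice(snapshotRows.indexOf(d), 1);
  return { done: true, cleanups: deletions.length };
};

console.log(deleteAllVatSnapshots('v9')); // { done: true, cleanups: 2 }
```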

Comment on lines +415 to 419

```js
// if we reach here, the last sqlDeleteOneVatSnapshot() in that
// loop had deleted the inUse=1 snapshot and the corresponding
// snapshotMetadataKey, so now it is time to delete the .current
// record and inform the kernel that we're done
noteExport(currentSnapshotMetadataKey({ vatID }), undefined);
```

@mhofman:

So I'm wondering whether:

  • vat termination should instead set the inUse snapshot to null and remove the .current snapshot marker
  • deleteAllVatSnapshots should assert, when called, that there are no inUse snapshots for the vat
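A minimal sketch of the two bullets, assuming in-memory rows and a made-up export-data key format (the real code derives the key with currentSnapshotMetadataKey(); stopUsingVatSnapshots is a hypothetical name):

```javascript
// Hypothetical model of the proposal; row shapes and the key format are invented.
const snapshots = [
  { vatID: 'v9', snapPos: 0, inUse: null },
  { vatID: 'v9', snapPos: 200, inUse: 1 },
];
const currentKey = vatID => `snapshot.${vatID}.current`;
const exportData = new Map([[currentKey('v9'), 'snapPos=200']]);

// At termination: clear the inUse flag and drop the .current marker, so the
// export immediately stops offering this vat's snapshot artifact.
const stopUsingVatSnapshots = vatID => {
  for (const s of snapshots) {
    if (s.vatID === vatID && s.inUse === 1) s.inUse = null;
  }
  exportData.delete(currentKey(vatID));
};

// Later, deletion refuses to run while the vat still has an active snapshot.
const deleteAllVatSnapshots = vatID => {
  if (snapshots.some(s => s.vatID === vatID && s.inUse === 1)) {
    throw new Error(`vat ${vatID} still has an inUse snapshot`);
  }
  const deletions = snapshots.filter(s => s.vatID === vatID);
  for (const d of deletions) snapshots.splice(snapshots.indexOf(d), 1);
  return { done: true, cleanups: deletions.length };
};

let early;
try {
  deleteAllVatSnapshots('v9');
} catch (err) {
  early = err.message;
}
stopUsingVatSnapshots('v9');
const late = deleteAllVatSnapshots('v9');
console.log(early); // vat v9 still has an inUse snapshot
console.log(late); // { done: true, cleanups: 2 }
```

With the active snapshot already closed at termination, deletion no longer has a special case for the inUse row, so it can proceed in any consistent order.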

Comment on lines +356 to +357

```js
// isCurrent=1 span first, which causes export to ignore the
// entire vat (good, since it's deleted)
```

@mhofman:

Can you clarify? That does not sound good. An export / import in the middle of a slow prune must reconstitute the partially deleted swing-store so that the slow deletion can continue in consensus on that restored node.

I think we need to be careful differentiating items and spans metadata here.

My understanding is that the isCurrent only impacts the artifacts yielded during export, and the completeness checks of items during import. Yielding no artifacts and skipping checks is indeed consistent and the right behavior, and since the metadata is always restored, the pruning behavior will be the same on restore.

That said, I am uneasy to rely on the deletion operation to impact the completeness checks. Imagine we switched things around and started deleting from the oldest span. The operational check would fail. I believe that vat termination should explicitly "close" the span (set isCurrent = null), and only allow deletion of transcripts for which there is no current span. In the future this could be modified to slowly delete transcripts of old incarnations by just adding a constraint on incarnation number on the queries.
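The transcript-side version of that proposal might look like this, assuming in-memory rows in place of the transcriptSpans table. closeCurrentSpan and deleteSomeSpans are hypothetical names, and the optional `incarnation` argument illustrates the possible future extension described above, not existing behavior:

```javascript
// Hypothetical in-memory transcriptSpans rows.
const spans = [
  { vatID: 'v43', incarnation: 0, startPos: 0, isCurrent: null },
  { vatID: 'v43', incarnation: 1, startPos: 200, isCurrent: null },
  { vatID: 'v43', incarnation: 1, startPos: 400, isCurrent: 1 },
];

// Vat termination explicitly "closes" the current span.
const closeCurrentSpan = vatID => {
  for (const s of spans) {
    if (s.vatID === vatID && s.isCurrent === 1) s.isCurrent = null;
  }
};

// Budgeted deletion only proceeds when none of the targeted spans is current.
// Since the completeness checks key off isCurrent, the order no longer
// matters for correctness; this sketch deletes oldest first.
const deleteSomeSpans = (vatID, budget, incarnation = undefined) => {
  const matches = spans.filter(
    s => s.vatID === vatID && (incarnation === undefined || s.incarnation === incarnation),
  );
  if (matches.some(s => s.isCurrent === 1)) {
    throw new Error(`vat ${vatID} still has a current span`);
  }
  const victims = matches.sort((a, b) => a.startPos - b.startPos).slice(0, budget);
  for (const v of victims) spans.splice(spans.indexOf(v), 1);
  return { done: victims.length === matches.length, cleanups: victims.length };
};

// An old incarnation has no current span, so it can be pruned while the vat lives.
console.log(deleteSomeSpans('v43', 5, 0)); // { done: true, cleanups: 1 }

let blocked = false;
try {
  deleteSomeSpans('v43', 5); // the current span is still open
} catch (err) {
  blocked = true;
}

closeCurrentSpan('v43'); // what vat termination would do
console.log(deleteSomeSpans('v43', 1)); // { done: false, cleanups: 1 }
console.log(deleteSomeSpans('v43', 1)); // { done: true, cleanups: 1 }
```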

Comment on lines +397 to +398

```js
// no budget? no accounting.
return { done: true, cleanups: 0 };
```

@mhofman:

Same here, we could return deletions.length from deleteAllVatTranscripts

@warner (author) commented May 10, 2024

> I think we should have a first commit changing the semantics of termination to set inUse/isCurrent to null for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).

Hm, in general I like it; now I'm trying to walk through how that would work.

Say we terminate a vat in block 1, then start deleting parts of it in block 2, and continue on through block 100. We delete the kvStore entries first (say blocks 2-40), then the snapshots (say blocks 41-50), then the transcript spans/items (say blocks 51-100).

I think you're aiming to have swingstore exports immediately stop including artifacts for the terminated vat as of block 1. No transcript span artifacts, no snapshot artifacts. The exports at that point continue to have export-data for everything. We start losing export-data for kvStore entries during 2-40, but an export at block 40 still has all the snapshot export-data (hashes), plus transcript span records. Then in 41-50 we start seeing fewer and fewer snapshot export-data records, and in 51-100 we start losing transcript span records, until by block 100 we see no export-data records for anything related to the now-fully-deleted vat.

We can't afford to delete all the export-data records during block 1, since they're all shadowed into IAVL, which we're protecting/rate-limiting just as much as SQLite. But we want getArtifactNames() to not include names of artifacts that could be produced (the rows are still present), but which are being suppressed because the vat was terminated.

transcriptStore.js getArtifactNames() does that already, with the initial sqlGetCurrentSpanMetadata query (which filters on inUse=1), in all modes except debug. snapStore.js behaves the same way.

Ok, so clearing the inUse/isCurrent flag, or deleting that one row, will suffice to prune the artifacts from exports immediately. But we need to make sure the importer won't think this is a broken import (missing artifact names that the export-data says should be present).

For the snapStore, assertComplete uses sqlListPrunedCurrentSnapshots, which only pays attention to inUse=1, so it won't complain. For transcriptStore, it uses sqlGetCurrentSpanMetadata, which likewise only looks at isCurrent=1.

So.. I think it would just work? We delete the inUse/isCurrent record when the vat is terminated, and we immediately stop observing that vat's heap/transcript-span artifact names or artifacts in the export. The importer would import them if they were present, but it won't complain if they are not.
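The importer-side reasoning can be sketched as follows. assertCompleteSketch, the Map of current records, and the artifact names are all invented for illustration; they only mirror the behavior described for sqlGetCurrentSpanMetadata and sqlListPrunedCurrentSnapshots (checks that look only at the current record):

```javascript
// Hypothetical completeness check on import: only vats whose export-data
// still carries a "current" record must supply the matching artifact.
const assertCompleteSketch = (currentRecords, artifactNames) => {
  for (const [vatID, artifactName] of currentRecords) {
    if (!artifactNames.has(artifactName)) {
      throw new Error(`vat ${vatID}: missing artifact ${artifactName}`);
    }
  }
};

// v43 was terminated and its current record deleted, so its remaining older
// spans are neither exported as artifacts nor required by the check.
const currentRecords = new Map([['v7', 'transcript.v7.current']]);
const artifactNames = new Set(['transcript.v7.current']);
assertCompleteSketch(currentRecords, artifactNames); // passes

// Had v43 kept its current record while its artifact was suppressed, the
// importer would reject the export.
currentRecords.set('v43', 'transcript.v43.current');
let rejected = false;
try {
  assertCompleteSketch(currentRecords, artifactNames);
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```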

Then, slowly, we delete the actual DB items, budget-limited, until they're all gone, at which point vatKeeper learns that there was nothing left to delete, and it deletes the record that says the vat was still being deleted. We still delete a transcriptSpans row and its matching group of 200-ish transcriptItems rows as an atomic unit, so the DB remains consistent, and we remove one export-data row for each span, so we eventually clear out the IAVL data.
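The block-by-block schedule from the walkthrough (kvStore, then snapshots, then transcript spans, then dropping the vat from vats.terminated) can be sketched as below. It assumes one shared budget per block and counts every kvStore key, snapshot, and span as one unit, which is a simplification (the real budgets are configured through runPolicy()); makeTerminatedVat and cleanupOneBlock are hypothetical names:

```javascript
// Hypothetical per-block cleanup driver.
const makeTerminatedVat = ({ kv, snapshots, spans }) => ({
  kv: Array.from({ length: kv }, (_, i) => `key${i}`),
  snapshots: Array.from({ length: snapshots }, (_, i) => i * 200),
  spans: Array.from({ length: spans }, (_, i) => i * 200),
});

const terminated = ['v9']; // stands in for the vats.terminated record

// One block's worth of work: drain kvStore, then snapshots, then spans.
// Each span would be deleted together with its ~200 transcriptItems rows.
const cleanupOneBlock = (vatID, vat, budget) => {
  let remaining = budget;
  const log = [];
  for (const phase of ['kv', 'snapshots', 'spans']) {
    const n = Math.min(remaining, vat[phase].length);
    if (n > 0) {
      vat[phase].splice(0, n);
      log.push(`${phase}:${n}`);
      remaining -= n;
    }
    // Budget exhausted before this phase drained: resume here next block.
    if (vat[phase].length > 0) return { done: false, log };
  }
  // Nothing left to delete: forget that the vat is still being deleted.
  terminated.splice(terminated.indexOf(vatID), 1);
  return { done: true, log };
};

const vat = makeTerminatedVat({ kv: 7, snapshots: 3, spans: 4 });
const blocks = [];
let step;
do {
  step = cleanupOneBlock('v9', vat, 5);
  blocks.push(step.log.join(' '));
} while (!step.done);
console.log(blocks); // [ 'kv:5', 'kv:2 snapshots:3', 'spans:4' ]
console.log(terminated); // []
```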

@warner (author) commented May 10, 2024

Now let's see, should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest? I think it simplifies the deleteSome code if there are no special cases, so I'm inclined to delete the current entry at the time of vat termination, so the only thing left for the rate-limited API is to delete the old non-inUse/isCurrent rows.

I bet it would work to clear the flag too, but the IAVL .current export-data row would need to be deleted specially.

@mhofman commented May 10, 2024

> So.. I think it would just work?

That was my conclusion as well.

> should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest?

I was thinking of setting it to NULL.

> I bet it would work to clear the flag too, but the IAVL .current export-data row would need to be deleted specially.

Yes, I think that's the only change needed (and on import making sure we don't choke if there is no .current)

@aj-agoric assigned warner and unassigned mhofman May 14, 2024