feat(swingset): allow slow termination of vats #9227
base: warner/8980-boyd-scheduler
Note: in addition to having the kernel spread c-list deletion processing over time (to spread out the GC consequences in other vats), I had to change the swing-store to let the kernel spread transcript/snapshot deletion over time (to limit the size of the DB txn). The swingstore work is in the first commit of this PR, the kernel side is in the second.

The swingstore needs to maintain the invariant that exports and imports still work. I arranged it so that transcript spans are deleted starting at the highest startPos ( The snapshots are still deleted oldest-first (

The resulting data-deletion and export-size profiles, starting from the block where the vat is terminated, will look like:
Both `snapStore.deleteVatSnapshots()` and `transcriptStore.deleteVatTranscripts()` now take a numeric `budget=` argument, which will limit the number of snapshots or transcript spans deleted in each call. Both return a `{ done, cleanups }` record so the caller knows when to stop calling. This enables the slow deletion of large vats (lots of transcript spans or snapshots), a small number of items at a time.

Recommended budget is 5, which (given SwingSet's `snapInterval=200` default) will cause the deletion of 1000 rows from the `transcriptItems` table each call, which shouldn't take more than 100ms.

Without this, the kernel's attempt to slowly delete a terminated vat would succeed in slowly draining the kvStore, but would trigger a gigantic SQL transaction at the end, as it deleted every transcript item in the vat's history. The worst-case example I found would be the mainnet chain's v43-walletFactory, which (as of April 2024) has 8.2M transcript items in 40k spans. A fast machine takes two seconds just to count all the items, and deletion took 22 *minutes*, with a `swingstore.wal` file that peaked at 27 GiB. This would cause an enormous chain stall at some surprising point in time weeks or months after the vat was first terminated.

In addition, both the transcript spans and the snapshot records are shadowed into IAVL (via `export-data`) for integrity, and deleting 40k+40k=80k IAVL records in a single block might cause some significant churn too.

refs #8928
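To make the calling convention concrete, here is a minimal sketch of a caller that invokes a budgeted deletion once per block until it reports `done`. The in-memory store, `makeFakeTranscriptStore`, and `drain` are hypothetical stand-ins for illustration; only the `budget` argument and the `{ done, cleanups }` return shape come from the description above, and the real store is SQLite-backed.

```javascript
// Hypothetical in-memory stand-in for the transcript store. It only
// models the { done, cleanups } contract, not the real SQLite tables.
function makeFakeTranscriptStore(spanCount) {
  // One entry per transcript span, identified by an assumed startPos.
  const spans = Array.from({ length: spanCount }, (_, i) => i * 200);

  // Delete at most `budget` spans, highest startPos first.
  // `vatID` is unused here because the fake store holds a single vat.
  function deleteVatTranscripts(vatID, budget = Infinity) {
    let cleanups = 0;
    while (spans.length > 0 && cleanups < budget) {
      spans.pop();
      cleanups += 1;
    }
    return { done: spans.length === 0, cleanups };
  }

  return { deleteVatTranscripts, remaining: () => spans.length };
}

// Caller side: one budgeted call per "block" until the store is done.
function drain(store, vatID, budget) {
  let blocks = 0;
  let total = 0;
  for (;;) {
    const { done, cleanups } = store.deleteVatTranscripts(vatID, budget);
    blocks += 1;
    total += cleanups;
    if (done) {
      return { blocks, total };
    }
  }
}

const store = makeFakeTranscriptStore(12);
const result = drain(store, 'v43', 5);
console.log(result);
```

With a budget of 5 and the default `snapInterval=200`, each call would cover 5 × 200 = 1000 transcript items, which is where the 1000-row figure above comes from.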
This introduces new `runPolicy()` controls which enable "slow termination" of vats. When configured, terminated vats are immediately dead (all promises are rejected, all new messages go splat, they never run again), however the vat's state is deleted slowly, one piece at a time. This makes it safe to terminate large vats, with a long history, lots of c-list imports/exports, or large vatstore tables, without fear of causing an overload (by e.g. dropping 100k references all in a single crank). See docs/run-policy.md for details and configuration instructions. refs #8928
Preliminary review of the first commit introducing budgeted deletion.
I think we should have a first commit changing the semantics of termination to set `inUse`/`isCurrent` to `null` for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).
It would also avoid unnecessarily exporting snapshot/transcript span artifacts while their slow deletion is in progress (since the kv entries are processed first).
// Unlike transcripts, here we delete the oldest snapshots first,
// to simplify the logic: we delete the only inUse=1 snapshot
// last, and then immediately delete the .current record, at which
// point we're done. This has a side-effect of keeping the unused
// snapshot in the export artifacts longer, but it doesn't seem
// worth fixing.
I find it a little weird to do things in a different order. It also causes the metadata entries for transcripts and snapshots to be inconsistent with each other.
 *
 * @param {string} vatID
 * @param {number} budget
 * @returns {{ done: boolean, cleanups: number }}
This kind of interface really feels like a generator.
It does, but I didn't find a way to take advantage of that fact.
An actual `function*` generator wouldn't work, of course, because the process can be killed and a new process started while the deletion is going on, and a real generator would lose state when the application is rebooted.

And, I think changing the function signature to match that of a normal `function*` generator is only an improvement if the caller gets to use `for..of` syntax, but as long as the snapStore function is doing internal iteration (deleting more than one thing per call), the vatKeeper.js `deleteSnapshotsAndTranscripts()` caller is only going to call it once per block (per terminated vat), so there's no good place for a `for..of` loop. (The real loop is higher up, with one iteration per block.)

To get one, we'd need to change snapStore's `deleteSomeVatSnapshots` into `maybeDeleteOneVatSnapshot`, to delete at most one per call, and then have vatKeeper's `deleteSnapshotsAndTranscripts()` use a `for..of` loop. We'd still need to return whether a cleanup was done or not, and have the caller accumulate them, so `deleteSnapshotsAndTranscripts` knows when to switch from snapshots to transcripts. `maybeDeleteOneVatSnapshot` would always make one DB query (with a `LIMIT 1`) to get which snapshot to delete, if any. Then it either returns, or does a second DB query to delete the one row, and a third to `noteExport` the deletion, making the cost 3 small DB queries until all the snapshots are gone, then 1 small DB query each block until all the transcripts are gone (since we always check for remaining snapshots on the way to checking for transcript spans).

That's compared to the current cost (with a budget of 5) of one moderate-sized query every time (using `LIMIT 5`, returning anywhere from 0 to 5 rows), followed by 0 to 5 `noteExports`, maybe followed by a single `DELETE` query removing 1 to 5 rows at once.

And we'd need snapStore to expose `maybeDeleteOneVatSnapshot` separately from `deleteVatSnapshots` (unlimited), so the latter could do queries without `LIMIT` constraints, and delete everything in one shot.

In general, it de-amortizes the DB queries, because to make use of the iterator, we have to move responsibility for doing more than one deletion (per block) up into vatKeeper, which then can't give a hint to swingstore about how many deletions are coming up, so it could query them in a batch.
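The trade-off can be made concrete with a small cost model that counts operations for each shape. This is only a sketch: it touches no database, the function names are hypothetical, and it assumes each `noteExport` counts as one operation, in line with the "3 small DB queries" tally in the comment.

```javascript
// Cost model only: counts the operations described in the comment,
// without touching a real database.

// One-at-a-time shape: each attempt issues a SELECT (LIMIT 1); if a
// row is found, it also issues one DELETE and one noteExport.
function costOneAtATime(snapshots, perBlock) {
  let remaining = snapshots;
  let queries = 0;
  let blocks = 0;
  while (remaining > 0) {
    blocks += 1;
    for (let i = 0; i < perBlock; i += 1) {
      queries += 1; // SELECT ... LIMIT 1
      if (remaining === 0) {
        break; // nothing left: this attempt cost only the SELECT
      }
      queries += 2; // DELETE of one row, plus its noteExport
      remaining -= 1;
    }
  }
  return { blocks, queries };
}

// Batched shape: one SELECT (LIMIT budget), one noteExport per row
// found, then a single DELETE covering all of those rows.
function costBatched(snapshots, budget) {
  let remaining = snapshots;
  let queries = 0;
  let blocks = 0;
  while (remaining > 0) {
    blocks += 1;
    const found = Math.min(budget, remaining);
    queries += 1; // SELECT ... LIMIT budget
    queries += found; // one noteExport per row
    queries += 1; // single DELETE for the whole batch
    remaining -= found;
  }
  return { blocks, queries };
}

const oneAtATime = costOneAtATime(12, 5);
const batched = costBatched(12, 5);
console.log({ oneAtATime, batched });
```

Both shapes finish in the same number of blocks; the batched form issues fewer operations because the SELECT and DELETE are shared across the whole batch.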
// if you didn't set a budget, you won't be counting deletions
return { done: true, cleanups: 0 };
We could fairly easily return `deletions.length` from `deleteAllVatSnapshots` to be consistent.
// if we reach here, the last sqlDeleteOneVatSnapshot() in that
// loop had deleted the inUse=1 snapshot and the corresponding
// snapshotMetadataKey, so now it is time to delete the .current
// record and inform the kernel that we're done
noteExport(currentSnapshotMetadataKey({ vatID }), undefined);
So I'm wondering if:
- vat termination should not set the `inUse` snapshot to `null` and remove the `.current` snapshot marker
- assert when we call `deleteAllVatSnapshots` that there are no `inUse` snapshots for the vat
// isCurrent=1 span first, which causes export to ignore the
// entire vat (good, since it's deleted)
Can you clarify? That does not sound good. An export / import in the middle of a slow prune must reconstitute the partially deleted swing-store so that the slow deletion can continue in consensus on that restored node.
I think we need to be careful differentiating items and spans metadata here.
My understanding is that `isCurrent` only impacts the artifacts yielded during export, and the completeness checks of items during import. Yielding no artifacts and skipping checks is indeed consistent and the right behavior, and since the metadata is always restored, the pruning behavior will be the same on restore.

That said, I am uneasy about relying on the deletion operation to impact the completeness checks. Imagine we switched things around and started deleting from the oldest span. The operational check would fail. I believe that vat termination should explicitly "close" the span (set `isCurrent = null`), and only allow deletion of transcripts for which there is no current span. In the future this could be modified to slowly delete transcripts of old incarnations by just adding a constraint on incarnation number on the queries.
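A minimal sketch of that proposal, using an in-memory array in place of the spans table: termination closes the current span, and the budgeted deletion asserts that no current span remains before it removes anything. The names `makeSpanTable`, `closeCurrentSpan`, and `deleteSomeSpans` are hypothetical, as is the row layout.

```javascript
// Hypothetical in-memory rows standing in for the transcript spans
// table; the real schema and queries are not shown in this thread.
function makeSpanTable() {
  const rows = [
    { vatID: 'v1', startPos: 0, isCurrent: null },
    { vatID: 'v1', startPos: 200, isCurrent: null },
    { vatID: 'v1', startPos: 400, isCurrent: 1 },
  ];

  // Termination explicitly "closes" the current span.
  function closeCurrentSpan(vatID) {
    for (const row of rows) {
      if (row.vatID === vatID && row.isCurrent === 1) {
        row.isCurrent = null;
      }
    }
  }

  // Deletion refuses to run while a span is still current, so its
  // correctness does not depend on the order in which rows are removed.
  function deleteSomeSpans(vatID, budget) {
    const mine = rows.filter(row => row.vatID === vatID);
    if (mine.some(row => row.isCurrent === 1)) {
      throw Error(`vat ${vatID} still has a current span`);
    }
    let cleanups = 0;
    // Oldest first here, but any consistent order would work.
    for (const row of mine.slice(0, budget)) {
      rows.splice(rows.indexOf(row), 1);
      cleanups += 1;
    }
    const done = !rows.some(row => row.vatID === vatID);
    return { done, cleanups };
  }

  return { closeCurrentSpan, deleteSomeSpans };
}

const table = makeSpanTable();
let refused = false;
try {
  table.deleteSomeSpans('v1', 2);
} catch (err) {
  refused = true; // deletion before termination is rejected
}
table.closeCurrentSpan('v1');
const first = table.deleteSomeSpans('v1', 2);
const second = table.deleteSomeSpans('v1', 2);
console.log({ refused, first, second });
```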
// no budget? no accounting.
return { done: true, cleanups: 0 };
Same here, we could return `deletions.length` from `deleteAllVatTranscripts`.
Hm, in general I like it, now I'm trying to walk through how that would work.

Say we terminate a vat in block 1, then start deleting parts of it in block 2, and continue on through block 100. We delete the kvStore entries first (say blocks 2-40), then the snapshots (say blocks 41-50), then the transcript spans/items (say blocks 51-100).

I think you're aiming to have swingstore exports immediately stop including artifacts for the terminated vat as of block 1. No transcript span artifacts, no snapshot artifacts. The exports at that point continue to have export-data for everything. We start losing export-data for kvStore entries during 2-40, but an export at block 40 still has all the snapshot export-data (hashes), plus transcript span records. Then in 41-50 we start seeing fewer and fewer snapshot export-data records, and in 51-100 we start losing transcript span records, until by block 100 we see no export-data records for anything related to the now-fully-deleted vat.

We can't afford to delete all the export-data records during block 1, since they're all shadowed into IAVL, which we're protecting/rate-limiting just as much as SQLite. But we want
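The phase ordering in this walkthrough (kvStore entries, then snapshots, then transcript spans) can be sketched as a small per-block state machine. This is an illustration only: `makeCleanup` and `cleanupOneBlock` are hypothetical names, the counts are made up, and it assumes each block spends its budget on a single phase.

```javascript
// Hypothetical model of phased cleanup for one terminated vat.
// Assumption: each block works on the first phase that still has
// items, and never spills leftover budget into the next phase.
function makeCleanup(counts) {
  const phases = ['kvStore', 'snapshots', 'transcripts'];
  const remaining = { ...counts };

  function cleanupOneBlock(budget) {
    const phase = phases.find(name => remaining[name] > 0);
    if (phase === undefined) {
      // Nothing left: the caller can forget about this vat.
      return { done: true, phase: undefined, cleanups: 0 };
    }
    const cleanups = Math.min(budget, remaining[phase]);
    remaining[phase] -= cleanups;
    const done = phases.every(name => remaining[name] === 0);
    return { done, phase, cleanups };
  }

  return { cleanupOneBlock };
}

const cleanup = makeCleanup({ kvStore: 7, snapshots: 3, transcripts: 6 });
const phaseLog = [];
for (;;) {
  const { done, phase } = cleanup.cleanupOneBlock(5);
  phaseLog.push(phase);
  if (done) {
    break;
  }
}
console.log(phaseLog);
```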
Ok, so clearing the For the snapStore, So.. I think it would just work? We delete the inUse/isCurrent record when the vat is terminated, and we immediately stop observing that vat's heap/transcript-span artifact names or artifacts in the export. The importer would import them if they were present, but it won't complain if they are not. Then, slowly, we delete the actual DB items, budget-limited, until they're all gone, at which point vatKeeper learns that there was nothing left to delete, and it deletes the record that says the vat was still being deleted. We still delete a |
Now let's see, should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest? I think it simplifies the I bet it would work to clear the flag too, but the IAVL |
That was my conclusion as well.
I was thinking of setting it to NULL.
Yes, I think that's the only change needed (and on import making sure we don't choke if there is no |
This introduces new `runPolicy()` controls which enable "slow
termination" of vats. When configured, terminated vats are immediately
dead (all promises are rejected, all new messages go splat, they never
run again), however the vat's state is deleted slowly, one piece at a
time. This makes it safe to terminate large vats, with a long history,
lots of c-list imports/exports, or large vatstore tables, without fear
of causing an overload (by e.g. dropping 100k references all in a
single crank).
See docs/run-policy.md for details and configuration instructions.
Also changes swing-store to enable budget-limited deletion of vat
transcripts and snapshots.
refs #8928