Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: separate statuses of inbound and outbound migrations #661

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

gjcolombo
Copy link
Contributor

N.B. This is a breaking change to the Propolis API that will need a corresponding Omicron change before it can be merged there.

Change the way the server reports migration statuses:

  • Instead of asking clients to identify a specific migration to query, create a migration-status endpoint that reports on all migrations currently known to the server. (The server wasn't keeping any history of prior migration attempts anyway, so specifying an ID wasn't buying very much; if the client cares about a specific ID they can compare it to the IDs the server now returns.)
  • When reporting migration status (either through the migration status endpoint or the instance monitor endpoint), report the ID and status of both the inbound migration (if there was one) and the most recent attempted outbound migration.

This should allow us to simplify a few things in sled agent. Today:

  • Sled agent has to handle mismatched migration IDs carefully to cover cases such as this one:
    • Propolis was initialized via incoming migration with ID X
    • Sled agent knows about outgoing migration with ID Y
    • Propolis reports an instance state change that still bears ID X This sort of case is easier to reason about (or at least to make fewer assumptions about) when the outbound and inbound migrations are properly distinguished.
  • If Propolis reports a status of "destroyed, migration finished," it's ambiguous whether the VM was migrated into and then stopped or was migrated out of. With these changes it's obvious that a successful migration-out occurred.

Additional fixes/improvements:

  • The pending_migration_id field in the VM controller wasn't being used for anything; now it is.
  • Add a log line to the code path that publishes a new externally-visible instance state.

Tests:

  • cargo test
  • local PHD run (migration-from-base tests are broken because of the breaking API change)
  • ran an ad hoc live migration with propolis-cli to test both that the CLI works properly and that the right messages appear in the Propolis logs

Fixes #508.

**N.B.** This is a breaking change to the Propolis API that will need a
corresponding Omicron change before it can be merged there.

Change the way the server reports migration statuses:

- Instead of asking clients to identify a specific migration to query, create
  a `migration-status` endpoint that reports on all migrations currently known
  to the server. (The server wasn't keeping any history of prior migration
  attempts anyway, so specifying an ID wasn't buying very much; if the client
  cares about a specific ID they can compare it to the IDs the server now
  returns.)
- When reporting migration status (either through the migration status endpoint
  or the instance monitor endpoint), report the ID and status of both the
  inbound migration (if there was one) and the most recent attempted outbound
  migration.

This should allow us to simplify a few things in sled agent. Today:

- Sled agent has to handle mismatched migration IDs carefully to cover cases
  such as this one:
  - Propolis was initialized via incoming migration with ID X
  - Sled agent knows about outgoing migration with ID Y
  - Propolis reports an instance state change that still bears ID X
  This sort of case is easier to reason about (or at least to make fewer
  assumptions about) when the outbound and inbound migrations are properly
  distinguished.
- If Propolis reports a status of "destroyed, migration finished," it's
  ambiguous whether the VM was migrated into and then stopped or was migrated
  out of. With these changes it's obvious that a successful migration-out
  occurred.

Additional fixes/improvements:

- The `pending_migration_id` field in the VM controller wasn't being used for
  anything; now it is.
- Add a log line to the code path that publishes a new externally-visible
  instance state.

Tests:

- cargo test
- local PHD run (migration-from-base tests are broken because of the breaking
  API change)
- ran an ad hoc live migration with propolis-cli to test both that the CLI works
  properly and that the right messages appear in the Propolis logs
@gjcolombo
Copy link
Contributor Author

This should be OK to review as-is, but I'm going to try to cook up the corresponding sled agent change before sending actual review requests here.

@gjcolombo
Copy link
Contributor Author

This should be OK to review as-is

falcon Failing after 4m — Failure!

Oops. I think this is just an out-of-date OpenAPI doc--will work on the fix.

@hawkw hawkw self-requested a review March 13, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

want migration status endpoints to report source and target migration IDs separately
1 participant