Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt to new (v2) VFIO migration interface #741

Closed
wants to merge 9 commits into from

Conversation

w-henderson
Copy link
Contributor

@w-henderson w-henderson commented Jul 6, 2023

This PR will (when done):

  • Update to version of the spec as submitted to qemu-devel
  • Add new migration protocol based on VFIO migration v2, replacing the migration region with a more intuitive stream-like mechanism for transferring migration data (alongside other changes)
  • Implement the new protocol in libvfio-user
  • Update and add more tests for the new protocol
  • Update the samples for the new protocol

Closes #654

Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
Signed-off-by: William Henderson <william.henderson@nutanix.com>
@jlevon
Copy link
Collaborator

jlevon commented Jul 7, 2023

Hi William, thanks! Looks like a great start.
Let's split things up from your branch into individual PRs - this will let us review each logical change on its own. Some context here: https://movementarian.org/blog/posts/github-prs/

As for the first change here, this looks good, but could you add some context to the commit message? In particular:

  • mention where you're syncing from
  • explain that while you're removing LM in the spec, the code will remain in the library until updated to v2 by later changes
  • mention that we don't implement all of the other new changes in the spec update - but we are otherwise compatible
  • as a useful side quest, would be good to file issues on any of those changes. For example we already have a PR for multi-write, but we definitely don't have issues filed for some of the new capabilities during negotiation

include/vfio-user.h Outdated Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
ssize_t ret;

if (req->flags & VFIO_DEVICE_FEATURE_PROBE) {
msg->out.iov.iov_base = msg->in.iov.iov_base;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation doesn't make much sense to me: it has no effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set the reply to equal the request, as this is what VFIO and hence vfio-user dictates for the success response - this has since been replaced by the following, as the aliasing caused a double free: 😳

msg->out.iov.iov_base = malloc(msg->in.iov.iov_len);
msg->out.iov.iov_len = msg->in.iov.iov_len;
memcpy(msg->out.iov.iov_base, msg->in.iov.iov_base,
       msg->out.iov.iov_len);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such an API would not be useful. I think you're supposed to respond with the set of flags known to the server. (And, if get, or set, is set, whether the server supports that verb for the given flag(s)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to diverge from VFIO here?

From documentation of VFIO_DEVICE_FEATURE:

Get, set, or probe feature data of the device. The feature is selected using the FEATURE_MASK portion of the flags field. Support for a feature can be probed by setting both the FEATURE_MASK and PROBE bits. A probe may optionally include the GET and/or SET bits to determine read vs write access of the feature respectively. Probing a feature will return success if the feature is supported and all of the optionally indicated GET/SET methods are supported. The format of the data portion of the structure is specific to the given feature. The data portion is not required for probing. GET and SET are mutually exclusive, except for use with PROBE.

include/vfio-user.h Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Outdated Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
test/py/libvfio_user.py Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
* ``VFIO_DEVICE_FEATURE_SET`` instructs the server to set the feature data to
that given in the ``data`` field of the payload.

* ``VFIO_DEVICE_FEATURE_PROBE`` instructs the server to probe for feature
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probing for one or more of these three things:

  1. is the flag supported at all
  2. can I do a "get" for this flag
  3. can I do a "set" for this flag

right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is if you don't set GET/SET, it is successful if the feature is supported at all, but if you set GET/SET it is only successful if whatever commands you probed for are also supported. How would you suggest I word this more clearly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a whole section on probe, saying something like:

Probing

If just VFIO_DEVICE_FEATURE_PROBE is set, the server should return the corresponding flags that are supported.

If VFIO_DEVICE_FEATURE_PROBE is set along with VFIO_DEVICE_FEATURE_GET, VFIO_DEVICE_FEATURE_SET, or both, the server should return whether the corresponding operation(s) are supported for the given flags.

docs/vfio-user.rst Show resolved Hide resolved
indicated via the flags field; at least ``VFIO_MIGRATION_STOP_COPY`` must be
set.

There is no data field of the request message.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of the spec is pretty much just lifted from VFIO, I don't believe you can call SET on this feature...?

https://elixir.bootlin.com/linux/v6.3.8/source/include/uapi/linux/vfio.h#L814

| VFIO_DEVICE_STATE_PRE_COPY_P2P | 7 | (not used in vfio-user) |
+--------------------------------+-------+---------------------------------------------------------------------+

* *data_fd* is unused in vfio-user, as the ``VFIO_USER_MIG_DATA_READ`` and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is always going to be completely meaningless, we should discuss whether we should diverge here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, the only reason why we could keep it is to slightly simplify QEMU implementation, though I'm not sure that's a strong reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we haven't discussed whether we're going to support memory mapped data transfers for migration (this FD is something we could use), @w-henderson have you given any thought to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about this - I quite like the idea of following VFIO as closely as we can so my preference is probably to leave it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

support memory mapped data transfers for migration (this FD is something we could use)

What I said doens't make sense, the FD needs to be passed via ancillary data. I don't have a strong opinion about keeping it, perhaps qemu-devel may help us decide.

``data_fd`` file descriptor in ``<linux/vfio.h>``
(``struct vfio_device_feature_mig_state``) to enable all data transport to use
the single already-established UNIX socket. Hence, the migration data is
treated like a stream, so the client must continue reading until no more
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look in vfio? Is a stream "safe", or would we ever want to support sparse read/writes in some way?

I'm not clear how this would even work if we wanted to do device state dirty tracking in PRE_COPY state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in VFIO we never "seek", only read/write until no more reading/writing can be done. I'm also unsure about how we'd do device state dirty tracking in PRE_COPY state - I think the idea is we track the dirty pages in PRE_COPY so we can only send the necessary migration data once we get to STOP_COPY but I'm not too sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in VFIO we never "seek", only read/write until no more reading/writing can be done.

That's correct.

I'm not sure I understand the rest of the discussion though, especially how dirty page tracking is related to device state tracking.

I have the following question (perhaps we're talking about the same thing?), regarding PRE_COPY, from VFIO:

 * RUNNING -> PRE_COPY
 * RUNNING_P2P -> PRE_COPY_P2P
 * STOP -> STOP_COPY
 *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
 *   which share a data transfer session. Moving between these states alters
 *   what is streamed in session, but does not terminate or otherwise affect
 *   the associated fd.
 *
 *   These arcs begin the process of saving the device state and will return a
 *   new data_fd. The migration driver may perform actions such as enabling
 *   dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
 *
 *   Each arc does not change the device operation, the device remains
 *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
 *   in PRE_COPY_P2P -> STOP_COPY.

IIUC PRE_COPY returns an FD as this starts a new copy session, so in vfio-user this means that the client can start sending VFIO_USER_MIG_DATA_READ messages until the server returns 0 length read (equivalnt to the data_fd returning no more data in VFIO). While in PRE_COPY state, the device migth generate more device data, how does the VFIO client know this? Does reading from the data_fd return data (it would previously return EOF).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VFIO defines PRE_COPY to mean

The device is running normally but tracking internal state changes

which maybe implies that we don't actually transfer any data in that state, rather just do whatever we can in the background to prepare for transfer and track the changes to keep this up to date? I'm not sure if this makes sense though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arcs begin the process of saving the device state and will return a new data_fd.

This suggests otherwise, but I agree it's not clear at all.

Each arc does not change the device operation, the device remains RUNNING, P2P quiesced or in STOP.

Can you comment on this? Does the device stay in RUNNING state or does it transition to RUNNING | PRE_COPY?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely doesn't stay in the exact same state though.

By state do you mean RUNNING etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, RUNNING and PRE_COPY are distinct states that have the same behaviour from the client's perspective but do things differently behind the scenes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, the documentation says:

Each arc does not change the device operation, the device remains RUNNING, P2P quiesced or in STOP.

while you say:

It definitely doesn't stay in the exact same state though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused too, I had been thinking of them as distinct states but I guess it's supposed to be more like different behaviour in the same state? That sounds a bit dodgy?

Copy link
Member

@tmakatos tmakatos Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, and I was hoping you'd look at the kernel code to figure it out 😄.

@tmakatos tmakatos marked this pull request as ready for review July 7, 2023 09:32
Copy link
Member

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've skimmed through it, need to sit down and compare with VFIO v2.

Could you include in the commit message from which commit/kernel version you're basing this? If master doesn't differ too much from latest stable version, then we could probably use the latter.

docs/vfio-user.rst Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
docs/vfio-user.rst Show resolved Hide resolved
| VFIO_DEVICE_STATE_PRE_COPY_P2P | 7 | (not used in vfio-user) |
+--------------------------------+-------+---------------------------------------------------------------------+

* *data_fd* is unused in vfio-user, as the ``VFIO_USER_MIG_DATA_READ`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, the only reason why we could keep it is to slightly simplify QEMU implementation, though I'm not sure that's a strong reason.

Copy link
Member

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this WIP, can you explain in the commit message how libvfio-user handles migration v2? E.g. it understands all migration-related operations but fails them, etc

include/vfio-user.h Show resolved Hide resolved
include/vfio-user.h Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
test/py/libvfio_user.py Show resolved Hide resolved
include/libvfio-user.h Show resolved Hide resolved
type->subtype = VFIO_REGION_SUBTYPE_MIGRATION;
vfio_reg->cap_offset = sizeof(struct vfio_region_info);
}

if (vfu_reg->mmap_areas != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, can you add a separate commit where you invert this check so that identation is decreased?

Copy link
Member

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also explain what this commit does?

lib/migration_priv.h Show resolved Hide resolved
include/vfio-user.h Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
lib/migration.c Show resolved Hide resolved
Copy link
Member

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the client/server sample work?

samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Outdated Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Outdated Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
Copy link
Member

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed from migration_read_data onwards, can you explain in the commit message how the code changes and why? Also, some comments in the code would also help, I know it wasn't there in v1 but it should have been, let's improve with v2.

include/libvfio-user.h Show resolved Hide resolved
@@ -267,13 +267,13 @@ enum vfio_device_mig_state {
// used for read request and write response
struct vfio_user_mig_data_without_data {
uint32_t argsz;
uint32_t size;
uint64_t size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm honest I can't remember why I changed this, should I change it back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it probably doesn't make sense to be 64-bit given that the entire message's length is 32-bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we should change it back to 32-bit

lib/libvfio-user.c Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
lib/libvfio-user.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/client.c Show resolved Hide resolved
samples/server.c Show resolved Hide resolved
samples/server.c Show resolved Hide resolved
@swapnili
Copy link
Collaborator

swapnili commented Aug 2, 2023

@w-henderson Is this PR up for review or WIP I see that the build is failing? Are you going to split this into small logical PRs?

@w-henderson
Copy link
Contributor Author

@swapnili This PR has since been split into the smaller #745 #746 #747 #754 which are at various different stages of development, I'll close this one since it's no longer relevant.

@w-henderson w-henderson closed this Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

adapt to new (v2) VFIO migration interface
4 participants