Reevaluate how we reconnect VMs from other Virtual Centers #553

Open
agrare opened this issue Mar 5, 2020 · 1 comment

agrare commented Mar 5, 2020

Overview

Currently the VMware provider has the ability to "reconnect" virtual machines which have archived or orphaned records in VMDB and are registered as new VMs to a VMware provider that we are monitoring.

There are a number of scenarios where this is helpful:

  1. You "Remove From Inventory" a virtual machine accidentally
  2. You want to move a VM from one active vCenter to another
  3. You want to upgrade your vCenter by adding a completely new one, moving hosts over, then deleting the old one

Registering a VM can be done by browsing the Datastore, selecting the .vmx file, and choosing "Register". This creates a new VirtualMachine with a new ManagedObjectReference, but it keeps the same summary.config.uuid, which comes from the uuid.bios property of the .vmx file.

In all of these cases we would have an archived VM record in our database, and the next refresh would pick up the new VM. This works because when saving VMs we first build an index of all VMs by uid_ems, then try to find an existing archived VM with the same uid_ems to reconnect before we create a new one.
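To make the flow above concrete, here is a minimal sketch of the reconnect-on-save idea. It is not the provider's actual code: `VmRecord`, `save_vm_with_reconnect`, and the in-memory array standing in for VMDB are all hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for a VMDB row; not the real VmOrTemplate model.
VmRecord = Struct.new(:id, :ems_id, :ems_ref, :uid_ems, keyword_init: true)

# Sketch of the save step: reuse an archived record with a matching uid_ems,
# otherwise create a new record. `vmdb` is a plain Array standing in for the table.
def save_vm_with_reconnect(vmdb, ems_id, inventory_vm)
  # Already known to this provider under the same ems_ref: nothing to reconnect.
  existing = vmdb.find { |vm| vm.ems_id == ems_id && vm.ems_ref == inventory_vm[:ems_ref] }
  return existing if existing

  # Index archived records (nil ems_id) by uid_ems. Note that this scans
  # every record, not only the ones belonging to this provider.
  archived_by_uid = vmdb.select { |vm| vm.ems_id.nil? }.group_by(&:uid_ems)

  candidate = (archived_by_uid[inventory_vm[:uid_ems]] || []).first
  if candidate
    # Reconnect: the old record keeps its id (and everything hanging off it).
    candidate.ems_id  = ems_id
    candidate.ems_ref = inventory_vm[:ems_ref]
    return candidate
  end

  new_vm = VmRecord.new(id: vmdb.map(&:id).max.to_i + 1, ems_id: ems_id, **inventory_vm)
  vmdb << new_vm
  new_vm
end

# An archived record is picked up by a newly registered VM with the same UUID.
vmdb = [VmRecord.new(id: 1, ems_id: nil, ems_ref: "vm-10", uid_ems: "4210-aaaa")]
vm = save_vm_with_reconnect(vmdb, 2, {ems_ref: "vm-99", uid_ems: "4210-aaaa"})
puts vm.id # prints 1, the archived record was reused
```

Because only records with a nil `ems_id` are candidates, a VM that is still attached to another active provider is never reused in this sketch.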

Benefits

Reconnecting an existing VM record lets you keep all of the records associated with that VM: events, metrics, SSA, tags for automate, links to what it was provisioned from, etc. all come over with the new VM.

Problems

The problem is that this is way more expensive than simply using the equivalent VmOrTemplate.find_by(:ems_id => ems.id, :ems_ref => ems_ref) and creating a new VM if that is nil. We are doing queries on the whole VMs table (not even scoped to the EMS) every single time we do a refresh.
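For comparison, a lookup scoped to the provider only needs a key of `[ems_id, ems_ref]`. The sketch below is an assumption-laden illustration, not ManageIQ code: `VmRow` and `VmTable` are hypothetical, and a Ruby Hash stands in for an indexed database query.

```ruby
# Hypothetical stand-in for a VMDB row; the real model is VmOrTemplate.
VmRow = Struct.new(:ems_id, :ems_ref, :uid_ems, keyword_init: true)

# Hypothetical in-memory table whose only lookup is keyed on [ems_id, ems_ref],
# mimicking a query scoped to a single provider.
class VmTable
  def initialize
    @by_ems_and_ref = {}
  end

  # Returns the record for this provider and ref, or nil.
  def find_by(ems_id:, ems_ref:)
    @by_ems_and_ref[[ems_id, ems_ref]]
  end

  # Returns the existing record, or creates one if the lookup found nothing.
  def find_or_create(ems_id:, ems_ref:, uid_ems:)
    @by_ems_and_ref[[ems_id, ems_ref]] ||=
      VmRow.new(ems_id: ems_id, ems_ref: ems_ref, uid_ems: uid_ems)
  end

  def size
    @by_ems_and_ref.size
  end
end

table = VmTable.new
first  = table.find_or_create(ems_id: 1, ems_ref: "vm-10", uid_ems: "4210-aaaa")
second = table.find_or_create(ems_id: 1, ems_ref: "vm-10", uid_ems: "4210-aaaa")
puts first.equal?(second) # prints true, the same record is returned

# The same uid_ems on a different provider simply becomes a new record.
table.find_or_create(ems_id: 2, ems_ref: "vm-99", uid_ems: "4210-aaaa")
puts table.size # prints 2
```

The trade-off is visible in the last two lines: the lookup never touches other providers' records, but it also never reconnects anything on its own.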

It is also error prone. To ensure that VMs aren't being stolen from other active providers, it only considers VMs with a nil ems_id. Most people don't delete the old EMS when they upgrade, so the VMs aren't archived yet and we end up creating duplicates anyway.

This is further complicated by the fact that it is extremely common to find multiple active VMs with duplicate BIOS UUIDs on the same vCenter. Since the BIOS UUID is written to the .vmx file, copying the VM directory and registering the copy creates a new VM with a duplicate UUID. VMware knows this, and when you register a VM it asks whether you "Moved or Copied" the VM. What it is really asking is "should I generate a new BIOS UUID for this VM or use the existing one?" If you answer this question wrong, boom: duplicate UUID.
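As a rough illustration of how such duplicates could be spotted, the sketch below groups one provider's active VMs by `uid_ems` and keeps only the groups with more than one member. `InventoryVm` and `duplicate_uuid_groups` are hypothetical names, and the data is made up for the example.

```ruby
# Hypothetical stand-in for an active VM record.
InventoryVm = Struct.new(:name, :ems_id, :uid_ems, keyword_init: true)

# Returns a Hash of uid_ems => [vms] for UUIDs shared by more than one
# VM on the given provider.
def duplicate_uuid_groups(vms, ems_id)
  vms.select { |vm| vm.ems_id == ems_id }
     .group_by(&:uid_ems)
     .select { |_uid, group| group.size > 1 }
end

vms = [
  InventoryVm.new(name: "web",      ems_id: 1, uid_ems: "4210-aaaa"),
  InventoryVm.new(name: "web-copy", ems_id: 1, uid_ems: "4210-aaaa"), # answered "Moved" on a copy
  InventoryVm.new(name: "db",       ems_id: 1, uid_ems: "4210-bbbb"),
]

duplicate_uuid_groups(vms, 1).each do |uid, group|
  puts "#{uid}: #{group.map(&:name).join(', ')}"
end
# prints "4210-aaaa: web, web-copy"
```

Any UUID that shows up here cannot be used to pick a single reconnect target without extra information.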

Alternatives

VMs are removed and re-registered pretty infrequently, and most of the time it is planned (e.g. the vSphere upgrade case). Our reconnect lookup, however, runs on every single refresh.

An alternative to this approach could be to replace this lookup with a helper script or other operation.

We could go ahead and use the standard mechanism to find or create records based on the ems_id and the ems_ref, but allow users to "reconnect" VMs when they know that something is wrong or that they planned on migrating VMs to another provider.

We have an existing script to help people do this, tools/reconnect_vms.rb, which finds VMs with the same uid_ems where one VM is archived, the other is active, and the archived VM is older. It exists because customers did this upgrade dance incorrectly and created lots of duplicates instead of reconnecting VMs.
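The matching rule described above can be sketched as follows. This is not the contents of tools/reconnect_vms.rb: `VmEntry`, `reconnect_pairs`, and the `created_on` field used to decide which record is older are assumptions for illustration, and skipping groups with more than one archived or active record is an extra safety choice made here because of the duplicate BIOS UUID problem.

```ruby
# Hypothetical stand-in for a VMDB row with a creation timestamp.
VmEntry = Struct.new(:id, :ems_id, :uid_ems, :created_on, keyword_init: true)

# Returns [archived, active] pairs that share a uid_ems, where the archived
# record (nil ems_id) is older than the active one.
def reconnect_pairs(vms)
  vms.group_by(&:uid_ems).each_with_object([]) do |(_uid, group), pairs|
    archived, active = group.partition { |vm| vm.ems_id.nil? }

    # Assumption: only unambiguous one-to-one matches are safe to reconnect.
    next unless archived.size == 1 && active.size == 1
    next unless archived.first.created_on < active.first.created_on

    pairs << [archived.first, active.first]
  end
end

vms = [
  VmEntry.new(id: 1, ems_id: nil, uid_ems: "4210-aaaa", created_on: Time.utc(2019, 1, 1)),
  VmEntry.new(id: 2, ems_id: 7,   uid_ems: "4210-aaaa", created_on: Time.utc(2020, 1, 1)),
  VmEntry.new(id: 3, ems_id: 7,   uid_ems: "4210-bbbb", created_on: Time.utc(2020, 1, 1)),
]

reconnect_pairs(vms).each do |archived, active|
  puts "archived ##{archived.id} could be reconnected in place of active ##{active.id}"
end
# prints "archived #1 could be reconnected in place of active #2"
```

Restricting `vms` to a single VM's uid_ems or to one EMS's records before calling the method would give the per-VM or per-EMS search.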

We could even bake this into the product more by allowing users to search for re-connectable VMs through the UI/API either on a specific VM or on an entire EMS.

miq-bot commented Mar 6, 2023

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

agrare added the pinned label and removed the stale label on Mar 6, 2023