Rackspace Private Cloud - Leapfrog

Overview

Leapfrog

A Leapfrog upgrade is a major upgrade that skips at least one release. Currently, the rpc-upgrades repo supports:

Leapfrog upgrades from:

  • kilo to r14.23.0 (newton)
  • liberty to r14.23.0 (newton)
  • mitaka to r14.23.0 (newton)

Job Testing

The status of supported versions can be viewed from the periodic jobs located on the RPC Jenkins server.

Repos Used

Leapfrog no longer uses the upstream git.openstack.org repos for deployment of openstack-ansible. It now utilizes the repo hosted at:

https://github.com/rcbops/openstack-ansible

This allows us to maintain the various branches used by leapfrog for when upstream dependencies break since upstream openstack-ansible repos for the older branches are EOL and frozen.

Terms

  • RPCO: Rackspace Private Cloud powered by OpenStack
  • OSA: OpenStack Ansible
  • OSA-OPS: OpenStack Operations
  • Kilo: The RPCO release of OpenStack Kilo
  • Liberty: The RPCO release of OpenStack Liberty
  • r14.23.0: The RPCO release of OpenStack Newton.

Pre Upgrade Tasks

  • Verify that the deployment is healthy and at the latest version.
  • Perform database housekeeping to prevent unnecessary migrations.

Prestaging Apt Packages

For large environments, it may be worth prestaging the apt packages that infra and compute hosts will download, ahead of the leapfrog deployment. This avoids overloading the mirror servers and should shorten the actual maintenance, since the packages may already be staged in the apt cache.

cd /opt/rpc-upgrades/playbooks
openstack-ansible preload-apt-packages.yml -e target_release=newton

This temporarily installs the apt sources for the target_release and downloads the packages for infra and compute hosts with apt. It also removes any rpco and uca repos that are currently in place, as the upgrade will install those again. The playbook only downloads packages and does not install anything, so it can be run in production outside of a maintenance.

Executing a leapfrog upgrade

The first step is to clone the rpc-upgrades repo.

git clone https://github.com/rcbops/rpc-upgrades.git /opt/rpc-upgrades

Two variables need to be set in /etc/openstack_deploy/user_variables.yml before proceeding with the upgrade:

lxc_container_backing_store: "dir" # 'dir' is the tested value. Other options are "lvm" and "overlayfs"
neutron_legacy_ha_tool_enabled: "yes"

These variables are required by later versions, but are not defined in Kilo or Liberty.
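As a quick pre-check, the presence of both variables can be verified before starting. The following is a minimal sketch, not part of rpc-upgrades; the function name missing_leapfrog_vars is hypothetical, and it assumes each variable is set as an uncommented top-level key at the start of a line.

```shell
# Print the name of each required variable that is missing from a
# user_variables.yml file. The function name is illustrative only.
# Usage: missing_leapfrog_vars /etc/openstack_deploy/user_variables.yml
missing_leapfrog_vars() {
    vars_file="$1"
    for key in lxc_container_backing_store neutron_legacy_ha_tool_enabled; do
        # Assumption: the key is uncommented and starts at column one.
        if ! grep -Eq "^${key}:" "$vars_file" 2>/dev/null; then
            echo "$key"
        fi
    done
}
```

Running missing_leapfrog_vars /etc/openstack_deploy/user_variables.yml prints nothing when both variables are defined.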

By default Elasticsearch data will be kept and Elasticsearch will be upgraded at the end of the leapfrog. If you'd like to reset the Elasticsearch data, you can override the upgrade and remove the container during the upgrade by setting these environment variables:

export UPGRADE_ELASTICSEARCH="no"
export CONTAINERS_TO_DESTROY='all_containers:!galera_all:!neutron_agent:!ceph_all:!rsyslog_all'

Swift is upgraded by default during the upgrade process. If you desire to skip the Swift upgrade, set the SKIP_SWIFT_UPGRADE variable and set CONTAINERS_TO_DESTROY to exclude deletion of Swift containers:

export SKIP_SWIFT_UPGRADE=yes
export CONTAINERS_TO_DESTROY='all_containers:!galera_all:!neutron_agent:!ceph_all:!rsyslog_all:!swift_all'

Note: If you are skipping Swift and want to upgrade Elasticsearch, you need to exclude both types of containers in CONTAINERS_TO_DESTROY, because setting it overrides the default, which includes the elasticsearch_all group.

export CONTAINERS_TO_DESTROY='all_containers:!galera_all:!neutron_agent:!ceph_all:!rsyslog_all:!elasticsearch_all:!swift_all'
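The three CONTAINERS_TO_DESTROY values shown above differ only in which exclusions are appended to a common base pattern. A small sketch of that composition follows; the helper name containers_to_destroy and its two yes/no arguments are hypothetical and do not exist in rpc-upgrades.

```shell
# Build a CONTAINERS_TO_DESTROY pattern from two yes/no choices.
# $1: keep the Elasticsearch containers (yes/no)
# $2: skip the Swift upgrade (yes/no)
# Hypothetical helper; it only reproduces the strings documented above.
containers_to_destroy() {
    pattern='all_containers:!galera_all:!neutron_agent:!ceph_all:!rsyslog_all'
    if [ "$1" = "yes" ]; then
        # Single quotes keep "!" literal in interactive shells too.
        pattern="${pattern}"':!elasticsearch_all'
    fi
    if [ "$2" = "yes" ]; then
        pattern="${pattern}"':!swift_all'
    fi
    printf '%s\n' "$pattern"
}
```

For example, export CONTAINERS_TO_DESTROY="$(containers_to_destroy yes yes)" yields the value used when skipping Swift while keeping Elasticsearch.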

Note: Currently the rpc-upgrades repo targets r14.23.0. If you want to deploy the previous version you can:

export RPC_TARGET_CHECKOUT=r14.22.0

If /etc/openstack-release cannot be located or is outdated, manually export the release version you are upgrading from:

export CODE_UPGRADE_FROM='KILO/LIBERTY'

The next step is to execute the leapfrog upgrade script and follow the prompts:

cd /opt/rpc-upgrades
scripts/ubuntu14-leapfrog.sh

Structure of the leapfrog process

doc/images/leapfrog_structure_diagram.png

The RPCO leapfrog scripts are a thin wrapper around OSA-OPS leapfrog tools.

For details, please refer to the scripts themselves. Paths are omitted for brevity; scripts may not be in the root of the relevant repo.

Pre Leap

This step removes modifications to RPCO Kilo that aren't compatible with RPCO Newton. Currently this only contains an Ansible 1.9 compatibility workaround.

Prep

This step executes pre-flight checks, and prompts the user for confirmation. It also ensures that the databases are backed up. Backups are stored in /openstack/backup on the physical host that houses the first galera container.

Upgrade

This step has a section for each major version between the source (Kilo) and target (Newton) versions.

Each section includes:
  • Variable & Secrets Migration (OSA)
  • Fact Cleanup
  • Hostname Compatibility checks/modifications.
  • Inventory Upgrades

Migrations

This step runs the database migrations for each major upgrade in sequence:
  • Kilo deployments will run Liberty, Mitaka and Newton migrations
  • Liberty deployments will run Mitaka and Newton migrations
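The mapping from source release to migration sequence can be written down as a small lookup. This is only an illustrative sketch with a hypothetical function name; it encodes just the two cases listed above and is not how the leapfrog scripts select migrations.

```shell
# Print the migration sequence for a source release, one release per line.
# Only the two cases documented above are encoded; anything else is rejected.
migrations_for() {
    case "$1" in
        kilo)    printf '%s\n' liberty mitaka newton ;;
        liberty) printf '%s\n' mitaka newton ;;
        *)       echo "no documented migration sequence for: $1" >&2
                 return 1 ;;
    esac
}
```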

Re-Deploy

This step first runs the pre_redeploy script which handles RPC variable and secret migrations. Next the target version of OSA is deployed. During this stage, one of the original neutron agents containers is left running to minimise downtime.

Post Leap

This step deploys the RPC additions on top of the already deployed OSA.

Rollback and Interruption

The Leapfrog upgrade is a one-way process; once it has begun, there is no rollback. Once the services have been upgraded, they make changes to the virtualisation and networking layers that would be difficult and disruptive to reverse.

If a leapfrog upgrade is interrupted, it can be resumed. Each major step of the leapfrog upgrade process creates a marker file, which will be used to skip completed tasks on subsequent runs.

To resume, re-run scripts/ubuntu14-leapfrog.sh from /opt/rpc-upgrades.

If a step fails, information about that step is printed along with all the remaining steps. The operator must fix the failure and then either re-run the leapfrog or complete the remaining steps manually.
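The marker-file approach described in this section can be illustrated with a short sketch. It shows the general pattern only: the marker directory, the file naming, and the run_step function are assumptions for illustration and do not reflect the actual paths or code used by the leapfrog scripts.

```shell
# Placeholder marker directory; not the location used by rpc-upgrades.
MARKER_DIR="${MARKER_DIR:-/tmp/leapfrog-markers-example}"

# Run a named step once; skip it if its marker file already exists.
# Usage: run_step <step-name> <command> [args...]
run_step() {
    step="$1"
    shift
    marker="${MARKER_DIR}/${step}.complete"
    if [ -e "$marker" ]; then
        echo "skipping ${step} (already completed)"
        return 0
    fi
    mkdir -p "$MARKER_DIR" || return 1
    # Write the marker only on success, so a failed step runs again next time.
    "$@" && touch "$marker"
}
```

Because the marker is written only after the command succeeds, re-running the same sequence of run_step calls after an interruption repeats the failed or unfinished step and skips everything before it.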

Confirmation Prompts

The leapfrog process does not run unattended; it requires the operator to confirm at two points.

Near the start:
  • Confirm intention
  • Check source version
Before deploying the target version:
  • This gives the user the opportunity to check the integrity of migrated rpco variables and secrets before continuing.

F5 Modifications

Where an F5 is used for load balancing, several monitors, virtual-servers and pools will need to be added or modified. While our F5 processing script will provide an actual diff on a per-environment basis, the high-level changes that will need to be made are listed here.

ADD monitors:
  • Add the git repo pointed at the repo server on port 9418
  • Add the repo cache pointed at the repo server on port 3142
  • Add the novnc console pointed at the console containers on port 6080
  • Add an http monitor for the horizon containers on port 80
ADD pools:
  • Add a new pool for galera on port 3306
  • Add a new pool for the git repo on port 9418
  • Add a new pool for the repo cache on port 3142
  • Add a new pool for the novnc console on port 6080
MODIFY pools:
  • Update the horizon pool for port 443
  • Update the horizon pool to forward port 80 to 443
ADD virtual-servers:
  • Add a new virtual-server for galera on port 3307
  • Add a new virtual-server for novnc on port 6080
  • Add a new virtual-server for novnc with SSL on port 6080
  • Add a new virtual-server for the git repo on port 9418
  • Add a new virtual-server for the repo cache on port 3142
MODIFY virtual-servers:
  • Update the galera virtual-server for mirroring
  • Update the horizon virtual-server for an ssl cert

Problems

Clone Failures

The leapfrog process includes many git clones from GitHub. If these requests are rate limited, tasks can fail due to timeouts. As GitHub is beyond our control, the only solution is to wait for the rate limits to reset before retrying.
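When checking manually whether GitHub is reachable again, a generic wait-and-retry wrapper can save some typing. The sketch below is a plain shell helper, not something shipped with rpc-upgrades; the function name, attempt count and delay are all assumptions to adjust.

```shell
# Retry a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    try=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$try" -ge "$attempts" ]; then
            echo "giving up after ${try} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${try} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        try=$((try + 1))
    done
}
```

For example, retry 5 60 git ls-remote https://github.com/rcbops/openstack-ansible polls the repo up to five times, one minute apart, before the leapfrog is resumed.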

Galera

Occasionally the galera cluster may be in a non-functional state after the leapfrog. If this is the case, follow the Galera Maintenance section of the OSA operations guide.

Confirmation Prompts and the Ctrl-c warning

The confirmation prompts instruct the user not to interrupt the leapfrog process via ctrl-c. While an uninterrupted upgrade is the smoothest, the consequences of interruption are not as dire as implied. The process can be resumed by re-running the top level script, which will skip the steps that have already been completed by checking for the existence of marker files.

Testing

In the event you would like to simulate a leapfrog upgrade, follow the instructions in the testing document. Using vagrant, it will set up an AIO deployment of the desired version which can then be leapfrog upgraded. This allows you to test the scenario in the lab or development environment before actually running the upgrade on a production deployment.

Incremental Upgrades

Make sure a user_variables.yml exists in /etc/openstack_deploy/, cd into /opt/rpc-upgrades/, and run ./script/ubuntu16-newton-to-ocata.sh followed by ./script/ubuntu16-ocata-to-pike.sh, and finally ./script/ubuntu16-pike-to-queens.sh. Consider capturing the output of these scripts somewhere convenient for debugging.

Also note that you can run full, functioning upgrades to ocata and/or pike by making sure SKIP_INSTALL is set to no in the environment, for example: export SKIP_INSTALL='no'.
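One way to run the three incremental scripts in order while capturing their output is sketched below. The function name and the log path are hypothetical, and the sketch assumes the scripts do not need interactive input, since all output is redirected to the log file.

```shell
# Run the incremental upgrade scripts in order, stopping at the first failure,
# and append all output to a log file.
# $1: directory holding the scripts; $2: log file path (both chosen by the caller)
run_incremental_upgrades() {
    script_dir="$1"
    log_file="$2"
    for script in ubuntu16-newton-to-ocata.sh ubuntu16-ocata-to-pike.sh ubuntu16-pike-to-queens.sh; do
        echo "=== ${script} ===" >> "$log_file"
        # Redirect rather than pipe to tee, so the script's exit status is kept.
        if ! "${script_dir}/${script}" >> "$log_file" 2>&1; then
            echo "${script} failed; see ${log_file}" >&2
            return 1
        fi
    done
}
```

From /opt/rpc-upgrades/ this could be called as run_incremental_upgrades ./script /var/log/incremental-upgrade.log, with tail -f on the log file in a second terminal to follow progress.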