
Software Updates


Software Update

NOTE: Software Update refers only to yum package updates; it does not consider, and is not supposed to affect, any software configuration. If there are any configuration or schema changes, it qualifies as a Software Upgrade, not a Software Update.

Process followed during Software Update

  1. Record the current last yum transaction IDs, to be able to roll back later if needed (see the sketch after this list)

  2. Turn on hctl maintenance mode

  3. Update provisioner

  4. If the salt-master config changed due to the Provisioner update, restart the salt-master service

  5. If the salt-minion configs changed as well, restart the salt-minion service

  6. Update Cortx components

  7. Turn off hctl maintenance mode

  8. Run HA upgrade script:

    /opt/seagate/cortx/hare/libexec/build-ha-update /var/lib/hare/cluster.yaml /var/lib/hare/build-ha-args.yaml /var/lib/hare/build-ha-csm-args.yaml
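
The bookkeeping in steps 1, 2 and 7 can be done with commands that appear elsewhere on this page; this is only a sketch, and the output file name is an example:

    # Record the latest yum transaction IDs on every node (step 1);
    # the file name is just an example.
    salt '*' cmd.run "yum history | grep ID -A 5" | tee /tmp/yum-txn-ids-before-update.txt

    # Enter cluster maintenance before touching any packages (step 2) ...
    hctl node maintenance --all --timeout-sec=600

    # ... and leave maintenance once the component update has finished (step 7).
    hctl node unmaintenance --all --timeout-sec=600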

In case of any error:

  1. Roll back all yum updates (using yum history; a sketch of this handling follows the list)
  2. If the error happened while enabling cluster maintenance:
    1. Call hctl to turn off maintenance mode in the background (do not wait for the result, since it may lead to node reboots)
  3. Otherwise:
    1. If the salt-master and/or salt-minion configs were changed back by the yum rollback, restart those services
    2. Turn off pacemaker maintenance mode
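
A rough sketch of this error handling, reusing only commands shown elsewhere on this page (<node> and <ID> are placeholders for a node name and the yum transaction ID recorded for it before the update):

    # Roll back the recorded yum transaction on each affected node.
    salt <node> cmd.run "yum history rollback -y <ID>"

    # If the failure happened while enabling cluster maintenance, release it
    # in the background and do not wait for the result (nodes may reboot).
    hctl node unmaintenance --all --timeout-sec=600 &

    # Otherwise, re-apply the possibly reverted salt configs, then leave
    # maintenance mode.
    salt-run salt.cmd state.apply components.provisioner.salt_master.config
    salt '*' state.apply components.provisioner.salt_minion.config
    hctl node unmaintenance --all --timeout-sec=600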

Software Update Using Provisioner CLI

  1. Check cluster status

    [user@host ~]# pcs status
    
  2. Take a snapshot of the installed rpms:

    [user@host ~]# (rpm -qa|grep cortx) |tee before_update.txt
    
  3. Set update repo

    Command:

    [user@host ~]# provisioner set_swupdate_repo --source "<URL> or <Path to ISO file>" <Release_Tag>
    

    Example:

    [user@host ~]# provisioner set_swupdate_repo --source "http://<cortx_release_repo>/releases/cortx/integration/centos-7.7.1908/last_successful/" build_2382

    Verify the repo is set correctly:

    [user@host ~]# salt-call pillar.get release:update:repos
    local:
        ----------
        Cortx-1.0.0-11-rc6-interim:
            http://<cortx_release_repo>/releases/cortx/Cortx-1.0.0-11-rc6-interim/
    
  4. Trigger the update

    Command:

    [user@host ~]# provisioner sw_update
    
  5. Take a snapshot of the new rpms:

    [user@host ~]# (rpm -qa|grep cortx) |tee after_update.txt
    
  6. Check cluster status

    [user@host ~]# pcs status
    
  7. Verify the update (a combined sketch of the whole flow follows this list):

    [user@host ~]# diff before_update.txt after_update.txt
    1a2
    > cortx-core-1.0.0-366_git65ca4ad0e_3.10.0_1062.el7.x86_64
    4c5
    < cortx-prvsnr-1.0.0-309_gitd4fabec_el7.x86_64
    ---
    > cortx-hare-1.0.0-641_git3aa5c9d.el7.x86_64
    7d7
    < cortx-s3server-1.0.0-865_git83c3bc2e_el7.x86_64
    9,10d8
    < cortx-s3iamcli-1.0.0-865_git83c3bc2e.noarch
    < cortx-hare-1.0.0-639_git3aa5c9d.el7.x86_64
    11a10,11
    > cortx-prvsnr-1.0.0-310_gitb0273ad_el7.x86_64
    > cortx-s3iamcli-1.0.0-869_git44b1198a.noarch
    15c15
    < cortx-core-1.0.0-364_git932e52fb4_3.10.0_1062.el7.x86_64
    ---
    > cortx-s3server-1.0.0-869_git44b1198a_el7.x86_64
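
The CLI steps above can be strung together into a minimal script; this is only a sketch that reuses the placeholder repo URL and release tag from the examples:

    #!/bin/bash
    # Sketch only: snapshot, set the update repo, update, verify.
    set -e

    rpm -qa | grep cortx | tee before_update.txt

    provisioner set_swupdate_repo \
        --source "http://<cortx_release_repo>/releases/cortx/integration/centos-7.7.1908/last_successful/" \
        build_2382
    salt-call pillar.get release:update:repos    # confirm the repo is registered

    provisioner sw_update

    rpm -qa | grep cortx | tee after_update.txt
    diff before_update.txt after_update.txt || true   # a non-empty diff is expected
    pcs status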
    

Software Rollback

NOTE: Rollback is designed to revert only the package changes made by the rpm update, using YUM rollback capabilities.

Below are the steps that can be used to roll back the cluster to the state it was in before the software update was triggered.

Steps for Software Rollback

Note: All commands are to be run on the primary node.

Before the update, look up the last yum transaction IDs for each node and note them down (this is also possible later, but will require more digging through the yum history):

salt '*' cmd.run "yum history | grep ID -A 5"

  1. Turn on maintenance mode (a combined sketch follows these steps):

    hctl node [--verbose] maintenance --all --timeout-sec=600
    
  2. Roll back yum to the stored transaction IDs

    salt <node1> cmd.run "yum history rollback -y <ID>"
    
  3. Apply the possibly changed salt-master configuration

    salt-run salt.cmd state.apply components.provisioner.salt_master.config

    Check that salt-master is running and all minions are connected.

    The salt-master might have been restarted during the previous step, so give the minions time to reconnect, then check the list of connected minions:

    salt-run manage.up

  4. Apply the possibly changed salt-minion configuration

    salt '*' state.apply components.provisioner.salt_minion.config

    The salt-minions might be restarted during this step, so check that they have reconnected:

    salt-run manage.up

  5. Turn off maintenance mode

    hctl node [--verbose] unmaintenance --all --timeout-sec=600
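
The rollback steps above can be combined into a single script run from the primary node; this is only a sketch, and the node names (srvnode-1, srvnode-2) and the transaction-ID variables are placeholders for the values recorded before the update:

    #!/bin/bash
    # Sketch only: roll the cluster back to the pre-update yum transactions.
    set -e

    hctl node maintenance --all --timeout-sec=600

    # Example node names; run one rollback per node with its own recorded ID.
    salt srvnode-1 cmd.run "yum history rollback -y ${NODE1_TXN_ID}"
    salt srvnode-2 cmd.run "yum history rollback -y ${NODE2_TXN_ID}"

    salt-run salt.cmd state.apply components.provisioner.salt_master.config
    salt-run manage.up    # salt-master may have restarted; check minions reconnected

    salt '*' state.apply components.provisioner.salt_minion.config
    salt-run manage.up    # minions may have restarted as well

    hctl node unmaintenance --all --timeout-sec=600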