This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

Provisioner KnowledgeBase

Yashodhan Pise edited this page May 31, 2021 · 20 revisions

https://docs.google.com/document/d/1LXOYYpbFCAOMQCiWpltEqI3Zr-5OVMI-5LGgK7BqyNA/edit?ts=5d031b12#heading=h.1ui14hyuzs9n

Dos and Don'ts

Dos

  • Before you begin provisioning, make sure you have carefully reviewed the files that are expected to change:
    • /opt/seagate/eos-prvsnr/pillar/components/release.sls
    • /opt/seagate/eos-prvsnr/pillar/components/cluster.sls
      These files must be customized on every node with node- and release-specific details. This customization cannot be skipped.

Don'ts

Frequently Asked Questions

Known Issues/Workarounds

Issue. Error ModuleNotFoundError: No module named 's3iamcli' while creating an IAM account

[root@srvnode1 cortx-prvsnr]# s3iamcli CreateAccount -n cloud -e cloud@seagate.com
Traceback (most recent call last):
  File "/bin/s3iamcli", line 5, in <module>
    from s3iamcli.main import S3IamCli
ModuleNotFoundError: No module named 's3iamcli'

Solution

Here s3iamcli is installed under Python 3.4, but /usr/bin/python3 points to Python 3.6. Repoint the symlink to the interpreter that has the module:

[root@srvnode1 cortx-prvsnr]# ls -l /usr/bin/python3
lrwxrwxrwx. 1 root root 9 Jun 19 00:22 /usr/bin/python3 -> python3.6
[root@srvnode1 cortx-prvsnr]# ls -l /usr/bin/python
lrwxrwxrwx. 1 root root 7 Mar 22 02:04 /usr/bin/python -> python2
[root@srvnode1 cortx-prvsnr]# ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root     7 Mar 22 02:04 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root     9 Mar 22 02:04 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root  7216 Apr 11  2018 /usr/bin/python2.7
lrwxrwxrwx. 1 root root     9 Jun 19 00:22 /usr/bin/python3 -> python3.6
-rwxr-xr-x. 2 root root 11384 Apr  7 22:19 /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr  7 22:19 /usr/bin/python3.4m
lrwxrwxrwx. 1 root root    18 Jun 19 00:22 /usr/bin/python36 -> /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6m  

[root@srvnode1 cortx-prvsnr]# rm /usr/bin/python3
rm: remove symbolic link ‘/usr/bin/python3’? y  

[root@srvnode1 cortx-prvsnr]# ln -s /usr/bin/python3.4 /usr/bin/python3
[root@srvnode1 cortx-prvsnr]# ls -l /usr/bin/python*
lrwxrwxrwx. 1 root root     7 Mar 22 02:04 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root     9 Mar 22 02:04 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root  7216 Apr 11  2018 /usr/bin/python2.7
lrwxrwxrwx. 1 root root    18 Jun 19 02:26 /usr/bin/python3 -> /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr  7 22:19 /usr/bin/python3.4
-rwxr-xr-x. 2 root root 11384 Apr  7 22:19 /usr/bin/python3.4m
lrwxrwxrwx. 1 root root    18 Jun 19 00:22 /usr/bin/python36 -> /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11408 Apr 25 17:05 /usr/bin/python3.6m
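To confirm the symlink now points at an interpreter that can actually import the module, a small helper can be used (a sketch; `check_python_module` is a hypothetical name, not part of the provisioner):

```shell
# check_python_module INTERPRETER MODULE
# Prints "ok" if MODULE imports under INTERPRETER, "missing" otherwise.
check_python_module() {
    if "$1" -c "import $2" 2>/dev/null; then
        echo "ok"
    else
        echo "missing"
    fi
}

# e.g. check_python_module /usr/bin/python3 s3iamcli
```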

Issue. Error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600) during CreateAccount

[root@srvnode1 .sgs3iamcli]# s3iamcli createaccount -n s3user1 -e s3user1@seagate.com
Enter Ldap User Id: sgiamadmin
Enter Ldap password:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600)

Solution

  • Modify file /etc/haproxy/haproxy.cfg
#----------------------------------------------------------------------
# BackEnd roundrobin as balance algorithm for s3 auth server
#----------------------------------------------------------------------
backend s3-auth
    balance static-rr                                     #Balance algorithm


    server s3authserver-instance1 0.0.0.0:9086 check ssl verify required ca-file /etc/ssl/stx-s3/s3auth/s3authserver.crt
    # server s3authserver-instance1 0.0.0.0:9085 check  # s3 auth server instance 1
    # server s3authserver-instance2 0.0.0.0:9086 check  # s3 auth server instance 2

  • Restart haproxy service
systemctl restart haproxy
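Before restarting, it can help to confirm that the backend entry actually pins the auth server certificate (a sketch; `check_s3auth_ssl` is a hypothetical helper based on the config shown above):

```shell
# check_s3auth_ssl FILE
# Reports whether the haproxy config pins a CA file for the s3 auth backend.
check_s3auth_ssl() {
    if grep -q 'ssl verify required ca-file' "$1"; then
        echo "ssl verify configured"
    else
        echo "ssl verify missing"
    fi
}

# e.g. check_s3auth_ssl /etc/haproxy/haproxy.cfg
```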

Issue. s3server does not come online during bootstrap

Solution

  • Set the "S3_REUSEPORT" parameter to "true" in the /opt/seagate/s3/config/s3config.yaml file.
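The edit can be scripted; this sketch assumes the file contains a line of the form `S3_REUSEPORT: false` (the exact layout of s3config.yaml may differ):

```shell
# enable_reuseport FILE
# Flips the S3_REUSEPORT parameter to true in an s3config.yaml-style file.
enable_reuseport() {
    sed -i 's/^\([[:space:]]*S3_REUSEPORT:[[:space:]]*\).*/\1true/' "$1"
}

# e.g. enable_reuseport /opt/seagate/s3/config/s3config.yaml
```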

Issue. Network interface config files do not match the actual device names

Solution

Remove the mismatched (extra) config files:
[root@srvnode-1 ~]# rm -rf /etc/sysconfig/network-scripts/ifcfg-enp0s*

Rename the rest to the correct interface names:

[root@srvnode-1 ~]# mv /etc/sysconfig/network-scripts/ifcfg-ens36 /etc/sysconfig/network-scripts/ifcfg-ens34
[root@srvnode-1 ~]# mv /etc/sysconfig/network-scripts/ifcfg-ens37 /etc/sysconfig/network-scripts/ifcfg-ens35
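To spot such mismatches before renaming, each ifcfg file's DEVICE value can be compared against its filename (a sketch; `list_ifcfg_devices` is a hypothetical helper):

```shell
# list_ifcfg_devices DIR
# Prints each ifcfg-* file alongside its DEVICE= value, so files whose
# names no longer match the configured device stand out.
list_ifcfg_devices() {
    for f in "$1"/ifcfg-*; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(sed -n 's/^DEVICE=//p' "$f")"
    done
}

# e.g. list_ifcfg_devices /etc/sysconfig/network-scripts
```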

Reboot node
[root@srvnode-1 ~]# shutdown -r now

Check network service

[root@srvnode-1 ~]# systemctl status network -l
● network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
Active: active (running) since Thu 2019-09-26 23:27:19 IST; 2min 25s ago
Docs: man:systemd-sysv-generator(8)
Process: 913 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/network.service
└─1263 /sbin/dhclient -q -lf /var/lib/dhclient/dhclient-ens33.lease -pf /var/run/dhclient-ens33.pid -H srvnode-1 ens33

Sep 26 23:27:12 srvnode-1 network[913]: Bringing up interface ens33:
Sep 26 23:27:12 srvnode-1 dhclient[1212]: DHCPREQUEST on ens33 to 255.255.255.255 port 67 (xid=0x36816470)
Sep 26 23:27:12 srvnode-1 dhclient[1212]: DHCPACK from 10.237.128.1 (xid=0x36816470)
Sep 26 23:27:14 srvnode-1 dhclient[1212]: bound to 10.237.128.210 -- renewal in 564149 seconds.
Sep 26 23:27:14 srvnode-1 network[913]: Determining IP information for ens33... done.
Sep 26 23:27:14 srvnode-1 network[913]: [ OK ]
Sep 26 23:27:15 srvnode-1 network[913]: Bringing up interface mgmt0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ens36 does not seem to be present, delaying initialization.
Sep 26 23:27:15 srvnode-1 network[913]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to start device ifcfg-ens34 for master mgmt0.
Sep 26 23:27:19 srvnode-1 network[913]: [ OK ]
Sep 26 23:27:19 srvnode-1 systemd[1]: Started LSB: Bring up/down networking.

Issue. Multiple installed kernels conflict with the Mero installation

Solution

  1. Find installed kernels
    awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
  2. Check default kernel
    grub2-editenv list
  3. Set the default kernel to the desired one
    grub2-set-default <number against the desired kernel>
  4. Reboot the node
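The listing step can be wrapped for reuse; this uses the same awk expression as step 1 (the function name is illustrative):

```shell
# list_kernels GRUB_CFG
# Numbers the menuentry titles in a grub2 config, matching the indices
# that grub2-set-default expects.
list_kernels() {
    awk -F\' '$1=="menuentry " {print i++ " : " $2}' "$1"
}

# e.g. list_kernels /etc/grub2.cfg
```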

Learnings

Packer

What has been learnt:

  • 'packer' can use an existing VBox machine as a source, which allows VBoxManage calls as part of the packer spec to adjust the machine. This way we can perform the initial hardware and software configuration to provide a base level for the upper stacks of the env.
  • 'packer' can output vagrant boxes, which can then be used as sources for further 'packer' builds. This means we can construct the upper levels of the env iteratively.
  • Note: a vagrant box as a packer source does not allow applying VBoxManage calls directly. Options:
    • prepare all common HW-specific things at the base level, as described above
    • use runtime adjustment of running machines via direct VBoxManage calls when isolated per-machine changes are needed and they are light enough (e.g. creating a medium and attaching it to some controller should be fast)
    • use a temporary layer with an active VBox machine to apply the initial setup when a general env adjustment is needed on a non-base level (valuable for a set of upper levels) and it is too expensive (e.g. CPU- or time-bound) to apply at runtime
  • packer can build docker images as well
  • packer uses the term "provisioner" for the configuration engines that apply changes to the environment, e.g. shell, salt, ansible and [many others](https://www.packer.io/docs/provisioners/index.html). Provisioners are builder-agnostic (docker, vagrant, virtualbox), so we can use the same scripts for all providers.
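As a concrete illustration of the iterative layering described above, a minimal packer template might combine a vagrant source with a shell provisioner; its output box can then serve as the source for the next layer. This is a sketch only: the box name and script path are placeholders, not from this repo.

```json
{
  "builders": [
    {
      "type": "vagrant",
      "communicator": "ssh",
      "source_path": "base/centos-7-layer1",
      "provider": "virtualbox"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "scripts/adjust-layer.sh"
    }
  ]
}
```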

Vagrant

NOTE: vagrant snapshot push/pop [--no-start] [--no-delete] may provide a good way to reset the environment quickly when a machine is used by a sequence of isolated tests

Overview of what vagrant does when establishing the initial connection to a VBox machine:

  • it uses the ssh configuration that comes from Vagrantfiles (it merges a set of them)
  • it also tries to detect some things itself (like the listening port and the host of the machine)
  • it also considers some predefined things (like the vagrant insecure key)
  • base boxes usually come with the insecure key installed so they can be accessed easily during the initial "vagrant up", or e.g. when packer is asked to use an already running machine as a source
  • for non-public boxes, other more secure keys are likely to be inserted later; also, by default vagrant replaces the insecure key with a randomly generated one during "vagrant up" if it detects the insecure key (there is a good explanation of this as well), and this is where most issues come into play
  • a few words regarding the ssh port: by default machines are started in a private NAT network and internal service ports are forwarded to the host, so e.g. sshd might be reachable at "127.0.0.1:2222". Usually vagrant and packer are fine with that, but issues do occur

Issues and solutions/workarounds:

  • packer/vagrant can't connect to the machine because vagrant inserted a randomly generated secure key: do not allow vagrant to insert that key, and insert one we manage ourselves instead (related parameters: config.ssh.insert_key = false and config.ssh.private_key_path)
  • the packer vagrant builder seems to ignore the ssh_private_key_file parameter (or maybe something was missed); for cases where we set up a custom secure key (instead of the insecure one) in the source box, this can be worked around by creating a Vagrantfile template that is packaged with the box (related parameters: output_vagrantfile and vagrantfile_template)
  • packer's virtualbox-vm builder can't connect because it can't determine the right connection port: the reason might be the ssh_skip_nat_mapping option; if it is set to true, packer requires the current forwarded ssh port on localhost to be set via ssh_port
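The key-related parameters mentioned above fit together in a Vagrantfile roughly like this (a sketch; the box name and key path are placeholders):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "our-base-box"            # placeholder box name

  # Keep vagrant from replacing the key with a random one on "vagrant up"
  config.ssh.insert_key = false

  # Use a key we manage ourselves instead of the vagrant insecure key
  config.ssh.private_key_path = "keys/managed_id_rsa"  # placeholder path
end
```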

Components

Check Logs

SSPL
$ tail -f /var/log/eos/sspl/sspl.log

S3server
$ tail -f /var/log/seagate/s3/s3server.INFO

Debug Mero
http://gitlab.mero.colo.seagate.com/eos/provisioner/cortx-prvsnr/wikis/Steps-to-Debug-Mero

Note: Collect debug data with m0reportbug
$ m0reportbug -b

If hctl mero bootstrap does not come online, check the journal:
$ sudo journalctl -b0
