[BUG] Error while bringing up minion for multi-master. Minion unable to successfully connect to a Salt Master. #66438

Open
3 of 9 tasks
alrf opened this issue Apr 25, 2024 · 6 comments
Labels
Bug broken, incorrect, or confusing behavior needs-triage

Comments

alrf commented Apr 25, 2024

Description
Minion can't connect to Master, both are 3006.7:

Error while bringing up minion for multi-master. Is master at serverXXX.example.com responding? The error message was Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error

Setup
The Master is a onedir installation (Debian).
The Minion is a regular (non-onedir) installation on Fedora CoreOS (FCOS); I'm not sure whether onedir can be used there.

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior
Logs:

2024-04-25 15:20:39,673 [salt.cli.daemons :284 ][INFO    ][3591] Starting up the Salt Minion
2024-04-25 15:20:39,674 [salt.utils.event :284 ][INFO    ][3591] Starting pull socket on /var/run/salt/minion/minion_event_b375127e98_pull.ipc
2024-04-25 15:20:39,928 [salt.minion      :284 ][INFO    ][3591] Creating minion process manager
2024-04-25 15:21:15,079 [salt.minion      :284 ][ERROR   ][3591] Error while bringing up minion for multi-master. Is master at serverXXX.example.com responding? The error message was Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error
2024-04-25 15:21:39,934 [salt.minion      :284 ][ERROR   ][3591] Minion unable to successfully connect to a Salt Master.

This is not a firewall/network issue; the salt-master ports are reachable from the minion:

telnet serverXXX.example.com 4505
Trying XX.XX.XX.XX...
Connected to serverXXX.example.com.
Escape character is '^]'.
quit

telnet serverXXX.example.com 4506
Trying XX.XX.XX.XX...
Connected to serverXXX.example.com.
Escape character is '^]'.
quit
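
The telnet sessions above only show that a TCP connection can be opened from an interactive shell. The same check can be scripted without telnet; the following is a minimal sketch, assuming bash and coreutils `timeout` are available on the minion. The function names are made up for illustration. A port that opens here does not prove that the minion service can complete authentication, since the service may run under different restrictions than a login shell (the SELinux denials later in this thread are one such case).

```shell
# Succeed if a TCP connection to host:port opens within the timeout (seconds).
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report the state of the two Salt master ports named in this issue.
# The argument is the master hostname or IP address.
check_master_ports() {
    for port in 4505 4506; do
        if port_open "$1" "$port"; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}
```

Usage would be `check_master_ports serverXXX.example.com`, which prints one line per port.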

Expected behavior
Minion should be able to connect to Master.

Versions Report

Master:
salt --versions-report
Salt Version:
          Salt: 3006.7

Python Version:
        Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.15.1
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-26-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye



Minion:
salt-call --versions-report
/usr/lib/python3.12/site-packages/salt/ext/tornado/util.py:246: SyntaxWarning: invalid escape sequence '\d'
  """Unescape a string escaped by `re.escape`.
Salt Version:
          Salt: 3006.7

Python Version:
        Python: 3.12.2 (main, Feb 21 2024, 00:00:00) [GCC 13.2.1 20231205 (Red Hat 13.2.1-6)]

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.5
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 23.1
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.20.0
        pygit2: Not Installed
  python-gnupg: Not Installed
        PyYAML: 6.0.1
         PyZMQ: 25.1.0
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 6.3.3
           ZMQ: 4.3.4

System Versions:
          dist: fedora 39.20240407.3.0
        locale: utf-8
       machine: x86_64
       release: 6.8.4-200.fc39.x86_64
        system: Linux
       version: Fedora Linux 39.20240407.3.0

alrf commented Apr 29, 2024

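It seems that the Python versions on Master and Minion must match (the Master above runs Python 3.10.13, the Minion 3.12.2). I was able to get a onedir installation working on FCOS, and the Minion then connected to the Master.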
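
To compare the Python versions of two versions reports quickly, something like the following could be used. It is only a sketch: the function names and the idea of saving each report to a file are assumptions for illustration, and it checks nothing beyond the `Python:` line of each report. Whether Salt actually requires matching versions is the observation made in this comment, not something the script verifies.

```shell
# Print the major.minor Python version from a versions report read on stdin.
python_minor() {
    awk '$1 == "Python:" { split($2, v, "."); print v[1] "." v[2]; exit }'
}

# Succeed only when both reports name the same major.minor Python version.
# The two arguments are hypothetical files holding the saved reports.
same_python_minor() {
    master_py=$(python_minor < "$1")
    minion_py=$(python_minor < "$2")
    echo "master=${master_py} minion=${minion_py}"
    [ -n "$master_py" ] && [ "$master_py" = "$minion_py" ]
}
```

With the two reports from this issue saved as `master.txt` and `minion.txt`, `same_python_minor master.txt minion.txt` prints `master=3.10 minion=3.12` and returns a non-zero status.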

@alrf alrf closed this as completed Apr 29, 2024
sasidharjetb commented May 3, 2024

I got the same issue. How can I use onedir with the bootstrap script? I am on Ubuntu 22 and currently use this for installation:

curl -o bootstrap-salt.sh -L https://bootstrap.saltproject.io ;

[ERROR ][3301590] Error while bringing up minion for multi-master. Is master at salt01 responding?
2024-05-03 08:35:50,727 [salt.minion :819 ][DEBUG ][3301590] Connecting to master. Attempt 1 of 1
2024-05-03 08:35:50,727 [salt.utils.network:2314][DEBUG ][3301590] "salt01" Not an IP address? Assuming it is a hostname.
2024-05-03 08:35:50,736 [salt.minion :256 ][DEBUG ][3301590] Master URI: tcp://10.16.1.6:4506
2024-05-03 08:35:50,737 [salt.crypt :514 ][DEBUG ][3301590] Re-using AsyncAuth for ('/etc/salt/pki/minion', 'aksdevminiongcp01', 'tcp://10.16.1.6:4506')
2024-05-03 08:35:50,758 [salt.transport.zeromq:158 ][DEBUG ][3301590] Generated random reconnect delay between '1000ms' and '11000ms' (10627)
2024-05-03 08:35:50,758 [salt.transport.zeromq:165 ][DEBUG ][3301590] Setting zmq_reconnect_ivl to '10627ms'
2024-05-03 08:35:50,759 [salt.transport.zeromq:169 ][DEBUG ][3301590] Setting zmq_reconnect_ivl_max to '11000ms'
2024-05-03 08:35:50,759 [salt.crypt :208 ][DEBUG ][3301590] salt.crypt.get_rsa_key: Loading private key
2024-05-03 08:35:50,759 [salt.crypt :900 ][DEBUG ][3301590] Loaded minion key: /etc/salt/pki/minion/minion.pem
2024-05-03 08:35:50,770 [salt.utils.event :315 ][DEBUG ][3301590] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_ccc4af074d_pub.ipc
2024-05-03 08:35:50,770 [salt.utils.event :316 ][DEBUG ][3301590] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_ccc4af074d_pull.ipc
2024-05-03 08:35:50,770 [salt.transport.zeromq:212 ][DEBUG ][3301590] Connecting the Minion to the Master publish port, using the URI: tcp://10.16.1.6:4505
2024-05-03 08:35:50,771 [salt.transport.zeromq:216 ][DEBUG ][3301590] <salt.transport.zeromq.PublishClient object at 0x72cd64195c00> connecting to tcp://10.16.1.6:4505
2024-05-03 08:35:50,773 [salt.utils.event :823 ][DEBUG ][3301590] Sending event: tag = __master_connected; data = {'master': 'salt01', '_stamp': '2024-05-03T08:35:50.773481'}
2024-05-03 08:35:50,774 [salt.crypt :208 ][DEBUG ][3301590] salt.crypt.get_rsa_key: Loading private key
2024-05-03 08:35:50,774 [salt.crypt :900 ][DEBUG ][3301590] Loaded minion key: /etc/salt/pki/minion/minion.pem
2024-05-03 08:35:50,786 [salt.transport.ipc:372 ][DEBUG ][3301590] Closing IPCMessageClient instance

alrf commented May 7, 2024

I found another issue on FCOS: SELinux.
While SELinux is in Enforcing mode, salt-minion can't connect to the salt-master.

The SELinux section of the documentation is extremely old (its examples are for CentOS/RHEL 5 and 6):
https://docs.saltproject.io/en/latest/topics/troubleshooting/index.html#salt-and-selinux
and it is useless in the case of FCOS, where the suggested commands fail:

# chcon system_u:object_r:rpm_exec_t:s0 /usr/bin/salt-minion
chcon: failed to change context of '/usr/bin/salt-minion' to 'system_u:object_r:rpm_exec_t:s0': Read-only file system
# chcon system_u:object_r:rpm_exec_t:s0 /usr/bin/salt-call
chcon: failed to change context of '/usr/bin/salt-call' to 'system_u:object_r:rpm_exec_t:s0': Read-only file system

This is because / is immutable and /usr is read-only in FCOS:
https://docs.fedoraproject.org/en-US/fedora-coreos/storage/#_immutable_read_only_usr

alrf commented May 7, 2024

SELinux denies these actions (bunch of them in the output):

# ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts today
time->Tue May  7 16:00:17 2024
type=AVC msg=audit(1715097617.514:1641): avc:  denied  { name_connect } for  pid=5396 comm="/usr/lib/opt/sa" dest=4506 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:salt_port_t:s0 tclass=tcp_socket permissive=0

However, both ports are already labeled salt_port_t:

# semanage port -l | grep salt
salt_port_t                    tcp      4505, 4506
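
Reading the two outputs together: the port labeling is in place, and what the AVC record reports is that the source domain `init_t` is not allowed the `name_connect` permission on `salt_port_t`. To make that easier to see across many records, each denial can be condensed into the allow rule it corresponds to. The sketch below is a hypothetical helper (the name `avc_to_allow` is made up), it assumes records in the single-line format shown above, and it only summarizes. It is not a substitute for audit2allow.

```shell
# Condense AVC denial records (stdin) into one "allow" line per distinct
# source type, target type, class and permission set.
# Lines that are not AVC denials (for example "time->..." lines) are dropped.
avc_to_allow() {
    sed -nE 's/.*denied +\{ ([^}]*) \}.*scontext=[^:]*:[^:]*:([^:]*):.*tcontext=[^:]*:[^:]*:([^: ]*).*tclass=([^ ]+).*/allow \2 \3:\4 { \1 };/p' |
        sort -u
}
```

For example, `ausearch -m AVC -ts today | avc_to_allow` turns the record above into `allow init_t salt_port_t:tcp_socket { name_connect };`.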

alrf commented May 7, 2024

All the Minion/Master connection and SELinux issues described above occur on FCOS versions 39.20231101.3.0 and 39.20240210.3.0.
The latest FCOS version (as of today), 39.20240407.3.0, doesn't have these problems; everything works out of the box.

But I can't use it in my case, because an OKD4 cluster (even the latest version) is tied to a specific FCOS version, which is not the latest one.

alrf commented May 8, 2024

SELinux denies these actions (bunch of them in the output):

# ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts today
time->Tue May  7 16:00:17 2024
type=AVC msg=audit(1715097617.514:1641): avc:  denied  { name_connect } for  pid=5396 comm="/usr/lib/opt/sa" dest=4506 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:salt_port_t:s0 tclass=tcp_socket permissive=0

However, both ports are already labeled salt_port_t:

# semanage port -l | grep salt
salt_port_t                    tcp      4505, 4506

I managed to solve this by:

  1. # rpm-ostree install setroubleshoot - install required tools in FCOS
  2. # ausearch -m AVC | audit2allow -m salt_fix > salt_fix.te - generate an allow policy based on audit.log
  3. # more salt_fix.te - check the policy generated by audit2allow. In my case it was:
module salt_fix 1.0;

require {
	type getty_t;
	type etc_t;
	type sudo_exec_t;
	type dmidecode_exec_t;
	type var_t;
	type systemd_hwdb_t;
	type kernel_t;
	type init_t;
	type systemd_notify_t;
	type ssh_exec_t;
	type salt_port_t;
	type http_port_t;
	class capability dac_override;
	class capability2 checkpoint_restore;
	class unix_dgram_socket sendto;
	class file { append create execute execute_no_trans ioctl map open read rename unlink write };
	class tcp_socket name_connect;
}

#============= getty_t ==============
allow getty_t self:capability2 checkpoint_restore;

#============= init_t ==============
allow init_t dmidecode_exec_t:file { execute execute_no_trans open read };

#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow init_t dmidecode_exec_t:file map;
allow init_t etc_t:file write;

#!!!! This avc can be allowed using the boolean 'nis_enabled'
allow init_t http_port_t:tcp_socket name_connect;
allow init_t salt_port_t:tcp_socket name_connect;
allow init_t ssh_exec_t:file execute;
allow init_t sudo_exec_t:file execute;
allow init_t var_t:file { append create ioctl open read rename unlink write };

#============= systemd_hwdb_t ==============
allow systemd_hwdb_t self:capability dac_override;

#============= systemd_notify_t ==============
allow systemd_notify_t kernel_t:unix_dgram_socket sendto;
  4. If the policy looks legit:
    # ausearch -m AVC | audit2allow -M salt_fix - create the compiled policy
  5. # semodule -i salt_fix.pp - import the policy package (.pp)
  6. # semodule -l | grep salt_fix - verify the module is installed
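
Because `ausearch -m AVC` returns every denial on the system, the generated module above also contains rules for domains such as getty_t, systemd_hwdb_t and systemd_notify_t, which do not appear to be related to Salt. One way to narrow the input before handing it to audit2allow is to keep only the records for the source domain seen in the Salt denial. This is a sketch under assumptions: the helper name `avc_for_domain` is made up, and it assumes audit2allow accepts the filtered records on stdin the same way it accepts the unfiltered ausearch output in step 2.

```shell
# Keep only AVC records (stdin) whose source context type equals the given
# domain, for example init_t. Matching records are written unchanged.
avc_for_domain() {
    grep -E "scontext=[^: ]+:[^: ]+:$1(:| )"
}
```

A possible use would be `ausearch -m AVC | avc_for_domain init_t | audit2allow -m salt_fix > salt_fix.te`. Note that init_t is itself a broad domain, so this narrows the policy but does not limit it to Salt alone, and the result still needs the manual review from step 3.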

After all these manipulations, the connection between Minion and Master was established, the minion process was able to start, and test.ping was successful. BUT: most of the applied Salt states failed again due to SELinux. It seems that for each specific state you have to generate and apply a new SELinux policy.

So, in the end, the problem is NOT fully solved.
