Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix image build to override root homedir in /etc/passwd #1027

Merged
merged 5 commits into from May 5, 2022

Conversation

nitzmahone
Copy link
Member

fixes #1024

  • using only HOME envvar to override homedir causes mismatches with anything that asks for it a different way (eg, echo ~root)
  • kicking the can down the road on "why are we overriding /root as root's homedir anyway?"

@nitzmahone nitzmahone requested a review from a team as a code owner March 16, 2022 21:39
@github-actions github-actions bot added the needs_triage New item that needs to be triaged label Mar 16, 2022
@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@nitzmahone
Copy link
Member Author

argh, looks like we'll also need to fix this hack: https://github.com/ansible/ansible-runner/blob/devel/utils/entrypoint.sh#L14-L15

Copy link
Member

@ssbarnea ssbarnea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a serious bug but at the same time, I doubt that changing room home directory to make it use another folder is ever a good idea.

I do see two paths here: switch to non root user or go for using /root.

@AlanCoding
Copy link
Member

I'd like @shanemcd to be looped into this

@nitzmahone
Copy link
Member Author

So would I- I'd love to know the history on this, since the commit history on the container definition isn't telling me much other than "we've dorked around with this an awful lot".

@ssbarnea
Copy link
Member

I am curious about the history too but not for blaming reasons.

To be honest I would prefer to go for a standard setup:

either:

a) root user with its default home directory
b) non root user with its home directory in either /home/username or /username - and ensuring that HOME == tilde

Black magic is never a good idea, can hit you back when you expect the least.

@AlanCoding
Copy link
Member

My own understanding is that there are two requirements which are challenging to satisfy simultaneous (1) rootless podman usage and (2) ability to use in OCP. I would like to note that integration tests are already failing. I would like to have more integration testing with more of the platform, as this is the type of change where we would like to test as much as possible in advance.

@shanemcd
Copy link
Member

shanemcd commented Mar 17, 2022

The primary point of complexity here is supporting arbitrary UIDs. We need to support this for 2 main use cases:

  • By default, the 'restricted' SCC in OCP will make your pod run with a uid like 1000680000.
  • When ansible-runner is used with docker, we pass --user so that files written inside of the EE are visible to the host user.

In response to the changes made so far in this PR:

If we remove ENV HOME from the Dockerfile and run as some random uid, HOME will be set to /:

$ docker run -ti --user 1000680000 quay.io/shanemcd/ansible-runner:homedir_fix bash
bash-4.4$ echo $HOME 
/

This causes container group jobs to fail with an error like:

Traceback (most recent call last):
  File "/usr/local/bin/ansible-playbook", line 65, in <module>
    import ansible.constants as C
    import ansible.constants as C
  File "/usr/local/lib/python3.8/site-packages/ansible/constants.py", line 180, in <module>
    config = ConfigManager()
  File "/usr/local/lib/python3.8/site-packages/ansible/config/manager.py", line 304, in __init__
    self.update_config_data()
  File "/usr/local/lib/python3.8/site-packages/ansible/config/manager.py", line 604, in update_config_data
    raise AnsibleError("Invalid settings supplied for %s: %s\n" % (config, to_native(e)), orig_exc=e)
ansible.errors.AnsibleError: Invalid settings supplied for DEFAULT_LOCAL_TMP: Unable to create local directories(/.ansible/tmp): [Errno 13] Permission denied: b'/.ansible'

One alternative would be to remove ENV HOME from the Dockerfile and add some conditional logic to the entrypoint before we call exec. But this would not work if anyone ever wanted to podman exec into an EE, because those sessions do not run through the entrypoint.

As awful as it looks, I dont think there would be an issue with changing the root user's home directory as long as $HOME and /etc/passwd both match. If anyone can point out a scenario where this wouldn't work, I'm always ok with being wrong. But I think it is fairly straight forward to say HOME inside of an EE is always /home/runner.

@shanemcd
Copy link
Member

Tried that here and it seems to work: https://github.com/shanemcd/ansible-runner/pull/2/checks (The failing test was quay.io throwing a 500 error.)

@nitzmahone
Copy link
Member Author

I think this should now cover the various cases- still some cleanup to do, it's still spewing noise on startup because I suck at bash, but I think the overall approach should ensure that we've always got a workable homedir that's consistent everywhere, whether it's a dynamic uid or a real user.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@shanemcd
Copy link
Member

In addition to the noise at startup, the tests are also failing because of how we mount runner's display plugin into the container from the host. This code was also written with the assumption that HOME is always /home/runner:

container_dot_ansible = '/home/runner/.ansible'

@shanemcd
Copy link
Member

shanemcd commented Mar 18, 2022

If we want to keep trying to go down this route, I think it would work if we just changed the code to mount the callback under both /home/runner/.ansible/plugins/callback/ and /root/.ansible/plugins/callback/.

@shanemcd
Copy link
Member

Actually... we can just do this:

diff --git a/ansible_runner/utils/__init__.py b/ansible_runner/utils/__init__.py
index 214b0a7..c8869e2 100644
--- a/ansible_runner/utils/__init__.py
+++ b/ansible_runner/utils/__init__.py
@@ -68,7 +68,6 @@ def callback_mount(copy_if_needed=False):
     if copy_if_needed is set, and the install is owned by another user,
     it will copy the plugin to a tmpdir for the mount in anticipation of SELinux problems
     '''
-    container_dot_ansible = '/home/runner/.ansible'
     rel_path = ('callback', '',)
     host_path = os.path.join(get_plugin_dir(), *rel_path)
     if copy_if_needed:
@@ -78,7 +77,7 @@ def callback_mount(copy_if_needed=False):
             register_for_cleanup(tmp_path)
             host_path = os.path.join(tmp_path, 'callback')
             shutil.copytree(callback_dir, host_path)
-    container_path = os.path.join(container_dot_ansible, 'plugins', *rel_path)
+    container_path = os.path.join('/usr/share/ansible', 'plugins', *rel_path)
     return (host_path, container_path)

@shanemcd
Copy link
Member

After trying my proposed change locally, some of the tests are now intermittently failing with:

[WARNING]: Error accessing plugin paths: [Errno 13] Permission denied:
b'/usr/share/ansible/plugins/callback'
ERROR! Invalid callback for stdout specified: awx_display

This might be a race condition because we are using the :Z label:

self._update_volume_mount_paths(new_args, "{}".format(self.private_data_dir), dst_mount_path="/runner", labels=":Z")


From man podman run:

The z option tells Podman that two containers share the volume content. As a result, Podman labels the content with a shared content label. Shared volume labels allow all containers to read/write content. The Z option tells Podman to label the content with a private unshared label.

@shanemcd
Copy link
Member

shanemcd commented Mar 18, 2022

I've confirmed that the tests pass when changing :Z to :z.

So now, my current concern here is that these latest findings will not only require a change at the EE level, but would also require changes in ansible-runner itself. This would prevent users from being able to use the newer EEs with the versions of ansible-runner 2.x already out in the wild. Unless there is a valid reason about why the more surgical solution I proposed above will not work, I do not think this warrants all of the cascading work that will fall out of making this change.

@nitzmahone
Copy link
Member Author

nitzmahone commented Mar 18, 2022

I do not think this warrants all of the cascading work that will fall out of making this change

One way or another, we still have to fix the HOME/passwd mismatches by the time Ansible runs (which your change alone doesn't)- the current test suite doesn't have any tests that reflect the failure we're actually trying to fix. But yeah, I definitely want to limit the blast radius of whatever this fix ends up looking like to avoid changing the outer runner's volume mount behavior, because IIUC we don't currently have a good way to "sniff" the inner runner version or validate that the assumptions each will make are compatible (something else I have some ideas about- we have to deal with that soon).

There might be some latent weirdness in certain become cases if we just make every user's homedir /home/runner (including root), but it'd certainly be a lot less broken than it is now, and we could probably fix it in core if we hit something. So I can adjust the entrypoint changes to rewrite every user's homedir to /home/runner and bomb out/warn/fix+subshell/something if the user's primary gid isn't 0 (which is the only way we can make that work reasonably robustly, and there will always be problems with things explicitly creating things with owner-only modes).

This also works with the exec case BTW, at least if we're more aggressive about passwd rewrites in the entrypoint, because it won't let you exec under a UID that's not in /etc/passwd, so if we ensure that every user + a potential dynamic uid in run are properly reflected in /etc/passwd, we never need to set HOME at the container level (or if we do it's redundant), but by the time entrypoint has done its thing on the initial run, anything that asks for a homedir after that will always get the right answer, no matter how it asks, which solves the original problem.

@ziegenberg
Copy link

Now that https://github.com/ansible-community/toolset is deprecated, more and more people tend to turn towards https://github.com/ansible/creator-ee, which builds on top of ansible-runner. More issues related to #1024 and this PR are raised. Is there anything that can be done to move things forward?

@eqrx
Copy link
Contributor

eqrx commented Apr 25, 2022

Hey, can I help with with bringing this forward?:)

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

utils/entrypoint.sh Outdated Show resolved Hide resolved
@nitzmahone
Copy link
Member Author

@shanemcd the only additional change I can think of right now that might be needed here is to restore the runner group and try to force all users into it as a secondary group, but I'm not sure if it's necessary or not... Any other testing you want to do on other container hosts, or anything else?

* using only `HOME` envvar to override homedir causes mismatches with anything that asks for it a different way (eg, `echo ~root`)
* kicking the can down the road on "why are we overriding /root as root's homedir anyway?"
@nitzmahone nitzmahone merged commit c181daa into ansible:devel May 5, 2022
nitzmahone added a commit to nitzmahone/ansible-runner that referenced this pull request May 5, 2022
* override root homedir in /etc/passwd

* using only `HOME` envvar to override homedir causes mismatches with anything that asks for it a different way (eg, `echo ~root`)
* kicking the can down the road on "why are we overriding /root as root's homedir anyway?"

* more dynamic homedir handling

* just make everybody's home /home/runner

* restore ENV HOME override for builds

* comment wording

(cherry picked from commit c181daa)
nitzmahone added a commit to nitzmahone/ansible-runner that referenced this pull request May 5, 2022
* override root homedir in /etc/passwd

* using only `HOME` envvar to override homedir causes mismatches with anything that asks for it a different way (eg, `echo ~root`)
* kicking the can down the road on "why are we overriding /root as root's homedir anyway?"

* more dynamic homedir handling

* just make everybody's home /home/runner

* restore ENV HOME override for builds

* comment wording

(cherry picked from commit c181daa)
nitzmahone added a commit that referenced this pull request May 5, 2022
* override root homedir in /etc/passwd

* using only `HOME` envvar to override homedir causes mismatches with anything that asks for it a different way (eg, `echo ~root`)
* kicking the can down the road on "why are we overriding /root as root's homedir anyway?"

* more dynamic homedir handling

* just make everybody's home /home/runner

* restore ENV HOME override for builds

* comment wording

(cherry picked from commit c181daa)
Shrews pushed a commit that referenced this pull request May 9, 2022
* override root homedir in /etc/passwd

* using only `HOME` envvar to override homedir causes mismatches with anything that asks for it a different way (eg, `echo ~root`)
* kicking the can down the road on "why are we overriding /root as root's homedir anyway?"

* more dynamic homedir handling

* just make everybody's home /home/runner

* restore ENV HOME override for builds

* comment wording

(cherry picked from commit c181daa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ansible-runner base image improperly overrides $HOME, breaking various Ansible things
7 participants