New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mount the stdout callback plugin for containerized runs #957
Conversation
df1bc03
to
11530a8
Compare
ansible_runner/config/_base.py
Outdated
self._update_volume_mount_paths(new_args, "{}".format(self.private_data_dir), dst_mount_path="/runner", labels=":Z") | ||
|
||
# Mount the stdout callback plugin from the ansible-runner code base | ||
self._update_volume_mount_paths(new_args, "{}".format(get_plugin_dir()), dst_mount_path="/home/runner/.ansible/plugins", labels=":Z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My biggest open question for other people is - where should the mount be? Options include:
/home/runner/.ansible/plugins
(this)/home/runner/.ansible/plugins/callback
(changed to this)/home/runner/.ansible/plugins/callback/awx_display.py
(single file mount)/usr/share/ansible/plugins/callback
/usr/share/ansible/plugins/callback/awx_display.py
(single file mount)/usr/share/ansible/collections/ansible_collections/<collection>/<name>/plugins/callback/awx_display.py
The current option doesn't seem to have any benefits compared to the two options after it. After all, consolidating it into a single file was so that I could do the single file mount. I haven't made this change yet because I wanted to do a bit of a survey of preferences first.
Related to this, but still separate, is the question of where it should go within the code base. In this PR, I nested it in the "callback" folder, but to be able to collection-ify this in the future, I see the wisdom in nesting it in "plugins/callback" folders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to have ruled out the single file mount options for now. Doing this results in permission errors like:
ERROR! Unexpected Exception, this is probably a bug: [Errno 13] Permission denied: '/home/runner/.ansible/plugins/callback/awx_display.py'
Maybe this could be fixed with changes to the Dockerfile, but doing so would partially defeat the point.
With my latest push, the mount location is at /home/runner/.ansible/plugins/callback
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for detailing your thought process. I like /home/runner/.ansible/plugins/callback
as the result
c4fd693
to
11292eb
Compare
This patch contains actions suggested in review comments at ansible#763 plus some more In display_callback/module.py, avoid modifying os.environ Remove parent class AWXDefaultCallbackModule, because it makes less since when there is not a job vs. adhoc class split As a consequence, only 1 parent callback class is needed the class which is the default, in absense of awx_display Move modifications of self.env from the runner module to the config module to be most consistent with the rest of the code base
11292eb
to
9bd9a4d
Compare
Additional questions I have for the team:
|
Hook in stdout callback volume mount Share pdd and plugin mounts with the NONE base execution mode Remove PYTHONPATH assertions no longer relevant Attempt to patch up docs Refinement of docs page for the refactored display callback
This changes the location of the mount for the stdout callback prior value was at /home/runner/.ansible/plugins new mount location is at /home/runner/.ansible/plugins/callback This is more targeted and more desirable for this reason. We cannot mount single-files like awx_display.py, due to permissions error, which may be because changes are needed in the Dockerfile or some other reason To carry this out, new utils method callback_mount is introduced to make such changes easier to carry out in the future
9bd9a4d
to
851f4d3
Compare
I personally would
Yes!
I would be interested, can I help? |
@eqrx I think I messed up a copy+paste in my prior comment, I had wanted to ask if The logic I'm coming from there is that, in a Makefile target, I could template out a |
I started that experiment, investigating
and
The pylint failure is pretty mundane and not challenging to comply with. The ast error looks fairly concerning, but after following it, I'm less worried than I thought I would be. It caches both instances we assign In summary, this method didn't discover a whole bunch of important and actionable points, and I'm not as motivated to pursue it anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me, though I can't really test how this works under AWX. As far as mount point location, I would lean toward under $HOME
. Since this is continuation of work from @shanemcd, I'd like to get his approval before merging.
ansible_runner/config/_base.py
Outdated
self._update_volume_mount_paths(new_args, "{}".format(self.private_data_dir), dst_mount_path="/runner", labels=":Z") | ||
|
||
# Mount the stdout callback plugin from the ansible-runner code base | ||
self._update_volume_mount_paths(new_args, callback_mount()[0], dst_mount_path=callback_mount()[1], labels=":Z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might break if you install runner as root in the system site packages and then run as a non-root user, because you won't be able to relabel a file you dont own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe try :O
instead of :Z
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this sounds more right to me, but I haven't been able to figure out how to artificially create the error to validate it fixes it. I got it to these permissions:
bash-4.4$ ls -la /var/lib/awx/venv/awx/lib/python3.8/site-packages/ansible_runner/display_callback/callback/
total 44
drwxr-xr-x. 3 root root 4096 Jan 26 03:18 .
drwxr-xr-x. 3 root root 4096 Jan 26 03:18 ..
-rw-r--r--. 1 root root 30788 Jan 26 03:18 awx_display.py
-rw-r--r--. 1 root root 0 Jan 26 03:18 __init__.py
drwxr-xr-x. 2 root root 4096 Jan 26 03:18 __pycache__
Then as the awx user, it still works. Is there any tweak I can do which should make this break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, never mind, I was running the wrong thing. I am able to get an error with this type of setup. Right now it just gives the vague message
ERROR! Invalid callback for stdout specified: awx_display
But I'm still tinkering with the install by root user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing the :0
in one environment, I get
Error: mounting overlay failed "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/ansible_runner/display_callback/callback/": chown /var/lib/awx/.local/share/containers/storage/overlay-containers/78dce4e01f5d25bbfe804246e232815dd809cb5360237b1fddf35bc49bfe2fca/userdata/overlay/568435025/upper: invalid argument
I'm not sure how big of a problem this is going to be. In another environment it works. Reading related issues, people mention that kernel version needs to be relatively recent for a metacopy setting. Depending on what we support, I could see this getting complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, trying on my own machine, I was able to reproduce something by doing only this:
sudo chown -R root:root ansible_runner/display_callback/
Resulting in
$ ansible-runner run demo/ -p test.yml --process-isolation
Error: lsetxattr /home/alancoding/repos/ansible-runner/ansible_runner/display_callback/callback: operation not permitted
So maybe I am figuring this out. Here, specifically, with root ownership and -rw-rw-r--.
permissions, the display callback is not editable for the current user. Like you say, that shouldn't be a problem. I fear that :O
will not work in all the contexts that we need it to... but it wouldn't be atypical for runner to create a new tmpdir to make something work. We could do this with a copy of the display callback. We know it should only be a few files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a replicator, I developed a patch that fixes it.
I did not use the :O
, because I'm worried this won't work in all cases. Instead, I copied the files (possible if you only have read access) to a tmpdir, and mounted the tmpdir.
Tomorrow I can work on a test case for this logic.
I've put up a patch and a test for @shanemcd's concern about cases where ansible-runner is installed by the root user but used by a non-root user. Practically, I am kind of limited with what I can do with tests, because my actual reproducer was to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ner version (#988) Add compatibility code to the entrypoint so new EEs work with old runner version In #957 we combined 2 callback names (awx_display, minimal) into just 1, awx_display. Fundamentally, this makes it hard for the containers to work with ansible-runner versions before/after, and vice versa. Ping review by @shanemcd In that PR, I paid attention to the ability for current ansible-runner to work with old images. I was failing to fully appreciate the implications of old ansible-runner not working with current images. This PR adds a shim to the entrypoint so that the image with work even if used by old versions of ansible-runner. In the PR 957 I removed the use of the LAUNCHED_BY_RUNNER environment variable. This allows using that as a poor proxy for the version of ansible-runner that is starting the container. @rebeccahhh pointed out to me that our tests for AAP controller 4.1.x were failing due to these problems. I've re-tested those with a custom image built from this branch, quay.io/alancoding/ansible-runner:marty_youve_got_to_come_back_with_me, and this gets them to pass. There's no reasonable way to ever get test coverage, because the whole idea is compatibility with old versions. So I am working on running make test with that custom image tagged as quay.io/ansible/ansible-runner:devel while I have the commit f1b6c3a checked out, which was the last commit before the PR landed. Reviewed-by: Shane McDonald <me@shanemcd.com> Reviewed-by: Rebeccah Hunter <rhunter@redhat.com> Reviewed-by: None <None>
This is a continuation on from #763
I really want to keep @shanemcd's simplification which eliminates the aliasing to the
minimal
callback name. That PR was getting hung up on the fact that CI doesn't seem to build the new image correctly. This highlights a pain point that comes from the fact thatansible-runner
used from the CLI employs the callback plugin from a different install of ansible-runner, baked into the image. Here, I'm investigating an option to address this pain point where I volume mount the stdout callback plugin, so we assure that any changes to the plugin will be instantly picked up without rebuilding. This might be overly-optimistic, but I'm hoping that this will be compatible with the old image, so even tests will pass. So far I've found some race conditions locally (probably due to multiprocessing), and depending on what tests give me here, my next step might be to make a per-run copy of the plugin.So in short, instead of backing out some of our simplifications to get that other PR to merge, here, I'm entertaining the idea of leaning into even more simplifications that might hopefully solve several problems.