Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mount the stdout callback plugin for containerized runs #957

Merged
merged 14 commits into from Feb 1, 2022

Conversation

AlanCoding
Copy link
Member

This is a continuation on from #763

I really want to keep @shanemcd's simplification which eliminates the aliasing to the minimal callback name. That PR was getting hung up on the fact that CI doesn't seem to build the new image correctly. This highlights a pain point that comes from the fact that ansible-runner used from the CLI employs the callback plugin from a different install of ansible-runner, baked into the image. Here, I'm investigating an option to address this pain point where I volume mount the stdout callback plugin, so we assure that any changes to the plugin will be instantly picked up without rebuilding. This might be overly-optimistic, but I'm hoping that this will be compatible with the old image, so even tests will pass. So far I've found some race conditions locally (probably due to multiprocessing), and depending on what tests give me here, my next step might be to make a per-run copy of the plugin.

So in short, instead of backing out some of our simplifications to get that other PR to merge, here, I'm entertaining the idea of leaning into even more simplifications that might hopefully solve several problems.

@github-actions github-actions bot added docs Changes to documentation needs_triage New item that needs to be triaged labels Jan 13, 2022
self._update_volume_mount_paths(new_args, "{}".format(self.private_data_dir), dst_mount_path="/runner", labels=":Z")

# Mount the stdout callback plugin from the ansible-runner code base
self._update_volume_mount_paths(new_args, "{}".format(get_plugin_dir()), dst_mount_path="/home/runner/.ansible/plugins", labels=":Z")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest open question for other people is - where should the mount be? Options include:

  • /home/runner/.ansible/plugins (this)
  • /home/runner/.ansible/plugins/callback (changed to this)
  • /home/runner/.ansible/plugins/callback/awx_display.py (single file mount)
  • /usr/share/ansible/plugins/callback
  • /usr/share/ansible/plugins/callback/awx_display.py (single file mount)
  • /usr/share/ansible/collections/ansible_collections/<collection>/<name>/plugins/callback/awx_display.py

The current option doesn't seem to have any benefits compared to the two options after it. After all, consolidating it into a single file was so that I could do the single file mount. I haven't made this change yet because I wanted to do a bit of a survey of preferences first.

Related to this, but still separate, is the question of where it should go within the code base. In this PR, I nested it in the "callback" folder, but to be able to collection-ify this in the future, I see the wisdom in nesting it in "plugins/callback" folders

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to have ruled out the single file mount options for now. Doing this results in permission errors like:

ERROR! Unexpected Exception, this is probably a bug: [Errno 13] Permission denied: '/home/runner/.ansible/plugins/callback/awx_display.py'

Maybe this could be fixed with changes to the Dockerfile, but doing so would partially defeat the point.

With my latest push, the mount location is at /home/runner/.ansible/plugins/callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for detailing your thought process. I like /home/runner/.ansible/plugins/callback as the result

@AlanCoding AlanCoding changed the title [WIP] Mount the stdout callback plugin for containerized runs Mount the stdout callback plugin for containerized runs Jan 14, 2022
shanemcd and others added 10 commits January 18, 2022 16:14
This patch contains actions suggested in review comments
  at ansible#763
  plus some more

In display_callback/module.py, avoid modifying os.environ

Remove parent class AWXDefaultCallbackModule, because it makes
  less since when there is not a job vs. adhoc class split

As a consequence, only 1 parent callback class is needed
  the class which is the default, in absense of awx_display

Move modifications of self.env from
  the runner module to the config module
  to be most consistent with the rest of the code base
@AlanCoding AlanCoding marked this pull request as ready for review January 19, 2022 15:26
@AlanCoding AlanCoding requested a review from a team as a code owner January 19, 2022 15:26
@AlanCoding
Copy link
Member Author

Additional questions I have for the team:

  1. Mentioned in other comment, is the plugins folder as the container mount point okay? Is /home/runner/.ansible preferable, or is /usr/share?
  2. About the host machine location, would this be a good final location: ansible_runner/display_callback/callback/awx_display.py?
  3. Assuming (2), would there be interest in trying to see how running ansible-test goes, recognizing that it wouldn't ever be vendored, just used for CI?

Hook in stdout callback volume mount

Share pdd and plugin mounts with the NONE base execution mode

Remove PYTHONPATH assertions no longer relevant

Attempt to patch up docs
Refinement of docs page for the refactored display callback
This changes the location of the mount for the stdout callback
prior value was at
/home/runner/.ansible/plugins

new mount location is at
/home/runner/.ansible/plugins/callback

This is more targeted and more desirable for this reason.
We cannot mount single-files like awx_display.py,
due to permissions error, which
  may be because changes are needed in the Dockerfile
  or some other reason

To carry this out, new utils method callback_mount is introduced
  to make such changes easier to carry out in the future
@eqrx
Copy link
Contributor

eqrx commented Jan 20, 2022

Additional questions I have for the team:

  1. Mentioned in other comment, is the plugins folder as the container mount point okay? Is /home/runner/.ansible preferable, or is /usr/share?

I personally would /home/runner/.ansible since I would argue that it is more in line with other tools I have seen but objectively I would not mind either paths.

  1. About the host machine location, would this be a good final location: ansible_runner/display_callback/callback/awx_display.py?

Yes!

  1. Assuming (2), would there be interest in trying to see how running ansible-test goes, recognizing that it wouldn't ever be vendored, just used for CI?

I would be interested, can I help?

@AlanCoding
Copy link
Member Author

@eqrx I think I messed up a copy+paste in my prior comment, I had wanted to ask if ansible_runner/display_callback/plugins/callback/awx_display.py would be a preferable location.

The logic I'm coming from there is that, in a Makefile target, I could template out a galaxy.yml file in the display_callback folder and run some ansible-test sanity checks. This isn't to package it as a collection (formally), it's just to get some more checks on our radar. I've been meaning to tinker with that (as I still don't know how close it would be to passing), but I'll give it a shot on Monday and see if has any feedback of value.

@AlanCoding
Copy link
Member Author

I started that experiment, investigating ansible-test sanity as potentially a new type of CI. I got two kinds of feedback from it.

ERROR: Output on stderr from ansible-doc is considered an error.

Command "ansible-doc -t callback ansible_runner.display_callback.awx_display" returned exit status 0.
>>> Standard Error
[WARNING]: Failed to assign id for <ast.Attribute object at 0x7faea916a920> on 
/home/alancoding/.ansible/collections/ansible_collections/ansible_runner/displa
y_callback/plugins/callback/awx_display.py, skipping

and

ERROR: Found 3 pylint issue(s) which need to be resolved:
ERROR: plugins/callback/awx_display.py:109:19: ansible-format-automatic-specification: Format string contains automatic field numbering specification
ERROR: plugins/callback/awx_display.py:231:32: ansible-format-automatic-specification: Format string contains automatic field numbering specification
ERROR: plugins/callback/awx_display.py:239:23: ansible-format-automatic-specification: Format string contains automatic field numbering specification

The pylint failure is pretty mundane and not challenging to comply with.

The ast error looks fairly concerning, but after following it, I'm less worried than I thought I would be. It caches both instances we assign Display.verbose to something. The code comment in Ansible suggests this is a fairly normal situation.

In summary, this method didn't discover a whole bunch of important and actionable points, and I'm not as motivated to pursue it anymore.

Copy link
Contributor

@Shrews Shrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me, though I can't really test how this works under AWX. As far as mount point location, I would lean toward under $HOME. Since this is continuation of work from @shanemcd, I'd like to get his approval before merging.

@Shrews Shrews removed the needs_triage New item that needs to be triaged label Jan 24, 2022
self._update_volume_mount_paths(new_args, "{}".format(self.private_data_dir), dst_mount_path="/runner", labels=":Z")

# Mount the stdout callback plugin from the ansible-runner code base
self._update_volume_mount_paths(new_args, callback_mount()[0], dst_mount_path=callback_mount()[1], labels=":Z")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might break if you install runner as root in the system site packages and then run as a non-root user, because you won't be able to relabel a file you dont own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe try :O instead of :Z?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this sounds more right to me, but I haven't been able to figure out how to artificially create the error to validate it fixes it. I got it to these permissions:

bash-4.4$ ls -la /var/lib/awx/venv/awx/lib/python3.8/site-packages/ansible_runner/display_callback/callback/
total 44
drwxr-xr-x. 3 root root  4096 Jan 26 03:18 .
drwxr-xr-x. 3 root root  4096 Jan 26 03:18 ..
-rw-r--r--. 1 root root 30788 Jan 26 03:18 awx_display.py
-rw-r--r--. 1 root root     0 Jan 26 03:18 __init__.py
drwxr-xr-x. 2 root root  4096 Jan 26 03:18 __pycache__

Then as the awx user, it still works. Is there any tweak I can do which should make this break?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, never mind, I was running the wrong thing. I am able to get an error with this type of setup. Right now it just gives the vague message

ERROR! Invalid callback for stdout specified: awx_display

But I'm still tinkering with the install by root user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing the :0 in one environment, I get

Error: mounting overlay failed "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/ansible_runner/display_callback/callback/": chown /var/lib/awx/.local/share/containers/storage/overlay-containers/78dce4e01f5d25bbfe804246e232815dd809cb5360237b1fddf35bc49bfe2fca/userdata/overlay/568435025/upper: invalid argument

I'm not sure how big of a problem this is going to be. In another environment it works. Reading related issues, people mention that kernel version needs to be relatively recent for a metacopy setting. Depending on what we support, I could see this getting complicated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, trying on my own machine, I was able to reproduce something by doing only this:

sudo chown -R root:root ansible_runner/display_callback/

Resulting in

$ ansible-runner run demo/ -p test.yml --process-isolation
Error: lsetxattr /home/alancoding/repos/ansible-runner/ansible_runner/display_callback/callback: operation not permitted

So maybe I am figuring this out. Here, specifically, with root ownership and -rw-rw-r--. permissions, the display callback is not editable for the current user. Like you say, that shouldn't be a problem. I fear that :O will not work in all the contexts that we need it to... but it wouldn't be atypical for runner to create a new tmpdir to make something work. We could do this with a copy of the display callback. We know it should only be a few files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a replicator, I developed a patch that fixes it.

I did not use the :O, because I'm worried this won't work in all cases. Instead, I copied the files (possible if you only have read access) to a tmpdir, and mounted the tmpdir.

Tomorrow I can work on a test case for this logic.

@AlanCoding
Copy link
Member Author

I've put up a patch and a test for @shanemcd's concern about cases where ansible-runner is installed by the root user but used by a non-root user. Practically, I am kind of limited with what I can do with tests, because my actual reproducer was to do sudo chown -R root:root ansible_runner/display_callback/, which we do not want to do in tests themselves. This just gets code coverage to assure this new special case doesn't break.

@Shrews Shrews added the gate label Feb 1, 2022
Copy link
Contributor

@ansible-zuul ansible-zuul bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ansible-zuul ansible-zuul bot merged commit 8813003 into ansible:devel Feb 1, 2022
ansible-zuul bot pushed a commit that referenced this pull request Feb 4, 2022
…ner version (#988)

Add compatibility code to the entrypoint so new EEs work with old runner version

In #957 we combined 2 callback names (awx_display, minimal) into just 1, awx_display. Fundamentally, this makes it hard for the containers to work with ansible-runner versions before/after, and vice versa. Ping review by @shanemcd
In that PR, I paid attention to the ability for current ansible-runner to work with old images.
I was failing to fully appreciate the implications of old ansible-runner not working with current images.
This PR adds a shim to the entrypoint so that the image with work even if used by old versions of ansible-runner. In the PR 957 I removed the use of the LAUNCHED_BY_RUNNER environment variable. This allows using that as a poor proxy for the version of ansible-runner that is starting the container.
@rebeccahhh pointed out to me that our tests for AAP controller 4.1.x were failing due to these problems. I've re-tested those with a custom image built from this branch, quay.io/alancoding/ansible-runner:marty_youve_got_to_come_back_with_me, and this gets them to pass.
There's no reasonable way to ever get test coverage, because the whole idea is compatibility with old versions. So I am working on running make test with that custom image tagged as quay.io/ansible/ansible-runner:devel while I have the commit f1b6c3a checked out, which was the last commit before the PR landed.

Reviewed-by: Shane McDonald <me@shanemcd.com>
Reviewed-by: Rebeccah Hunter <rhunter@redhat.com>
Reviewed-by: None <None>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Changes to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants