
Multiple controllers competing #467

Open
emilbillberg opened this issue Dec 8, 2022 · 3 comments

Comments

@emilbillberg

Hi,

I am running Flux with the image-automation-controller enabled in two clusters, both watching the same image tag pattern. I just realised that they have been affecting each other: one controller rolled the manifest back to the previous version of a container image right after a newer image had been pushed. For example (newest commit first):

Dec 6 11:21:29 2022 +0000 -> v1.2.0
Dec 6 11:20:53 2022 +0000 -> v1.1.0
Dec 6 11:20:46 2022 +0000 -> v1.2.0

The end result is correct, but I am curious whether the "rollback" at 11:20:53 was caused by the two image-automation-controllers competing with each other. Does the image-automation-controller support multiple instances watching the same image tags, or is it designed to run as a single instance?
This is the first time I have run into this problem after using Flux for over a year, so I suspect a race condition between the two controllers. Unfortunately, I cannot reproduce it, so it is probably an edge case.

@kingdonb
Member

kingdonb commented Dec 8, 2022

As you know, three resources are involved in the IAC (image-automation-controller) workflow: ImageRepository, ImageUpdateAutomation, and ImagePolicy. IAC does not "look back": it does not consider the value it is overwriting, or whether that value is higher or lower than the image ImageRepository currently knows as "latest".

#396 is a related issue with some discussion around a similar idea.

Off the top of my head, I would not recommend running image automation on more than one cluster targeting the same paths/repos: the instances will compete with each other, and currently the only way to avoid that is to run it in one place.

Here's how it can happen:

An image gets published and Cluster 1 reconciles its ImageRepository. That updates ImagePolicy, which passes the news to ImageUpdateAutomation, which reconciles the Git repo, finds a diff, and commits and pushes a change.

Meanwhile, on Cluster 2, ImageRepository and ImageUpdateAutomation (IUA) are both waiting for their next reconcile interval. Say that, by chance, IUA reconciles before ImageRepository, which is still behind. IUA reconciles its Git repository, finds a diff, and overwrites the new version with the stale one that ImagePolicy still considers the latest, because ImageRepository has not reconciled yet.

Finally, ImageRepository on Cluster 2 reconciles and catches up, so both clusters' ImagePolicies are now current. IUA gets notified again, and the dance stops after a third commit, "finally completing" the change in Git.
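To make the interleaving concrete, here is a toy replay in Go. It is only a sketch: the `cluster` type and its methods are hypothetical stand-ins, not Flux API types, and ImagePolicy is simplified to "select the most recently pushed tag". The key property it models is that the automation step writes whatever ImagePolicy selects without comparing it to what is already in Git.

```go
package main

import "fmt"

// cluster is a toy model of one cluster's image-automation objects.
// Names and fields are hypothetical; they are not Flux API types.
type cluster struct {
	name         string
	policyLatest string // tag this cluster's ImagePolicy currently selects
}

// reconcileRepository models ImageRepository scanning the registry; in this
// toy model ImagePolicy then selects the most recently pushed tag.
func (c *cluster) reconcileRepository(registry []string) {
	c.policyLatest = registry[len(registry)-1]
}

// reconcileAutomation models ImageUpdateAutomation: it writes whatever
// ImagePolicy selects into Git, without checking whether that tag is older
// than the one already committed.
func (c *cluster) reconcileAutomation(git *string, commits *[]string) {
	if *git != c.policyLatest {
		*git = c.policyLatest
		*commits = append(*commits, c.name+" -> "+c.policyLatest)
	}
}

// simulate replays the interleaving described above and returns the commits
// in chronological order.
func simulate() []string {
	registry := []string{"v1.1.0"}
	c1 := &cluster{name: "cluster-1"}
	c2 := &cluster{name: "cluster-2"}
	c1.reconcileRepository(registry)
	c2.reconcileRepository(registry)
	git := "v1.1.0" // both clusters agree and Git is up to date
	var commits []string

	registry = append(registry, "v1.2.0")  // a new image is pushed
	c1.reconcileRepository(registry)       // cluster 1 sees it first...
	c1.reconcileAutomation(&git, &commits) // ...and commits v1.2.0
	c2.reconcileAutomation(&git, &commits) // cluster 2 is stale: commits v1.1.0
	c2.reconcileRepository(registry)       // cluster 2 catches up...
	c2.reconcileAutomation(&git, &commits) // ...and commits v1.2.0 again
	return commits
}

func main() {
	for _, c := range simulate() {
		fmt.Println(c)
	}
	// cluster-1 -> v1.2.0
	// cluster-2 -> v1.1.0
	// cluster-2 -> v1.2.0
}
```

The three commits line up with the v1.2.0 / v1.1.0 / v1.2.0 sequence in the original report.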

There could be a flag in the future to "prevent downgrades". For now, the fact that IUA does not care which version was used before, only which version ImagePolicy says is the latest, is most likely the root cause of the behavior you described.
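A "prevent downgrades" check could look roughly like the Go sketch below. This is hypothetical: no such flag exists in the controller as far as this thread says, and `shouldWrite`, `parseVersion` and `compareVersions` are illustrative names. It assumes plain `vMAJOR.MINOR.PATCH` tags (pre-release and build metadata are not handled) and compares components numerically, so `v1.10.0` ranks above `v1.9.0`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH" (the leading "v" is optional).
// Pre-release and build metadata are deliberately not supported here.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("bad component %q in %q", p, s)
		}
		v[i] = n
	}
	return v, nil
}

// compareVersions returns -1, 0 or 1, comparing components numerically.
func compareVersions(a, b [3]int) int {
	for i := 0; i < 3; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// shouldWrite reports whether a hypothetical downgrade-preventing automation
// would replace the tag in Git with the one ImagePolicy selected: only if the
// policy's tag is strictly newer.
func shouldWrite(inGit, fromPolicy string) (bool, error) {
	cur, err := parseVersion(inGit)
	if err != nil {
		return false, err
	}
	next, err := parseVersion(fromPolicy)
	if err != nil {
		return false, err
	}
	return compareVersions(next, cur) > 0, nil
}

func main() {
	ok, _ := shouldWrite("v1.2.0", "v1.1.0")
	fmt.Println(ok) // false: the stale write from cluster 2 would be skipped
	ok, _ = shouldWrite("v1.1.0", "v1.2.0")
	fmt.Println(ok) // true: a normal upgrade still goes through
}
```

A guard like this also refuses intentional rollbacks to an older tag, so it only suits setups that always roll forward.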

@emilbillberg
Author

Thanks for answering! That is what I suspected had happened to us, and it was a concern I had from the start. See discussion.

The "prevent downgrades" flag sounds very interesting. I always prefer to roll forward rather than revert changes, so that would work well in our setup. Are there any plans to implement it, or is it still on the drawing board?

@jabbermouth

I'd prefer a flag that rate-limits updates to the Git repo. For us, rollback is a legitimate case (we run tests as part of the pipeline, and if they fail we withdraw the bad image), but having the automation update only n seconds after a previous update would be useful.
