Unexpected downgrade of an application #396

Open
XtremeAI opened this issue Jun 23, 2022 · 4 comments

Comments

@XtremeAI

XtremeAI commented Jun 23, 2022

Hi there,

In our environments we use Flux's image automation to deliver new software versions, with the following setup: GitLab container repo > Flux image repository > Flux image policy > Flux image update (image-automation-controller:v0.17.1)

Today we caught something very strange: for a couple of minutes, an application tried to start up from a two-month-old image.

Image Update made 2 commits within a 2-3 minute window. The first replaced the current image (release-07ef7c53-1655866680) with an old image (release-3caa31f4-1651535374). The second put the current image back.

I checked: no new tags had been pushed to the GitLab repo, and nothing had changed in the Flux image policy or Flux image repository. Even stranger, the release-3caa31f4-1651535374 image tag was no longer in the container repo. Unfortunately the cluster had cached the image earlier and the pullPolicy is set to IfNotPresent, so the pod was able to start.

Now the questions...

  1. Has anyone seen or experienced this behaviour before?
  2. Can someone give an explanation of what happened?
  3. Is there a way to find out what caused this unwanted behaviour?
  4. Is there a way to protect image automation from setting an older tag?

Many thanks

@pjbgf
Member

pjbgf commented Jun 28, 2022

@XtremeAI the controller needs to sort the tags so it can understand what is in fact the latest.

The documentation covers this and how to ensure your image tags are sortable:
https://fluxcd.io/docs/guides/sortable-image-tags/
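
As a rough illustration of what "sortable" means for tags shaped like the ones in this issue, here is a minimal Python sketch. It is not the controller's code; the regular expression and the `latest_tag` helper are assumptions made up for this example, based only on the `release-<sha>-<timestamp>` shape of the two tags quoted above.

```python
import re

# Assumed tag shape: release-<short sha>-<unix timestamp>.
# The pattern is hypothetical, inferred from the tags quoted in this issue.
TAG_PATTERN = re.compile(r"^release-[a-f0-9]+-(?P<ts>[0-9]+)$")


def latest_tag(tags):
    """Return the tag with the highest embedded timestamp, or None if none match."""
    candidates = []
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match:
            # Compare timestamps as integers, not as strings.
            candidates.append((int(match.group("ts")), tag))
    if not candidates:
        return None
    return max(candidates)[1]


full_list = ["release-3caa31f4-1651535374", "release-07ef7c53-1655866680"]
print(latest_tag(full_list))  # release-07ef7c53-1655866680
```

Note that the result depends entirely on the list that is passed in: if the newer tag is missing from the list, the older one is reported as the latest.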

@XtremeAI
Author

Hey @pjbgf ,

Yes, we have this in place, and this mechanism has been working well for more than a year.
Only recently did we catch this unexpected incident: a sudden change to an old tag that was completely outdated and not even present in the container registry.

How can we debug the image controllers or set higher verbosity? (e.g. ImageRepository shows messages like: successful scan, found 58 tags ... where can I see the list of tags it found?)

@stefanprodan
Member

Other people had the same issue when GitHub Container Registry went down: in the first minutes after it recovered, it served an old cache of tags, which fooled Flux into thinking these were the latest. I see no way around this, as Flux can’t tell that the registry API serves stale data. We may want to add a field to the ImagePolicy to tell Flux to ignore downgrades, but then if you want to roll back an app due to some bug, Flux will not let you do it…
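
To make the trade-off concrete, here is a minimal Python sketch of what an "ignore downgrades" check could look like. Everything in it is hypothetical: `select_tag`, the `allow_downgrade` flag and the tag pattern are invented for illustration and are not an existing ImagePolicy field or Flux API.

```python
import re

# Assumed tag shape: release-<short sha>-<unix timestamp> (hypothetical pattern).
TAG_PATTERN = re.compile(r"^release-[a-f0-9]+-(?P<ts>[0-9]+)$")


def tag_timestamp(tag):
    """Return the timestamp embedded in a tag, or None if the tag does not match."""
    match = TAG_PATTERN.match(tag)
    return int(match.group("ts")) if match else None


def select_tag(current, scanned_tags, allow_downgrade=False):
    """Pick the tag to deploy from a scan result.

    With allow_downgrade=False, a scan whose newest tag is older than the
    currently deployed one (for example a stale tag list) is ignored.
    """
    stamped = [(tag_timestamp(tag), tag) for tag in scanned_tags]
    stamped = [(ts, tag) for ts, tag in stamped if ts is not None]
    if not stamped:
        return current  # nothing usable in the scan, keep what is deployed
    candidate_ts, candidate = max(stamped)
    current_ts = tag_timestamp(current)
    if allow_downgrade or current_ts is None:
        return candidate
    return candidate if candidate_ts >= current_ts else current


current = "release-07ef7c53-1655866680"
stale_scan = ["release-3caa31f4-1651535374"]
print(select_tag(current, stale_scan))                        # release-07ef7c53-1655866680
print(select_tag(current, stale_scan, allow_downgrade=True))  # release-3caa31f4-1651535374
```

The second call shows the cost mentioned above: with the guard on, an intentional rollback to an older tag has to bypass the check explicitly.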

@XtremeAI
Author

XtremeAI commented Jun 29, 2022

> Other people had the same issue when GitHub Container Registry went down: in the first minutes after it recovered, it served an old cache of tags, which fooled Flux into thinking these were the latest. I see no way around this, as Flux can’t tell that the registry API serves stale data. We may want to add a field to the ImagePolicy to tell Flux to ignore downgrades, but then if you want to roll back an app due to some bug, Flux will not let you do it…

Hey @stefanprodan

That was my theory, because there were issues with Cloudflare / GitLab that day, but I have no evidence to prove that GitLab actually served the wrong list of tags, so I can't blame GitLab for it. Is it possible to dump the responses Flux gets from GitLab somehow?

I see value in the idea of a field to control downgrades and I'd use it for sure. If a downgrade were ever needed, it would likely be a manual change of tag anyway, or a new image tag built from old code.
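
One way to gather independent evidence is to query the registry's tag list directly and keep the output. The Python sketch below does not capture what Flux itself received. It assumes the registry exposes the standard `/v2/<name>/tags/list` endpoint of the registry HTTP API and that a bearer token has already been obtained. The host, repository and token values are placeholders, and pagination of long tag lists is not handled.

```python
import json
import urllib.request


def tags_list_url(registry_host, repository):
    """Build the tag list URL for a repository (standard /v2/ registry API path)."""
    return f"https://{registry_host}/v2/{repository}/tags/list"


def parse_tags(body):
    """Extract a sorted tag list from a response body like {"name": ..., "tags": [...]}."""
    payload = json.loads(body)
    # "tags" may be null for an empty repository, so fall back to an empty list.
    return sorted(payload.get("tags") or [])


def fetch_tags(registry_host, repository, token):
    """Fetch the tags the registry reports right now (first page only)."""
    request = urllib.request.Request(
        tags_list_url(registry_host, repository),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_tags(response.read().decode("utf-8"))


# Offline example with a canned response body:
sample = '{"name": "group/app", "tags": ["release-3caa31f4-1651535374", "release-07ef7c53-1655866680"]}'
print(parse_tags(sample))
```

Running `fetch_tags` on a schedule and logging the result with a timestamp would show whether the registry ever reports a tag list without the newest tag.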
