Local setup with extensions on Azure fails to recover if the dev shoot hosting the local seed is hibernated. #9700

Open
ashwani2k opened this issue May 2, 2024 · 1 comment
Labels
area/dev-productivity Developer productivity related (how to improve development) kind/bug Bug

ashwani2k commented May 2, 2024

How to categorize this issue?

/area dev-productivity
/kind bug

What happened:

Symptoms & Observation

When using the local setup with extensions in an Azure environment, if the shoot hosting the local Azure seed is hibernated and then woken up, the local seed fails to recover.
This happens because hibernation removes all pods, including those in the garden, registry, and kyverno namespaces.

When the shoot hosting the local seed is woken up, the gardenlet pod fails to start with ImagePullErr. It cannot fetch its image from the registry cache because the registry service has no endpoints: no pods are running in the registry namespace.
Checking the StatefulSet (sts) in the registry namespace reveals that it cannot create its pod, because the call to the mutating webhook topology.azure.extensions.gardener.cloud fails: the service backing the webhook is not reachable.
In the extensions namespace, no pods are running and the ReplicaSet is failing because it cannot reach gardener-resource-manager (grm) in the garden namespace. grm is down for the same reason as gardenlet: ImagePullErr, because the registry service has no endpoints.

(local seed) --> (gardenlet/grm) --> (registry) --> (topology.azure.extensions.gardener.cloud webhook) --> (grm)
The seed cannot be reconciled because of this deadlock.
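
The chain above can be checked link by link on the local seed. The following is a minimal sketch using only standard kubectl commands and the namespace names from this report; it assumes kubectl is pointed at the cluster hosting the local seed, and the function name is purely illustrative.

```shell
# Sketch: inspect the links of the deadlock. Namespace names are taken from
# the report; the function name is hypothetical.
inspect_deadlock() {
  # Registry: is the StatefulSet stuck, and does the service have endpoints?
  kubectl -n registry get statefulsets,pods,endpoints
  # Recent events should show the failed webhook call that blocks pod creation.
  kubectl -n registry get events --sort-by=.lastTimestamp
  # Garden: state of the gardenlet and gardener-resource-manager pods.
  kubectl -n garden get pods
  # List webhook configurations to find the one that registers
  # topology.azure.extensions.gardener.cloud.
  kubectl get mutatingwebhookconfigurations
}
```

Calling `inspect_deadlock` prints the four views in order, which makes it easier to see which component is waiting on which.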

Mitigation

We worked around this by excluding the registry namespace from the topology.azure.extensions.gardener.cloud mutating webhook. To do so, we labelled the registry namespace with a value that matches the webhook's ignore selector.

kubectl label ns registry gardener.cloud/role=extension

We also labelled the kyverno namespace, because kyverno-svc has no endpoints for the same reason, which prevents make gardener-extensions-up from working as well.

kubectl label ns kyverno gardener.cloud/role=extension
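
The two labelling steps can be wrapped into a small, repeatable helper, together with a way to see which namespace selector each mutating webhook actually uses. This is only a sketch: the function names are hypothetical, and whether the label takes effect depends on the selector configured for the webhook in your cluster.

```shell
# Sketch: label the given namespaces so they match the webhook's ignore
# selector (label taken from the report), then show the resulting labels.
exclude_from_topology_webhook() {
  for ns in "$@"; do
    # --overwrite makes the call safe to repeat.
    kubectl label namespace "$ns" gardener.cloud/role=extension --overwrite
  done
  kubectl get namespace "$@" --show-labels
}

# Sketch: print every mutating webhook with its namespaceSelector, to check
# what the selector of topology.azure.extensions.gardener.cloud looks like.
show_webhook_selectors() {
  kubectl get mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{range .webhooks[*]}{.name}{"\t"}{.namespaceSelector}{"\n"}{end}{end}'
}
```

For the case described here, the call would be `exclude_from_topology_webhook registry kyverno`.
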

Side note

This doesn't happen for dev seeds as:

  1. They are never hibernated.
  2. The registry pod runs in the kube-system namespace, which is ignored by the Azure extension's mutating webhook for topology.

What you expected to happen:
It should be possible to recover the local seed after waking up the hosting shoot in Azure.
An easy fix is to add the labels used above when creating the kyverno and registry namespaces in the local-extension based setup.
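
As an illustration of that fix, a namespace can be created with the label already in place. The issue does not show how the local-extension setup creates these namespaces, so the helper below is a hypothetical sketch and not the project's actual manifest or script.

```shell
# Sketch: create (or update) a namespace that carries the label from the
# mitigation from the start. $1 is the namespace name, e.g. registry or kyverno.
create_labelled_namespace() {
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  labels:
    gardener.cloud/role: extension
EOF
}
```

With this, `create_labelled_namespace registry` and `create_labelled_namespace kyverno` would replace the manual labelling after the fact.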

How to reproduce it (as minimally and precisely as possible):

  1. Create a local setup with extensions on Azure, with a local seed and a local shoot running.
  2. Hibernate the dev shoot hosting the local seed and the control plane (cp) of the local shoot.
  3. Wake up the dev shoot.

The local seed will not become ready, as gardenlet and grm are down for the reasons described above.
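
Steps 2 and 3 can be driven from the command line by toggling spec.hibernation.enabled on the Shoot resource. This is a sketch under assumptions: kubectl must target the garden cluster in which the dev shoot is registered, and the shoot name and project namespace are placeholders that this issue does not specify.

```shell
# Sketch: hibernate or wake up a shoot.
# $1: shoot name, $2: project namespace in the garden cluster, $3: true or false
set_hibernation() {
  kubectl -n "$2" patch shoot "$1" --type merge \
    -p "{\"spec\":{\"hibernation\":{\"enabled\":$3}}}"
}
```

For example, `set_hibernation my-dev-shoot garden-my-project true` hibernates the shoot, and the same call with `false` wakes it up again; both names are hypothetical.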

Anything else we need to know?:

Environment:

  • Gardener version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
@gardener-prow gardener-prow bot added area/dev-productivity Developer productivity related (how to improve development) kind/bug Bug labels May 2, 2024
@shafeeqes
Contributor

An easy fix is to add the labels used above when creating the kyverno and registry namespaces in the local-extension based setup.

Sounds good to me.
